UW-Madison CS 736 - Pachyderm - The Web Proxy that Never Forgets - D3050138

School name University of Wisconsin, Madison

Course CS 736 - Advanced Operating Systems

Pages 14

Pachyderm: The Web Proxy that Never Forgets

University of Wisconsin
Madison, WI 53705
December 18, 2000

Abstract

2. The Pachyderm Web Proxy

Communication among computers on the Internet is achieved through HTTP messages. HTTP/1.1 defines two types of messages, request messages and response messages (figure 2.1).

Figure 2.1: Downloading an Internet Document

Clients inform web servers of their request for a particular Internet document by issuing a request message. A request message consists of a single line that specifies the method to be applied to the resource, the resource requested, and the HTTP version being used, followed by zero or more header fields that further describe the request. The methods that can be specified in a request message include get, put, post, head, delete, and trace. The Pachyderm web proxy supports only the get method, which is described below.

The get method is a very important method from the perspective of an Internet web proxy. Not only can a get method be used to request a desired document from the Internet, but it can also be turned into a conditional get method by adding special header fields. When a web server that supports this method receives a conditional get, it sends the requested document back only if the document has changed since the requestor last downloaded it. If the document has not been updated, only a response header stating that no modifications have been made is sent back. The purpose of the conditional get method is to improve the performance of Internet document caches: if the document has not changed, only a short message indicating this fact is sent back to the requestor instead of the entire document.
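As an illustration of the conditional get described above, the following minimal sketch builds the text of a get request and, optionally, turns it into a conditional get. It assumes the standard HTTP/1.1 `If-Modified-Since` header field and the `304 Not Modified` status code; the paper does not say which header fields Pachyderm actually sends, and the function and parameter names are hypothetical.

```python
from email.utils import formatdate


def build_get_request(path, host, cached_at=None):
    """Return the text of an HTTP/1.1 get request message.

    cached_at is a hypothetical parameter: the POSIX timestamp of the
    cached copy. When it is given, the request becomes a conditional get.
    """
    # Request line: method, requested resource, HTTP version.
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if cached_at is not None:
        # Assumption: If-Modified-Since is the header field that makes
        # the get conditional; the date must be in HTTP (GMT) format.
        lines.append("If-Modified-Since: " + formatdate(cached_at, usegmt=True))
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def is_not_modified(status_code):
    """True when the server answered a conditional get with headers only."""
    # 304 Not Modified means the cached copy is still current.
    return status_code == 304
```

A proxy built along these lines would call `build_get_request` with the time at which it stored its cached copy, and serve the cached file whenever `is_not_modified` returns true for the status code of the reply.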
Because an unchanged document can be retrieved from the local cache and returned to the client much faster than it could be read across the network, a significant amount of time can be saved.

When the web server sends back a response header indicating that the document has not been modified, the client connection gets the contents of the document from the cache file indicated by the hash table lookup. The contents are sent back to the client along with a response header. However, if the web server returns the contents of the document, those contents are sent to the client and stored in a unique cache file within the cache directory that holds all of the former versions of this document.

If no entry exists for the requested document in the hash table, then there is no corresponding file in the cache. This is the case the first time a user requests any new document. The get request received from the client is then forwarded to the external web server. The document contents received from the server are placed in a file within a new directory created for this document, and an entry is placed in the hash table so that this document can be used to fulfill later client requests.

Figure 2.2: Pachyderm Structure

Once the words within the file blocks are placed in the index, searches can be performed. GLIMPSE uses the search words provided by the user to search the index sequentially for matching words. When a matching word is found, the block of text containing that word can be accessed through the stored pointer. At this point, another sequential search is done on that block of text to pinpoint exactly where the desired word occurs.
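The two-level search described above can be sketched as follows. This is a deliberately simplified illustration, not GLIMPSE's actual implementation: the index is modeled as a flat list of (word, block number) pairs, each block is a (filename, text) pair, and all names are hypothetical.

```python
def build_index(blocks):
    """Build a word index over blocks, a list of (filename, text) pairs.

    Returns a list of (word, block_number) entries; the block number
    plays the role of the stored pointer to the block of text.
    """
    index = []
    for block_no, (_, text) in enumerate(blocks):
        for word in sorted(set(text.lower().split())):
            index.append((word, block_no))
    return index


def search(index, blocks, word):
    """Return (filename, line) pairs for every line containing word."""
    word = word.lower()
    hits = []
    # First pass: sequential search of the index for matching words.
    for indexed_word, block_no in index:
        if indexed_word != word:
            continue
        # Second pass: sequential search of the block the pointer
        # refers to, to pinpoint where the word occurs.
        filename, text = blocks[block_no]
        for line in text.splitlines():
            if word in line.lower().split():
                hits.append((filename, line))
    return hits
```

For example, with two cached blocks named `a.html` and `b.html` that both mention the word "proxy", `search(index, blocks, "proxy")` returns one (filename, line) pair per matching line, in block order.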
The phrase in which the matched word occurs can then be returned to the user, along with the name of the file in which that phrase was found.

This tool, along with an extension called WebGLIMPSE, allows the user to search an index built on the Internet documents stored in Pachyderm’s cache via an HTML form on a web page. This application gives the user a simple and fast way to view Internet documents that they have viewed in the past.

3 Performance and Evaluation

3.1 Caching Policies

In order to explore the advantages of caching along with the overhead incurred by storing all of the web documents that a user has viewed, two web proxies that use different caching policies were implemented. The first web proxy used a traditional keep-one cache policy. In this version, only one copy of each requested Internet document is stored in the cache. If that document is subsequently updated, the new copy replaces the old one. The second web proxy is the Pachyderm web proxy, which uses a keep-all caching policy. In Pachyderm, the cache stores not only the current copy of an Internet document but also the previously viewed versions. Therefore, every cacheable web document that has ever been viewed by a client is stored within Pachyderm’s cache.

3.2 Workload

3.2 Average Response Time

3.2 Dependence of Average Response Time on Number of Documents in Cache

3.3 GLIMPSE Indexing Overhead

In order to search Pachyderm’s cache of web pages with GLIMPSE, an index must first be built on the cache. Figure 3.4 shows that there is an overhead associated with indexing these documents. Indexing a 1.5 MB cache took approximately 1.25 seconds. The time needed to index the cache will increase as the cache size increases.
However, for this application it is argued that the user will accept a wait of a few seconds or even a minute in exchange for the ability to find information that may no longer exist on the Internet. During normal Internet use, a user may have to wait several minutes for service because of network traffic or busy web servers; users have therefore demonstrated a willingness to wait for information of interest from the Internet.

Fig. 3.4 GLIMPSE Indexing Time

4. Related Work

Currently, Pachyderm is only able to query its own local cache. It therefore has no need for a facility to communicate with other caches; in the future, however, Pachyderm might be extended to deal with hierarchical caches.

5. Future Work

6. Conclusions

Alison Krautkramer, Jing Li, Remzi
Computer Sciences Department
University of Wisconsin
1210 West Dayton Street
Madison, WI 53705
December 18, 2000
