
Pachyderm - The Web Proxy that Never Forgets

Alison Krautkramer, Jing Li
sisko1@cs.wisc.edu, jing@cs.wisc.edu

Remzi Arpaci-Dusseau
remzi@cs.wisc.edu

Computer Sciences Department
University of Wisconsin
1210 West Dayton Street
Madison, WI 53705

December 18, 2000

ABSTRACT

As a result of the increasing popularity and resulting growth of the Internet, fast, efficient, and reliable access to the information on the Internet is becoming increasingly important. In this paper, an introduction to the Pachyderm Web Proxy is presented. This web proxy is an Internet proxy that not only caches the most recent copy of requested Internet documents, but also stores old versions of those documents for future reference. Flexible context searches can be performed on the data within the cache to assist the user with identifying web pages of interest that have been viewed historically. This paper examines the design and implementation of the Pachyderm Web Proxy and provides measurements indicating that caching is an efficient method of satisfying client requests. Also, it will be demonstrated that indexing and searching historically viewed web documents can be done without the commitment of large amounts of memory or time.

1 Introduction

In recent years, the Internet has exploded in popularity. In the past, the Internet was used almost exclusively for disseminating information in an educational or government setting, but now the Internet is becoming a standard in business. Finding that the Internet can lead to an even wider customer base, many businesses are investing significant resources in providing web pages that not only inform users but also sell goods and services.

The explosion of activity on the Internet has caused it to be a dynamic environment. A page that is viewed one day may be completely changed or deleted the next. In order to increase the reliability, speed, and usability of the Internet, we have devised a way to store historical copies of viewed web pages in a web proxy cache.

A web proxy is a
server that acts as a middleman between a client and the Internet (figure 1.1). All requests from the client are sent to the proxy server instead of directly to the web server that serves the requested document. The web proxy uses the request to decide how to service the client in the fastest and most efficient way. First, the proxy will look for the requested document in a cache it uses to store copies of previously downloaded documents. If this document is found and it has not been changed since it was last downloaded, it is returned to the user. However, if the document was not found in the cache, or it has been updated since the last time it was downloaded, then the proxy will forward the request to the appropriate web server. Upon receipt of the request, the external web server will fetch the correct document and send it back to the proxy. In turn, the proxy returns the document to the client. If the document can be cached, it will be stored in the proxy's cache in preparation for future client requests. Future requests for this document can then be served by returning the cached copy instead of contacting the external server and downloading the document again.

Most web proxy caches [1, 4] only store the most recent copies of viewed web documents. As the user views new pages, the cache uses a page replacement policy to find an old web page in the cache to replace with the new information.

[Figure 1.1: Web Proxy Server. Diagram showing the client, the proxy with its cache, and the web server.]

In Pachyderm, all web pages that the user has viewed are stored in a cache. As a result, if a web page that the user is interested in is either removed from the Internet or the information it contained is changed, the user can still view the old copy of the page.

This paper will examine the design, implementation, and evaluation of the Pachyderm Internet web proxy. In Pachyderm, all copies of viewed web documents are stored in a cache. The cache can then be used not only to
satisfy user requests in an efficient manner, but also to let users view the contents of historical Internet documents even though those documents no longer exist.

2 The Pachyderm Web Proxy

This section describes the implementation of the Pachyderm web proxy.

2.1 Communication: HTTP Headers

Communication among computers in the Internet is achieved through the use of HTTP messages. There are two types of messages in HTTP/1.1: request messages and response messages (figure 2.1).

[Figure 2.1: Downloading an Internet Document. The client sends "Get file.html HTTP/1.1" to the web server; the web server replies with status code 304, Not Modified.]

Clients inform web servers of their request for a particular Internet document by issuing a request message. A request message consists of a single line that contains the method to be applied to the resource, the resource requested, and the HTTP version being used, followed by zero or more header fields that further describe the request. Among the methods that can be specified in the request message are get, put, post, head, delete, and trace. The Pachyderm web proxy supports only the get method, which is described below.

The get method is a very important method from the perspective of an Internet web proxy. Not only can a get method be used to request a desired document from the Internet, but it can also be changed to a conditional get with the addition of special header fields. When a conditional get is received by a web server that supports it, the web server will send the requested document back only if the document has been changed since the last time it was downloaded by the requestor. Otherwise, if the document has not been updated, only a response header stating that no modifications have been made is sent back. The purpose of the conditional get method is to improve the performance of Internet document caches. If the document has not changed, only a short message that indicates this fact
will be sent back to the requestor instead of the entire document. As a result, a significant amount of time can be saved, since a document can be retrieved from a local cache and returned to the client much faster than it could be read across a network.

2.2 Implementation

In order to implement the Pachyderm web proxy, an open source Internet web proxy named RabbIT2 [6] was first downloaded from the Internet. RabbIT2 served as the template on which Pachyderm is implemented. Pachyderm was written and tested
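
The cache lookup and conditional get behavior described in Sections 1 and 2.1 can be illustrated with a small in-memory sketch. This is not Pachyderm's or RabbIT2's code: the class names `OriginServer` and `HistoryProxy`, the integer modification stamps, and the choice to serve the last stored copy when the origin reports the document missing are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class OriginServer:
    """Hypothetical in-memory stand-in for an external web server."""
    # url -> (modification stamp, body); the integer stamp is an assumption
    docs: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def get(self, url: str, if_modified_since: Optional[int] = None
            ) -> Tuple[int, Optional[int], Optional[str]]:
        """Return (status, stamp, body), honoring a conditional get."""
        if url not in self.docs:
            return 404, None, None
        stamp, body = self.docs[url]
        if if_modified_since is not None and stamp <= if_modified_since:
            return 304, stamp, None  # not modified: header only, no body
        return 200, stamp, body


@dataclass
class HistoryProxy:
    """Hypothetical proxy that appends each new version instead of evicting."""
    origin: OriginServer
    # url -> list of (stamp, body), oldest first
    cache: Dict[str, List[Tuple[int, str]]] = field(default_factory=dict)

    def fetch(self, url: str) -> Tuple[str, Optional[str]]:
        """Return (where the answer came from, body)."""
        versions = self.cache.get(url)
        if versions:
            # Cached copy exists: revalidate it with a conditional get.
            stamp, body = versions[-1]
            status, new_stamp, new_body = self.origin.get(url, stamp)
            if status == 304:
                return "cache", body
        else:
            status, new_stamp, new_body = self.origin.get(url)
        if status == 200:
            # New or changed document: keep it alongside older versions.
            self.cache.setdefault(url, []).append((new_stamp, new_body))
            return "origin", new_body
        if versions:
            # Assumption: if the origin no longer has the page,
            # fall back to the most recent stored copy.
            return "history", versions[-1][1]
        return "missing", None
```

In this sketch a second request for an unchanged document is answered from the cache after a 304 reply, while a changed document adds a new entry to the version list rather than replacing the old one.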


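The request message layout described in Section 2.1 (one request line followed by zero or more header fields) can be made concrete with a short parsing sketch. The names `Request`, `parse_request`, and `is_conditional_get` are hypothetical, and the sketch assumes the conditional get is signaled by the standard `If-Modified-Since` header; the paper only says "special header fields".

```python
from typing import Dict, NamedTuple


class Request(NamedTuple):
    method: str
    target: str
    version: str
    headers: Dict[str, str]


def parse_request(raw: str) -> Request:
    """Split a raw HTTP request into its request line and header fields."""
    head = raw.split("\r\n\r\n", 1)[0]  # drop any message body
    lines = head.split("\r\n")
    # Request line: method, requested resource, HTTP version.
    method, target, version = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if not line:
            continue
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return Request(method, target, version, headers)


def is_conditional_get(req: Request) -> bool:
    """True when a get carries the If-Modified-Since header (assumed marker)."""
    return req.method == "GET" and "if-modified-since" in req.headers
```

A proxy that supports only the get method, as Pachyderm does, could use a check like `req.method == "GET"` on the parsed request line to reject other methods before consulting its cache.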

UW-Madison CS 736 - Pachyderm - The Web Proxy that Never Forgets
