15-213, Fall 2002
Lab Assignment L7: Logging Web Proxy
Assigned: November 21, Due: December 5, 11:59PM

Rajesh is in charge of this lab. Please email him [email protected] with any questions or concerns.

Introduction

A web proxy is a program which acts as a middleman between a web server and browser. Instead of contacting the server directly to get a web page, the browser contacts the proxy, which forwards the request on to the server. When the server replies to the proxy, the proxy sends the reply on to the browser.

Proxies are used for many purposes. Sometimes proxies are used in firewalls, such that the proxy is the only way for a browser inside the firewall to contact a server outside. The proxy may do translation on the page, for instance, to make it viewable on a web-enabled cell phone. Proxies are used as “anonymizers”: by stripping a request of all identifying information, a proxy can make the browser anonymous to the server. Proxies can even be used to cache web objects, by storing a copy of, say, an image when a request for it is first made, and then serving that image in response to future requests rather than going to the server. Squid (available at http://squid.nlanr.net) is a free proxy cache.

In this lab, you will write a simple web proxy that logs and filters requests. In the first part of the lab, you will set up the proxy to accept requests, check if the request should be blocked, forward the requests to the server if not, and return the result back to the browser, keeping a log of such requests in a disk file. In this part, you will learn how to write programs that interact with each other over a network (socket programming), as well as some basic HTTP.

In the second part of the lab, you will upgrade your proxy to deal with multiple open connections at once. You may implement this in two ways: Your proxy can spawn a separate thread to deal with each request, or it may multiplex between the requests using the select(2) Unix system call.
Either option will give you an introduction to dealing with concurrency, a crucial systems concept. Using threads for this lab is recommended.

Logistics

This is an individual assignment. All files you need are in the directory

    /afs/cs.cmu.edu/academic/class/15213-f02/L7

Start by copying proxylab-handout.tar to a (protected) directory in which you plan to do your work. Then give the command tar xvf proxylab-handout.tar. This will cause a number of files to be unpacked in the directory. The files you will be modifying and turning in are proxy.c, csapp.c, and csapp.h.

The proxy.c file contains the bulk of the logic for your proxy. The csapp.c and csapp.h files are described in your textbook. The csapp.c file contains error handling wrappers and helper functions such as the RIO functions (Section 11.4), the open_clientfd function (Section 12.4.4), and the open_listenfd function (Section 12.4.7). The csapp.h file contains the prototypes for the functions in csapp.c.

Part I: Implementing a web proxy

The first step is implementing a basic logging proxy. When started, your proxy should open a socket and listen for connections. When it gets a connection (from a browser), it should accept the connection, read the request, check whether the requested address is blocked, and parse the request to determine the server that it was meant for. It should then open a socket connection to that server, send it the request, receive the reply, and forward the reply to the browser if the request is not blocked.

Notice that, since your proxy is a middleman between client and server, it will have elements of both. It will act as a server to the web browser, and as a client to the web server. Thus you will get experience with both client and server programming.

Filtering

The blocked URLs are stored in a file called proxy.filter. The web proxy may read in these addresses at initialization.
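Reading the filter file once at startup might look like the sketch below. It is only an illustration: the handout does not specify the layout of proxy.filter, so the sketch assumes one URL per line and treats a request as blocked only on an exact string match. The names load_filter, is_blocked, MAX_BLOCKED, and MAX_URL_LEN are hypothetical and do not come from the handout code.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BLOCKED 256   /* hypothetical cap on the number of filter entries */
#define MAX_URL_LEN 1024  /* hypothetical cap on the length of one URL */

static char blocked[MAX_BLOCKED][MAX_URL_LEN];
static int nblocked = 0;

/*
 * Read blocked URLs from path, assuming one URL per line.
 * Returns the number of entries loaded, or -1 if the file cannot be opened.
 * Lines longer than MAX_URL_LEN - 1 characters are split by fgets; a real
 * proxy would need to decide how to handle them.
 */
int load_filter(const char *path)
{
    FILE *fp = fopen(path, "r");
    char line[MAX_URL_LEN];

    if (fp == NULL)
        return -1;

    nblocked = 0;
    while (nblocked < MAX_BLOCKED && fgets(line, sizeof(line), fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';  /* strip the line terminator */
        if (line[0] == '\0')
            continue;                         /* skip blank lines */
        strcpy(blocked[nblocked++], line);    /* fits: line < MAX_URL_LEN */
    }
    fclose(fp);
    return nblocked;
}

/* Return 1 if url exactly matches a loaded entry, 0 otherwise. */
int is_blocked(const char *url)
{
    int i;

    for (i = 0; i < nblocked; i++)
        if (strcmp(blocked[i], url) == 0)
            return 1;
    return 0;
}
```

With helpers like these, the proxy could call load_filter("proxy.filter") once before it starts accepting connections and then call is_blocked on each requested URL before contacting the server. Whether matching should be exact, by prefix, or by host name is a design choice the handout leaves open.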
When a request for a blocked address arrives from a web browser, the proxy should return a permission-denied message to the browser in the following format:

    <html><head><title>Proxy error</title></head><body>You are not allowed to access this web page.</body></html>

Logging

Your proxy should also keep track of all requests in a log file named proxy.log. Each line should be of the form:

    Date: browserIP URL size

where browserIP is the IP address of the browser, URL is the URL asked for, and size is the size in bytes of the object that was returned. For instance:

    Sun 27 Oct 2002 02:51:02 EST: 128.2.111.38 http://www.cs.cmu.edu/ 34314

Note that size is essentially the number of bytes received from the server, from the time the connection is opened to the time it is closed. Only requests that are met by a response from a server (or cached response) should be logged. We have provided the function void format_log_entry() to create a log entry in the required format.

Graphics Display

To see the number of connections currently being serviced by your proxy, we have provided an xlib graphics display. To see the display, you need to do the following things:

On the machine you are logging in from, type

    xhost +<fishmachine>
    e.g. xhost +bass.cmcl.cs.cmu.edu

On the fish machine, set the display to the machine you are logging in from using

    setenv DISPLAY <ip address>:0.0
    e.g. setenv DISPLAY 128.2.64.32:0.0

This step may not be necessary if your telnet or ssh session automatically sets the DISPLAY environment variable for you.

The code for initializing and destroying the display has already been provided in the main routine of proxy.c. You will need to call change_display with the current number of connections to change the display. The display also logs the number of connections currently being serviced to a file called display.log.
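One way to keep the displayed count consistent in a threaded proxy is to update a shared counter and make the display call while holding a single lock. The sketch below is a hedged illustration, not the handout's code: it assumes change_display takes the count as an int (the handout does not give its prototype) and uses a stub in its place, and connection_opened, connection_closed, and display_lock are hypothetical names.

```c
#include <pthread.h>

static pthread_mutex_t display_lock = PTHREAD_MUTEX_INITIALIZER;
static int nconn = 0;           /* connections currently being serviced */
static int last_displayed = 0;  /* value most recently passed to the stub */

/*
 * Stand-in for the provided change_display routine. The real routine ships
 * with proxy.c; its signature is an assumption here.
 */
static void change_display_stub(int n)
{
    last_displayed = n;
}

/* Call when a connection is accepted; returns the new connection count. */
int connection_opened(void)
{
    int n;

    pthread_mutex_lock(&display_lock);
    n = ++nconn;
    change_display_stub(n);     /* counter update and display call are atomic */
    pthread_mutex_unlock(&display_lock);
    return n;
}

/* Call when a connection has been fully serviced; returns the new count. */
int connection_closed(void)
{
    int n;

    pthread_mutex_lock(&display_lock);
    n = --nconn;
    change_display_stub(n);
    pthread_mutex_unlock(&display_lock);
    return n;
}
```

Because the counter update and the display call happen under the same mutex, two threads cannot interleave their updates or report counts out of order. A select(2)-based proxy runs in a single thread and would not need the lock.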
While building the concurrent server, take care to ensure that all calls to change_display are properly serialized.

Port Numbers

Since there will be many people working on the same machine, you cannot all use the same port to run your proxies. You are allowed to select any non-privileged port for your proxy, as long as it is not taken by other system processes. Selecting a port in the upper thousands is suggested (e.g., 3070 or 8104).

Part II: Dealing with multiple requests

Real proxies do not process requests sequentially. They deal with multiple requests in parallel. This is particularly important when handling a request can involve a lot of waiting (as it can when you are, for instance, contacting a remote web server). While your proxy is waiting for a remote server to respond to a request so that it can serve one browser, it