15-213, Fall 2002
Lab Assignment L7: Logging Web Proxy
Assigned: November 21, Due: December 5, 11:59PM

Rajesh is in charge of this lab. Please email him (rajesh@cs.cmu.edu) with any questions or concerns.

Introduction

A web proxy is a program that acts as a middleman between a web server and a browser. Instead of contacting the server directly to get a web page, the browser contacts the proxy, which forwards the request on to the server. When the server replies to the proxy, the proxy sends the reply on to the browser.

Proxies are used for many purposes. Sometimes proxies are used in firewalls, such that the proxy is the only way for a browser inside the firewall to contact a server outside. The proxy may do translation on the page, for instance, to make it viewable on a web-enabled cell phone. Proxies are used as anonymizers: by stripping a request of all identifying information, a proxy can make the browser anonymous to the server. Proxies can even be used to cache web objects, by storing a copy of, say, an image when a request for it is first made, and then serving that image in response to future requests rather than going to the server. Squid (available at http://squid.nlanr.net/) is a free proxy cache.

In this lab, you will write a simple web proxy that logs and filters requests. In the first part of the lab, you will set up the proxy to accept requests, check whether each request should be blocked, forward it to the server if not, and return the result to the browser, keeping a log of such requests in a disk file. In this part, you will learn how to write programs that interact with each other over a network (socket programming), as well as some basic HTTP.

In the second part of the lab, you will upgrade your proxy to deal with multiple open connections at once. You may implement this in two ways: your proxy can spawn a separate thread to deal with each request, or it may multiplex between the requests using the select(2) Unix system call. Either option will give you an introduction to dealing with
concurrency, a crucial systems concept. Using threads for this lab is recommended.

Logistics

This is an individual assignment. All files you need are in the directory

    /afs/cs.cmu.edu/academic/class/15213-f02/L7/

Start by copying proxylab-handout.tar to a protected directory in which you plan to do your work. Then give the command

    tar xvf proxylab-handout.tar

This will cause a number of files to be unpacked in the directory. The files you will be modifying and turning in are proxy.c, csapp.c, and csapp.h. The proxy.c file contains the bulk of the logic for your proxy. The csapp.c and csapp.h files are described in your textbook. The csapp.c file contains error-handling wrappers and helper functions such as the RIO functions (Section 11.4), the open_clientfd function (Section 12.4.4), and the open_listenfd function (Section 12.4.7). The csapp.h file contains the prototypes for the functions in csapp.c.

Part I: Implementing a web proxy

The first step is implementing a basic logging proxy. When started, your proxy should open a socket and listen for connections. When it gets a connection from a browser, it should accept the connection, read the request, check whether the address is blocked, and parse the request to determine the server that it was meant for. If the request is not blocked, the proxy should then open a socket connection to that server, send it the request, receive the reply, and forward the reply to the browser.

Notice that since your proxy is a middleman between client and server, it will have elements of both. It will act as a server to the web browser and as a client to the web server. Thus you will get experience with both client and server programming.

Filtering

The blocked URLs are stored in a file called proxy.filter. The proxy may read in these addresses at initialization. When a request for a blocked address comes from a web browser, the proxy should return a permission-denied message to the browser in the following format:

    <html>
    <head><title>Proxy error</title></head>
    <body>You are not allowed to access this web page</body>
    </html>

Logging

Your proxy should also keep track of all requests in a log file named proxy.log. Each line should be of the form:

    Date: browserIP URL size

where browserIP is the IP address of the browser, URL is the URL asked for, and size is the size in bytes of the object that was returned. For instance:

    Sun 27 Oct 2002 02:51:02 EST: 128.2.111.38 http://www.cs.cmu.edu/ 34314

Note that size is essentially the number of bytes received from the server, from the time the connection is opened to the time it is closed. Only requests that are met by a response from a server (or a cached response) should be logged. We have provided the void function format_log_entry to create a log entry in the required format.

Graphics Display

To see the number of connections currently being serviced by your proxy, we have provided an xlib graphics display. To see the display, you need to do the following:

- On the machine you are logging in from, type xhost <fishmachine> (e.g., xhost bass.cmcl.cs.cmu.edu).
- On the fish machine, set the display to the machine you are logging in from using setenv DISPLAY <ip address>:0.0 (e.g., setenv DISPLAY 128.2.64.32:0.0). This step may not be necessary if your telnet or ssh session automatically sets the DISPLAY environment variable for you.

The code for initializing and destroying the display has already been provided in the main routine of proxy.c. You will need to call change_display with the current number of connections to change the display. The display also logs the number of connections currently being serviced to a file called display.log. While building the concurrent server, take care to ensure that all calls to change_display are properly serialized.

Port Numbers

Since many people will be working on the same machine, you cannot all use the same port to run your proxies. You are allowed to select any non-privileged port for your proxy, as long as it is not taken by other system processes. Selecting a port in the upper thousands is suggested, e.g.
3070 or 8104.

Part II: Dealing with multiple requests

Real proxies do not process requests sequentially. They deal with multiple requests in parallel. This is particularly important when handling a request can involve a lot of waiting, as it can when you are, for instance, contacting a remote web server. While your proxy is waiting for a remote server to respond to a request so that it can serve one browser, it could be working on a pending request …
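
The thread-per-connection option recommended for this lab, together with the requirement that calls to change_display be serialized, might be sketched roughly as follows. This is only an illustration under stated assumptions: update_connections, handle_connection, serve_forever, and change_display_stub are hypothetical names, and the stub merely stands in for the handout's change_display, whose exact signature is assumed here to take the current connection count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>

/* Hypothetical shared state: connections currently being serviced. */
static int active_connections = 0;
static pthread_mutex_t display_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the handout's change_display(); signature is an assumption. */
void change_display_stub(int count) { (void)count; }

/* Adjust the count and refresh the display under one lock, so that
 * display updates from different threads never interleave. */
int update_connections(int delta)
{
    pthread_mutex_lock(&display_lock);
    active_connections += delta;
    assert(active_connections >= 0);
    int now = active_connections;
    change_display_stub(now);
    pthread_mutex_unlock(&display_lock);
    return now;
}

/* Thread routine: service one browser connection, then clean up. */
void *handle_connection(void *arg)
{
    int connfd = *(int *)arg;
    free(arg);                 /* the accept loop allocated this */
    update_connections(+1);
    /* Reading, filtering, forwarding, and logging would go here. */
    close(connfd);
    update_connections(-1);
    return NULL;
}

/* Accept loop: hand each new connection to its own detached thread. */
void serve_forever(int listenfd)
{
    for (;;) {
        int *connfdp = malloc(sizeof(int));
        if (connfdp == NULL)
            continue;
        *connfdp = accept(listenfd, NULL, NULL);
        if (*connfdp < 0) {
            free(connfdp);
            continue;
        }
        pthread_t tid;
        if (pthread_create(&tid, NULL, handle_connection, connfdp) != 0) {
            close(*connfdp);
            free(connfdp);
            continue;
        }
        pthread_detach(tid);   /* no join needed; resources are reclaimed on exit */
    }
}
```

Each connected descriptor is passed through its own heap allocation so that the accept loop cannot overwrite it before the new thread has read it. Writes to the log file would need the same kind of locking as the display.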
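
For Part I, the steps of parsing a request to find the target server and checking it against the blocked list could look something like the sketch below. The helper names parse_http_url and is_blocked are hypothetical, and the prefix-matching rule is an assumption: the handout does not say how a requested URL should be compared with the entries in the filter file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Split "http://host[:port]/path" into host and port (default 80).
 * Returns 0 on success, -1 if the URL is not a usable http URL. */
int parse_http_url(const char *url, char *host, size_t hostlen, int *port)
{
    assert(url != NULL && host != NULL && port != NULL);

    const char *prefix = "http://";
    size_t plen = strlen(prefix);
    if (strncasecmp(url, prefix, plen) != 0)
        return -1;

    const char *start = url + plen;
    size_t n = strcspn(start, ":/");   /* host ends at ':', '/', or end of string */
    if (n == 0 || n >= hostlen)
        return -1;
    memcpy(host, start, n);
    host[n] = '\0';

    *port = 80;
    if (start[n] == ':') {
        long p = strtol(start + n + 1, NULL, 10);
        if (p <= 0 || p > 65535)
            return -1;
        *port = (int)p;
    }
    return 0;
}

/* Return 1 if url starts with any blocked entry, else 0.
 * Prefix matching is an assumed policy, not one taken from the handout. */
int is_blocked(const char *url, const char *const blocked[], size_t nblocked)
{
    for (size_t i = 0; i < nblocked; i++) {
        if (strncmp(url, blocked[i], strlen(blocked[i])) == 0)
            return 1;
    }
    return 0;
}
```

In a proxy built this way, the host and port would then be passed to open_clientfd, while a blocked URL would instead trigger the error page described under Filtering.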