CS 213, Fall 2010
Lab Assignment L7: Writing a Caching Web Proxy
Assigned: Tue, Nov 09, Due: Tue, Nov 23, 11:59 PM
Last Possible Time to Turn in: Fri, Nov 26, 11:59 PM

Theodore Martin ([email protected]) is the lead TA for this lab.

1 Introduction

A web proxy is a program that acts as a middleman between a web server and browser. Instead of contacting the server directly to get a web page, the browser contacts the proxy, which forwards the request on to the server. When the server replies to the proxy, the proxy sends the reply on to the browser.

Proxies are used for many purposes. Sometimes proxies are used in firewalls, such that the proxy is the only way for a browser inside the firewall to contact a server outside. The proxy may do translation on the page, for instance, to make it viewable on a web-enabled cell phone. Proxies are used as anonymizers – by stripping a request of all identifying information, a proxy can make the browser anonymous to the server. Proxies can even be used to cache web objects, by storing a copy of, say, an image when a request for it is first made, and then serving that image directly in response to future requests rather than going to the server.

In this lab, you will write a simple proxy that caches web objects. In the first part of the lab, you will set up the proxy to accept a request, forward the request to the server, and return the result back to the browser. In this part, you will learn how to write programs that interact with each other over a network (socket programming), as well as some basic HTTP. In the second part, you will upgrade your proxy to deal with multiple open connections at once. Your proxy should spawn a separate thread to deal with each request. This will give you an introduction to dealing with concurrency, a crucial systems concept.
Finally, you will turn your proxy into a proxy cache by adding a simple main memory cache of recently accessed web pages.

2 Logistics

Unlike previous labs, you can work individually or in a group of two on this assignment. The lab is designed to be doable by a single person, so there is no penalty for working alone. You are, however, welcome to team up with another student if you wish.

We will not be releasing an autograder for this lab, nor will Autolab run the autograder for you. The autograded part of your grade will be determined by a tool called dbug, which we will explain in a later section. The majority of your grade will be determined by giving a demo of your proxy to a member of the course staff in the days following the due date for this lab. Every student is required to attend an interview with a TA; groups should attend their interview together. You will not receive a grade on this assignment unless you sign up for, and attend, an interview with a member of the course staff. A link to demo sign-ups will be posted on the course web page soon. All clarifications and revisions to the assignment will be posted to the Autolab message board. Partner sign-ups will be through Autolab. You will receive directions for signing up in recitation.

DBUG: Jiri Simsa has created a tool for checking concurrent code for race conditions. He has adapted this tool to grade your proxy on how well it eliminates race conditions. 30% of your grade on this lab will be based on this tool. He will be hosting an explanation of DBUG on November 20th, as well as making a virtual image available for your use in debugging your proxy. This is tentatively scheduled for 3 PM in McConomy Auditorium.

Grace Days: You may use this function to calculate the number of late days you may use on this lab.

    min(1, your late days remaining, your partner's late days remaining)

3 Hand Out Instructions

Start by downloading proxylab-handout.tar from Autolab to a protected directory in which you plan to do your work.
Then give the command tar xvf proxylab-handout.tar. This will cause a number of files to be unpacked in the directory. The three files you will be modifying and turning in are proxy.c, csapp.c, and csapp.h. You may add any files you wish to this directory, as you will be submitting the entire directory.

NOTE: Transfer the tarball to a shark machine before unpacking it. Some operating systems and file transfer clients wipe out Unix file permission bits.

The proxy.c file should eventually contain the bulk of the logic for your proxy.

The csapp.c and csapp.h files are described in your textbook. The csapp.c file contains error handling wrappers and helper functions such as the RIO functions (Section 11.4), the open_clientfd function (Section 12.4.4), and the open_listenfd function (Section 12.4.7).

4 Part I: Implementing a Sequential Web Proxy

The first step is implementing a basic sequential proxy that handles requests one at a time. When started, your proxy should open a socket and listen for connection requests on the port number that is passed in on the command line. (See the section "Port Numbers" below.)

When the proxy receives a connection request from a client (typically a web browser), the proxy should accept the connection, read the request, verify that it is a valid HTTP request, and parse it to determine the server that the request was meant for. It should then open a connection to that server, send it the request, receive the reply, and forward the reply to the browser.

Notice that, since your proxy is a middleman between client and server, it will have elements of both. It will act as a server to the web browser, and as a client to the web server.
Thus you will get experience with both client and server programming.

Processing HTTP Requests

When an end user enters a URL such as http://www.yahoo.com/news.html into the address bar of the browser, the browser sends an HTTP request to the proxy that begins with a line looking something like this:

    GET http://www.yahoo.com/news.html HTTP/1.0

In this case the proxy will parse the request, open a connection to www.yahoo.com, and then send an HTTP request starting with a line of the form:

    GET /news.html HTTP/1.0

to the server www.yahoo.com. Please note that all lines end with a carriage return '\r' followed by a line feed '\n', and that HTTP request headers are terminated with an empty line. Since a port number was not specified in the browser's request, in this example the proxy connects to the default HTTP port (port 80) on the server. The web browser may specify a port that the web server is listening on, if it is different from the default of 80. This is encoded in a URL as follows:

    http://www.example.com:8080/index.html
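
The URL handling described above can be sketched in C as follows. This is only an illustrative sketch, not the required design: the helper names parse_url and build_request_line, the MAXFIELD size, and the error conventions are assumptions made for this example and are not part of csapp.c or the handout. The sketch splits an absolute http:// URL into host, port (defaulting to 80), and path, then builds the request line the proxy would send to the server.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp */

#define MAXFIELD 256   /* assumed buffer size for this sketch */

/* Hypothetical helper: split "http://host[:port][/path]" into its parts.
 * Returns 0 on success, -1 if the URL is malformed or a buffer is too small. */
static int parse_url(const char *url, char *host, size_t hostlen,
                     int *port, char *path, size_t pathlen)
{
    const char *prefix = "http://";
    size_t plen = strlen(prefix);

    assert(url && host && port && path);
    if (strncasecmp(url, prefix, plen) != 0)
        return -1;                            /* only http:// is handled */

    const char *p = url + plen;               /* start of host[:port] */
    size_t authlen = strcspn(p, "/");         /* length of host[:port] */
    const char *colon = memchr(p, ':', authlen);
    size_t hlen = colon ? (size_t)(colon - p) : authlen;

    if (hlen == 0 || hlen >= hostlen)
        return -1;
    memcpy(host, p, hlen);
    host[hlen] = '\0';

    *port = 80;                               /* default HTTP port */
    if (colon) {
        char *end;
        long v = strtol(colon + 1, &end, 10);
        /* The digits must run exactly up to the end of host[:port]. */
        if (end == colon + 1 || end != p + authlen || v < 1 || v > 65535)
            return -1;
        *port = (int)v;
    }

    const char *rest = p + authlen;           /* path, or "" if none given */
    if (*rest == '\0')
        rest = "/";
    if (strlen(rest) >= pathlen)
        return -1;
    strcpy(path, rest);
    return 0;
}

/* Hypothetical helper: build the request line sent to the origin server.
 * Returns 0 on success, -1 if the line does not fit in buf. */
static int build_request_line(char *buf, size_t len, const char *path)
{
    int n = snprintf(buf, len, "GET %s HTTP/1.0\r\n", path);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the URL http://www.example.com:8080/index.html, parse_url would yield the host www.example.com, the port 8080, and the path /index.html. A proxy could then pass the host and port to open_clientfd and send the line produced by build_request_line, followed by the request headers and the terminating empty line.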