15-213 Recitation: Proxy Lab Elie Krevat (with wisdom from past terms)Outline Intro to Proxy Lab This week: Sequential Proxy HTTP over TCP/IP What to parse from HTTP headers What headers to suppress Handling broken pipes and using RIO Testing and Debugging Next week: Concurrency and cachingWhat is a proxy Middle-man between browser (client) and web server Acts as client to web server Also acts as server to browser Useful as firewall, logger, cacheProxy Lab: What we give you Tiny Web Server Example of web server code Debug: Change code to control behavior csapp.c/h RIO package, wrapper/helper functions port_for_user.pl Script to generate port # for your proxy proxy.c – Empty!Proxy Lab: What you’ll do Part 1: Sequential Web Proxy Accept conn, read req, parse it, forward req to server, get reply, forward to client Part 2: Thread-based Concurrent Proxy Spawn threads for each request in parallel Part 3: Adding a Cache Apply LRU eviction policy Make cache efficient and thread-safe!Start early (please!) Proxy Lab is less intricate than Malloc BUT you’ll be writing a full proxy, with code basically from scratch… …and you’ll still need to understand some conceptual hurdles… …AND IT TAKES ABOUT AS LONG, IF NOT LONGER, THAN MALLOC LAB!! So please start early!What Proxy Lab covers Software engineering skills Writing projects from scratch, in groups Reading formal specifications Testing and extending functionality Unix socket programming Internet communication Threading and concurrency Caching and data structure designSocket programming (briefly) Socket is a file descriptor, special init Identifies endpoint of communication Imp. functions: connect, bind, accept Sockets opened with sockaddr For Internet, use sockaddr_inHTTP over TCP/IP Everything is handled in layers IP handles addressing, unreliable comm. Which computer, basic message passing TCP over IP handles multiplexing and reliable comm. Which process, moving bytes in-order through congestion and packet loss HTTP over TCP handles semantics Bytes become ordered text and pictures on pageHTTP Request HTTP defines protocol between web servers and clients Headers hold meta-data of connection Type of request (GET, PUT, POST…) Proxy Lab only covers GET requests! URL destination (http://www.cmu.edu) Body holds actual dataHTTP Request (cont.) GET http://csapp.cs.cmu.edu/simple.html HTTP/1.1 Host: csapp.cs.cmu.edu User-Agent: Mozilla/5.0 ... Accept: text/xml,application/xml ... Accept-Language: en-us,en;q=0.5 ... Accept-Encoding: gzip,deflate ... Request Type Path Host Version An empty line (“\r\n”) terminates a request.Parsing the headers Complete URL Extract the path URI for HTTP request Version Change to HTTP 1.0 for server request Hostname Needed for the Host: field in server request Port Proxy needs dest. port of server (default 80) GET http://www.cmu.edu:80/index.html HTTP/1.1 <other information>Forwarding requests Web!Browser!process!Web!Server!process!Web!proxy!process!GET http://www.cmu.edu:80/index.html HTTP/1.1 <other information> - Connects to target web server, sends request: GET /index.html HTTP/1.0 <other information in the original request> - Proxy parses HTTP request - Port not always specified (default 80) - Proxy suppresses/modifies headers for server reqHeaders to suppress Connection/Proxy-Connection Change the field value to close Keep-Alive Remove this, don’t want persistent connections with HTTP/1.0 Keep the rest!HTTP Response HTTP/1.1 200 OK Date: Mon, 20 Nov 2006 03:34:17 GMT Server: Apache/1.3.19 (Unix) … Last-Modified: Mon, 28 Nov 2005 23:31:35 GMT Content-Length: 129 Connection: Keep-Alive Content-Type: text/html Status Status indicates success Send complete response back to clientBroken pipe errors Occurs when writing to socket and connection closed prematurely at other end E.g., click “stop” on browser Kernel returns normally on first write But on subsequent writes, kernel sends SIGPIPE Terminates process by default (can be blocked or caught) Returns -1 with errno set to EPIPE When reading from socket with closed connection Returns -1 with errno set to ECONNRESETUsing Robust I/O (RIO) Avoid upper-case wrapper functions (terminate all) Instead, close the offending connection Optionally, print error message Handle client request: Use rio_readlineb to read client req “\r\n” signals end of the request rio_writen to send request to server Handle server response: Use rio_readnb to read server response Binary data, so difference is memcpy vs. strcpy rio_writen to send response to client/browserDebug: Is this my problem? Web server issued HTTP redirect to the client! DNS lookup for requested hostname failed! Hostname ok but rest of URL bogus, server ret. 404! Web server crashed while it was replying! Server sent me mp3 but firefox won’t play it! Client crashed while I was sending server’s reply! Webpage contains images that I haven’t requested! Server sent me something too big for me to cache! Client is sending lots of indecipherable headers!Debug: Is this my problem? Web server issued HTTP redirect to the client! DNS lookup for requested hostname failed! Hostname ok but rest of URL bogus, server ret. 404! Web server crashed while it was replying! Server sent me mp3 but firefox won’t play it! Client crashed while I was sending server’s reply! Webpage contains images that I haven’t requested! Server sent me something too big for me to cache! Client is sending lots of indecipherable headers! Many things can go wrong, but most are out of scope!Testing Your Proxy Try a variety of web pages Test for both static & dynamic content Test binary files (e.g., images) See proxylab writeup for more tipsNext Week: Concurrency Shell lab handled asynchronous signals Proxy lab enables concurrent
View Full Document