Proxy Lab
Recitation I
Monday, Nov 20, 2006

Outline
- What is an HTTP proxy?
- HTTP tutorial
  - HTTP request
  - HTTP response
- Sequential vs. concurrent proxies
- Caching

What is a proxy?
  Client (browser)  <-->  Proxy  <-->  Server (www.google.com)

Why a proxy?
- Access control (allowed websites)
- Filtering (viruses, for example)
- Caching (multiple people request CNN)

Brief HTTP Tutorial
- Hyper Text Transfer Protocol: the protocol spoken between a browser and a web server
- From browser to web server (REQUEST):
    GET http://www.google.com HTTP/1.0
- From web server to browser (RESPONSE):
    HTTP/1.0 200 OK
    (other stuff)

HTTP Request
  GET http://csapp.cs.cmu.edu/simple.html HTTP/1.1
  Host: csapp.cs.cmu.edu
  User-Agent: Mozilla/5.0
  Accept: text/xml,application/xml
  Accept-Language: en-us,en;q=0.5
  Accept-Encoding: gzip,deflate
  (empty line)

- The request line has four parts: request type (GET), host (csapp.cs.cmu.edu), path (/simple.html), and version (HTTP/1.1).
- An empty line terminates an HTTP request.
- The Host header is optional in HTTP/1.0 (it is required in HTTP/1.1), but we recommend always including it.
- The User-Agent header identifies the browser type. Some websites use it to decide what to send, and may reject you if you say you use MyWeirdBrowser.
- The proxy must pass this header and all other headers through.

HTTP Response
  HTTP/1.1 200 OK
  Date: Mon, 20 Nov 2006 03:34:17 GMT
  Server: Apache/1.3.19 (Unix)
  Last-Modified: Mon, 28 Nov 2005 23:31:35 GMT
  Content-Length: 129
  Connection: Keep-Alive
  Content-Type: text/html

- The status line indicates whether the request succeeded or failed, whether it is a redirect, etc.
- The proxy should send the complete response back to the client transparently.
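The request line on the HTTP Request slide packs the request type, host, path, and version into one line, and a proxy needs the host and path separately: the host to connect to the server, the path to build the request it forwards. A minimal C sketch of that split, assuming the browser sends an absolute URL beginning with `http://` as in the slide's example. `parse_request_line` and its buffer sizes are hypothetical, not part of the lab handout, and the headers after the request line are left for the caller to forward.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split an absolute-form request line such as
 *   "GET http://csapp.cs.cmu.edu/simple.html HTTP/1.1"
 * into method, host, port, path, and version.
 * Caller buffers: method[16], host[256], path[2048], version[16].
 * Returns 0 on success, -1 if the line is not in this form. */
int parse_request_line(const char *line, char *method, char *host,
                       int *port, char *path, char *version)
{
    char uri[2048];

    /* The field widths keep sscanf from overflowing the buffers. */
    if (sscanf(line, "%15s %2047s %15s", method, uri, version) != 3)
        return -1;
    if (strncmp(uri, "http://", 7) != 0)
        return -1;

    /* The host runs from just after "http://" up to the first '/'. */
    const char *hostp = uri + 7;
    const char *slash = strchr(hostp, '/');
    size_t hostlen = slash ? (size_t)(slash - hostp) : strlen(hostp);
    if (hostlen == 0 || hostlen > 255)
        return -1;
    memcpy(host, hostp, hostlen);
    host[hostlen] = '\0';

    /* An explicit ":port" overrides the HTTP default of 80. */
    *port = 80;
    char *colon = strchr(host, ':');
    if (colon != NULL) {
        *port = atoi(colon + 1);
        *colon = '\0';
    }

    /* A URL with no path, e.g. "http://www.google.com", means "/". */
    strcpy(path, slash ? slash : "/");
    return 0;
}
```

For the slide's example, the call yields method `GET`, host `csapp.cs.cmu.edu`, port 80, path `/simple.html`, and version `HTTP/1.1`.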
HTTP Response: Content-Length
- The Content-Length header gives the number of bytes in the response body.
- Not all web servers send it. DO NOT RELY ON IT.

Concurrent Proxy
- Needs to handle multiple requests simultaneously
  - From different clients
  - From the same client (e.g., each image in an HTML document must be requested separately)
- Serving requests sequentially decreases throughput
  - The server is waiting for I/O most of the time
  - This time can be used to start serving other clients
  - Multiple outstanding requests

Concurrent Proxy (continued)
- Use threads to make the proxy concurrent
  - Create one thread for each new client request
  - The thread exits after serving its client request
  - Use the pthread library: pthread_create, pthread_detach, etc.
- select() can also be used to add concurrency
  - Much more difficult to get right

Caching Proxy
- Most geeks visit http://slashdot.org every 2 minutes
- Why fetch the same content again and again, if it doesn't change frequently?
- The proxy can cache responses and serve them directly out of its cache
- Reduces latency and network load

Caching Implementation Issues
- Use the GET URL (host + path) to locate the appropriate cache entry
- THREAD SAFETY
  - A single cache is accessed by multiple threads
  - Easy to create bugs: thread 1 is reading an entry while thread 2 is deleting the same entry

General Advice
- Use the RIO routines (rio_readnb, rio_readlineb)
- Be very careful about when you read line by line (the HTTP request) versus reading just a stream of bytes (the HTTP response)
- Know when to use strcpy vs. memcpy
- gethostbyname and inet_ntoa are not thread-safe
- Path: sequential, then concurrency, then caching
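Because Content-Length may be missing, one way to send the response back transparently is to treat it as a plain byte stream and copy until the server closes the connection. A sketch, assuming plain POSIX `read`/`write` as stand-ins for the RIO routines from the course's csapp.c; `relay_until_eof` is a hypothetical name. It also assumes the server closes the connection after the response, which a keep-alive server may not do. The data is copied by length, never with string functions, so bytes after an embedded `'\0'` (common in images) are not lost.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy every byte from descriptor `from` to
 * descriptor `to` until EOF, ignoring Content-Length entirely.
 * Returns the number of bytes copied, or -1 on a read/write error. */
ssize_t relay_until_eof(int from, int to)
{
    char buf[8192];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(from, buf, sizeof buf);
        if (n == 0)
            return total;                 /* EOF: the sender closed its end */
        if (n < 0) {
            if (errno == EINTR)
                continue;                 /* interrupted by a signal: retry */
            return -1;
        }

        /* write() may accept fewer bytes than requested, so loop.
         * The length comes from read(), never from strlen(), so an
         * embedded '\0' byte does not cut the data short. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(to, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```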
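The thread-per-request design from the Concurrent Proxy slides can be sketched with pthreads as below. `spawn_client_thread`, `client_thread`, and `serve_client` are hypothetical names, and `serve_client` only writes a fixed string where a real proxy would parse the request, contact the server, and relay the response. The accept loop itself is not shown.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the real work of parsing the request,
 * contacting the server, and relaying the response. */
static void serve_client(int connfd)
{
    const char msg[] = "served\n";
    if (write(connfd, msg, strlen(msg)) < 0)
        perror("write");
}

/* Thread routine: detach so the thread's resources are reclaimed when
 * it exits (no pthread_join needed), serve one client, then exit. */
static void *client_thread(void *vargp)
{
    int connfd = *(int *)vargp;
    free(vargp);
    pthread_detach(pthread_self());
    serve_client(connfd);
    close(connfd);
    return NULL;
}

/* Hypothetical helper called once per accepted connection.
 * Returns 0 if a thread was started, -1 otherwise. */
int spawn_client_thread(int connfd)
{
    int *fdp = malloc(sizeof *fdp);   /* each thread gets its own copy */
    if (fdp == NULL)
        return -1;
    *fdp = connfd;

    pthread_t tid;
    if (pthread_create(&tid, NULL, client_thread, fdp) != 0) {
        free(fdp);
        return -1;
    }
    return 0;
}
```

In the accept loop, each connected descriptor returned by `accept` would be handed to `spawn_client_thread`. Passing the descriptor in `malloc`ed memory rather than as a pointer to a local variable avoids a race in which the next `accept` overwrites the value before the new thread has read it.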
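For the caching slides, a minimal sketch of a thread-safe cache keyed by the full GET URL, assuming a single `pthread_mutex_t` guards the whole cache and entries are evicted least-recently-used first. The slot count, size limits, and the names `cache_lookup` and `cache_insert` are hypothetical, not the lab's required values. The lookup copies the object out while still holding the lock, which rules out the slide's bug of one thread reading an entry while another deletes it. It also shows the strcpy vs. memcpy split: the URL is a NUL-terminated string, while the response body is raw bytes.

```c
#include <pthread.h>
#include <string.h>

#define CACHE_SLOTS 8      /* hypothetical sizes, not the lab's limits */
#define MAX_URL     256
#define MAX_OBJECT  1024

typedef struct {
    int valid;
    char url[MAX_URL];          /* key: the full GET URL (host + path) */
    char data[MAX_OBJECT];      /* raw response bytes, may contain '\0' */
    size_t len;
    unsigned long last_used;    /* for least-recently-used eviction */
} cache_line;

static cache_line cache[CACHE_SLOTS];
static unsigned long cache_clock;
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* On a hit, copy the object into out (capacity cap), store its length
 * in *lenp, and return 1; return 0 on a miss. The copy happens while
 * the lock is held, so no other thread can evict the entry mid-read. */
int cache_lookup(const char *url, char *out, size_t cap, size_t *lenp)
{
    int hit = 0;
    pthread_mutex_lock(&cache_lock);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        cache_line *c = &cache[i];
        if (c->valid && strcmp(c->url, url) == 0 && c->len <= cap) {
            memcpy(out, c->data, c->len);   /* bytes, not a C string */
            *lenp = c->len;
            c->last_used = ++cache_clock;
            hit = 1;
            break;
        }
    }
    pthread_mutex_unlock(&cache_lock);
    return hit;
}

/* Store an object under url; objects that do not fit are not cached. */
void cache_insert(const char *url, const char *data, size_t len)
{
    if (len > MAX_OBJECT || strlen(url) >= MAX_URL)
        return;

    pthread_mutex_lock(&cache_lock);
    int v = -1;
    for (int i = 0; i < CACHE_SLOTS && v < 0; i++)   /* same URL: overwrite */
        if (cache[i].valid && strcmp(cache[i].url, url) == 0)
            v = i;
    for (int i = 0; i < CACHE_SLOTS && v < 0; i++)   /* else a free slot */
        if (!cache[i].valid)
            v = i;
    if (v < 0) {                                     /* else evict the LRU */
        v = 0;
        for (int i = 1; i < CACHE_SLOTS; i++)
            if (cache[i].last_used < cache[v].last_used)
                v = i;
    }

    cache_line *c = &cache[v];
    strcpy(c->url, url);            /* the URL is a C string */
    memcpy(c->data, data, len);     /* the body is raw bytes */
    c->len = len;
    c->valid = 1;
    c->last_used = ++cache_clock;
    pthread_mutex_unlock(&cache_lock);
}
```

One mutex serializes all cache access, which is simple but limits parallelism. A readers-writers lock (`pthread_rwlock_t`) would let lookups overlap, but in this sketch every hit also updates `last_used`, so that field would then need its own synchronization.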