EE 122: The World Wide Web

Ion Stoica
TAs: Junda Liu, DK Moon, David Zats
http://inst.eecs.berkeley.edu/~ee122/
(Materials with thanks to Vern Paxson, Jennifer Rexford, and colleagues at UC Berkeley)


Goals of Today's Lecture

  - Main ingredients of the Web
      - URIs, HTML, HTTP
  - Key properties of HTTP
      - Request-response, stateless, and resource meta-data
  - Performance of HTTP
      - Parallel connections, persistent connections, pipelining
  - Web components
      - Clients, proxies, and servers
      - Caching vs. replication


The Web: History (I)

  - 1945, Vannevar Bush, Memex:
      "a device in which an individual stores all his books, records, and
      communications, and which is mechanized so that it may be consulted
      with exceeding speed and flexibility"
  - See http://www.iath.virginia.edu/elab/hfl0051.html

  [Figures: Vannevar Bush (1890-1974); the Memex]


The Web: History (II)

  - 1967, Ted Nelson, Xanadu:
      - A world-wide publishing network that would allow information to be
        stored not as separate files but as connected literature
      - Owners of documents would be automatically paid via electronic means
        for the virtual copying of their documents
  - Coined the term "Hypertext"

  [Figure: Ted Nelson]


The Web: History (III)

  - World Wide Web (WWW): a distributed database of "pages" linked through
    the Hypertext Transfer Protocol (HTTP)
      - First HTTP implementation: 1990
          - Tim Berners-Lee at CERN
      - HTTP/0.9: 1991
          - Simple GET command for the Web
      - HTTP/1.0: 1992
          - Client/server information, simple caching
      - HTTP/1.1: 1996

  [Figure: Tim Berners-Lee]


Web Components

  - Content
      - Objects
  - Clients
      - Send requests / receive responses
  - Servers
      - Receive requests / send responses
      - Store or generate the responses
  - Proxies
      - Placed between clients and servers
      - Act as a server for the client, and as a client to the server
      - Provide extra functions: caching, anonymization, logging,
        transcoding, filtering access
      - Explicit or transparent ("interception")


HTML

  - A Web page has several components
      - Base HTML file
      - Referenced objects (e.g., images)
  - HyperText Markup Language (HTML)
      - Representation of hypertext documents in ASCII format
      - Web browsers interpret HTML when rendering a page
  - Several functions:
      - Format text, reference images, embed hyperlinks (HREF)
  - Straightforward to learn
      - Syntax easy to understand
      - Authoring programs can auto-generate HTML
      - Source almost always available


URI

  - Uniform Resource Identifier (URI)
  - Uniform Resource Locator (URL)
      - Provides a means to get the resource
      - http://www.ietf.org/rfc/rfc3986.txt
  - Uniform Resource Name (URN)
      - Names a resource independent of how to get it
      - urn:ietf:rfc:3986 is a standard URN for RFC 3986


URL Syntax

  protocol://hostname[:port]/directorypath/resource

  e.g., http://inst.eecs.berkeley.edu/~ee122/fa08/index.html

  protocol         http, ftp, https, smtp, rtsp, etc.
  hostname         Fully Qualified Domain Name (FQDN) or IP address
  port             Defaults to the protocol's standard port,
                   e.g., http: 80/tcp, https: 443/tcp
  directory path   Hierarchical, often reflecting the file system
  resource         Identifies the desired resource

  - Can also extend to program executions:
    http://us.f413.mail.yahoo.com/ym/ShowLetter?box=%40B%40Bulk&MsgId=2604_1744106_29699_1123_1261_0_28917_3552_1289957100&Search=&Nhead=f&YY=31454&order=down&sort=date&pos=0&view=a&head=b


HTTP

  - HyperText Transfer Protocol (HTTP)
      - Client-server protocol for transferring resources
  - Important properties
      - Request-response protocol
      - Resource metadata
      - Stateless
      - ASCII format

  % telnet www.cs.berkeley.edu 80
  GET /~istoica/ HTTP/1.0
  <blank line, i.e., CRLF>


HTTP: Big Picture

  [Figure: client-server timeline. The client sends separate requests for the
  text, image 1, and image 2; the server transfers each object in response;
  the client then finishes displaying the page.]


Client-to-Server Communication

  - HTTP Request Message
      - Request line: method, resource, and protocol version
      - Request headers: provide information or modify the request
      - Body: optional data (e.g., to "POST" data to the server)

  request line    GET /somedir/page.html HTTP/1.1
  header lines    Host: www.someschool.edu
                  User-agent: Mozilla/4.0
                  Connection: close
                  Accept-language: fr
  blank line      (not optional) carriage return, line feed
                  indicates end of message

  - Request methods include:
      - GET: return current value of resource, run program, ...
      - HEAD: return the meta-data associated with a resource
      - POST: update resource, provide input to a program, ...
  - Headers include:
      - Useful info for the server (e.g., desired language)


Server-to-Client Communication

  - HTTP Response Message
      - Status line: protocol version, status code, status phrase
      - Response headers: provide information
      - Body: optional data

  status line     HTTP/1.1 200 OK
  (protocol, status code, status phrase)
  header lines    Connection: close
                  Date: Thu, 06 Aug 2006 12:00:15 GMT
                  Server: Apache/1.3.0 (Unix)
                  Last-Modified: Mon, 22 Jun 2006
                  Content-Length: 6821
                  Content-Type: text/html
  blank line
  data            data data data data data
                  (e.g., requested HTML file)

  - Response code classes
      - Similar to other ASCII application protocols such as SMTP

  Code   Class           Example
  1xx    Informational   100 Continue
  2xx    Success         200 OK
  3xx    Redirection     304 Not Modified
  4xx    Client error    404 Not Found
  5xx    Server error    503 Service Unavailable


Web Server: Generating a Response

  - Return a file
      - URL matches a file (e.g., /www/index.html)
      - Server returns the file as the response
      - Server generates an appropriate response header
  - Generate response dynamically
      - URL triggers a program on the server
      - Server runs the program and sends its output to the client
  - Return meta-data with no body


HTTP Resource Meta-Data

  - Meta-data
      - Info about a resource
      - A separate entity
  - Examples:
      - Size of a resource
      - Last modification time
      - Type of the content
  - Data format classification
      - e.g., Content-Type: text/html
      - Enables the browser to automatically launch an appropriate viewer
      - Borrowed from e-mail's Multipurpose Internet Mail Extensions (MIME)
  - Usage example: Conditional GET Request
      - Client requests object "If-modified-since"
      - If the object hasn't changed, the server returns
        "HTTP/1.1 304 Not Modified"
      - No body in the server's response, only a header


HTTP is Stateless

  - Stateless protocol
      - Each request-response exchange is treated independently
      - Servers are not required to retain state
  - This is good
      - Improves scalability on the server side
          - Don't have to retain info across requests
          - Can handle a higher rate of requests
          - Order of requests doesn't matter
  - This is bad
      - Some applications need persistent state
      - Need to uniquely identify the user or store temporary info
      - e.g., shopping cart, user
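
The request and response message layouts and the Conditional GET exchange described in the slides above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the lecture: the helper names (build_request, parse_response, status_class), the If-Modified-Since date, and the canned 304 response are assumptions made for the example, and no network connection is opened.

```python
CRLF = "\r\n"

def build_request(method, path, host, headers=None, version="HTTP/1.1"):
    """Format a request: request line, header lines, then a blank line."""
    lines = [f"{method} {path} {version}", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # The trailing CRLF CRLF is the mandatory blank line after the headers.
    return CRLF.join(lines) + CRLF + CRLF

def parse_response(raw):
    """Split a response into (version, code, phrase, headers, body)."""
    head, _, body = raw.partition(CRLF + CRLF)
    status_line, *header_lines = head.split(CRLF)
    version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body

def status_class(code):
    """Map a status code to its class (1xx..5xx) as listed in the slides."""
    classes = {1: "Informational", 2: "Success", 3: "Redirection",
               4: "Client error", 5: "Server error"}
    return classes.get(code // 100, "Unknown")

# Conditional GET: ask for the object only if it changed after a given time.
# The date below is a hypothetical value chosen for illustration.
request = build_request(
    "GET", "/somedir/page.html", "www.someschool.edu",
    {"If-Modified-Since": "Sat, 01 Jan 2000 00:00:00 GMT"},
)

# Hypothetical canned reply for an unchanged object: headers only, no body.
canned = "HTTP/1.1 304 Not Modified" + CRLF + "Connection: close" + CRLF + CRLF
version, code, phrase, headers, body = parse_response(canned)

print(request)
print(code, phrase, "->", status_class(code), "| body is empty:", body == "")
```

Because every request carries everything the server needs (method, resource, headers), the server in this sketch would not have to remember anything between exchanges, which is the stateless property described above.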