The World Wide Web

EE 122: Intro to Communication Networks
Fall 2006 (MW 4-5:30 in Donner 155)
Vern Paxson
TAs: Dilip Antony Joseph and Sukun Kim
http://inst.eecs.berkeley.edu/~ee122/
Materials with thanks to Jennifer Rexford, Ion Stoica, and colleagues at Princeton and UC Berkeley


Announcements

- Project 2 out
  - Checkpoint due Weds Oct 18
  - Full project due Thurs Oct 26


Goals of Today's Lecture

- Finish Email: retrieving mail from the server
- Main ingredients of the Web
  - URIs, HTML, HTTP
- Key properties of HTTP
  - Request-response, stateless, and resource meta-data
- Performance of HTTP
  - Parallel connections, persistent connections, pipelining
- Web components
  - Clients, proxies, and servers
  - Caching vs. replication


Retrieving E-Mail From the Server

- Server stores incoming e-mail by mailbox
  - Based on the recipient (the "To" field) of the message
- Users need to retrieve e-mail
- Variety of ways to do this:
  - Directly, by same-machine access to the mailbox
  - Via Internet Message Access Protocol (IMAP)
    - Supports concurrent access by multiple clients, server-side searches, partial MIME fetches, multiple mailboxes
  - Via HTTP (Web), e.g., GMail
  - Via Post Office Protocol (POP)


POP3 Protocol

- Authorization phase
  - Client commands:
    - user: declare username
    - pass: password
  - Server responses:
    - +OK
    - -ERR
- Transaction phase, client:
  - list: list message numbers
  - retr: retrieve message by number
  - dele: delete
  - quit

Example session (S = server, C = client):

    S: +OK POP3 server ready
    C: user bob
    S: +OK
    C: pass hungry
    S: +OK user successfully logged on
    C: list
    S: 1 498
    S: 2 912
    S: .
    C: retr 1
    S: <message 1 contents>
    S: .
    C: dele 1
    C: retr 2
    S: <message 2 contents>
    S: .
    C: dele 2
    C: quit
    S: +OK POP3 server signing off


The World Wide Web


Main Components: URIs

- Uniform Resource Identifier (URI)
  - Denotes a resource
  - Could be its name; could be its location. Which is better?
- Uniform Resource Name (URN)
  - Names a resource independent of how to get it
  - E.g., urn:ietf:rfc:2396 is a standard URN for RFC 2396
- Uniform Resource Locator (URL)
  - Specifies how to access a resource
  - E.g., ftp://ftp.rfc-editor.org/in-notes/rfc2396.txt


URL Syntax

    protocol://hostname[:port]/directorypath/resource

- Protocol might be http, ftp, https, smtp, rtsp, ...
- In practice, hostname can instead be an IP address
  - What does your browser (maybe) show for http://2850372702/ ?
- Port defaults to the standard port associated with the protocol
  - E.g., 80/tcp for http, 443/tcp for https
- Directory path is hierarchical, often reflecting the file system
- Can extend "resource" to program executions as well
  - http://us.f413.mail.yahoo.com/ym/ShowLetter?box=%40B%40Bulk&MsgId=2604_1744106_29699_1123_1261_0_28917_3552_1289957100&Search=&Nhead=f&YY=31454&order=down&sort=date&pos=0&view=a&head=b


Main Components: HTML

- HyperText Markup Language (HTML)
  - Representation of hypertext documents in ASCII format
  - Format text, reference images, embed hyperlinks (HREF)
  - Interpreted by Web browsers when rendering a page
- Straightforward to learn
  - Can basically start with a plain text file
  - Easy to add formatting, references, bullets, etc.
- Automatically generated by authoring programs
  - Tools to aid users in creating HTML files
  - Your browser likely can show a page's raw HTML
- Web page
  - Base HTML file plus referenced objects (e.g., images)
  - Each object has its own URL


Main Components: HTTP

- HyperText Transfer Protocol (HTTP)
  - Client-server protocol for transferring resources
- Important properties of HTTP
  - Request-response protocol
  - Reliance on a global URI namespace
  - Resource metadata
  - Stateless
  - ASCII format

Try it by hand:

    telnet www.icir.org 80
    GET /vern/ HTTP/1.0
    <blank line, i.e., CRLF>


HTTP Request Message

- Request message sent by a client
  - Request line: method, resource, and protocol version
  - Request headers: provide information or modify request
  - Body: optional data (e.g., to "POST" data to the server)

Example:

    GET /somedir/page.html HTTP/1.1     <- request line (GET, POST, HEAD commands)
    Host: www.someschool.edu            <- header lines
    User-agent: Mozilla/4.0
    Connection: close
    Accept-language: fr
    (blank line)                        <- not optional: carriage return, line feed indicates end of message


HTTP Response Message

- Response message sent by a server
  - Status line: protocol version, status code, status phrase
  - Response headers: provide information
  - Body: optional data

Example:

    HTTP/1.1 200 OK                     <- status line (protocol, status code, status phrase)
    Connection: close                   <- header lines
    Date: Thu, 06 Aug 2006 12:00:15 GMT
    Server: Apache/1.3.0 (Unix)
    Last-Modified: Mon, 22 Jun 2006 ...
    Content-Length: 6821
    Content-Type: text/html
    (blank line)
    data data data data data ...        <- data, e.g., requested HTML file


Request Methods and Response Codes

- Request methods include
  - GET: return current value of resource, run program, ...
  - HEAD: return the meta-data associated with a resource
  - POST: update a resource, provide input to a program, ...
- Response code classes
  - 1xx: informational (e.g., "100 Continue")
  - 2xx: success (e.g., "200 OK")
  - 3xx: redirection (e.g., "304 Not Modified")
  - 4xx: client error (e.g., "404 Not Found")
  - 5xx: server error (e.g., "503 Service Unavailable")
- Similar to other ASCII application protocols like SMTP and FTP


HTTP Resource Meta-Data

- Meta-data
  - Information relating to a resource, but not part of the resource itself
- Examples of meta-data
  - Size of a resource
  - Type of the content
  - Last modification time
- Typing of content borrowed from email
  - Multipurpose Internet Mail Extensions (MIME)
  - Data format classification (e.g., Content-Type: text/html)
  - Enables browsers to automatically launch a viewer


Example: Conditional GET Request

- Fetch resource only if it has changed at the server

    GET /~ee122/fa06/ HTTP/1.1
    Host: inst.eecs.berkeley.edu
    User-Agent: Mozilla/4.03
    If-Modified-Since: Sun, 27 Aug 2006 22:25:50 GMT
    <CRLF>

- Server avoids wasting resources to send again
  - Server inspects the "last modified" time of the resource
  - ... and compares it to the "if-modified-since" time
  - Returns "304 Not Modified" if the resource has not changed
  - ... or a "200 OK" with the latest version otherwise


Stateless Operation

- Stateless protocol
  - Each request-response exchange treated independently
  - Clients and servers not required to retain state
- Statelessness improves scalability
  - Avoids the need for the server to retain info across requests
  - Enables the server to handle a higher rate of requests
- However, some applications need persistent state
  - To uniquely identify the user or store temporary info
  - E.g., personalize a Web page, compute profiles or access statistics by user, track a shopping cart
  - Done using cookies


Cookies

- Cookie
  - Small state stored by client on behalf of server
  - Included in future requests to the server

Exchange:

    Client -> Server: Request
    Server -> Client: Response (Set-Cookie: XYZ)
    Client -> Server: Request (Cookie: XYZ)


Web Components

- Clients
  - Send requests and receive responses
  - Browsers, spiders, and agents
- Servers
  - Receive requests and send responses
  - Store or generate the responses
- Proxies
  - Act as a server for the client, and a client to the server
  - Perform extra
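The cookie exchange from the Cookies slide (the server sets a cookie in a response, and the client echoes it in later requests) can be sketched roughly as follows. This is only an illustration using Python's standard http.cookies module. The cookie name "session" and the helper names server_set_cookie and client_cookie_header are hypothetical, not something the slides define.

```python
from http.cookies import SimpleCookie


def server_set_cookie(name: str, value: str) -> str:
    """Build the Set-Cookie header line a server would add to its response."""
    cookie = SimpleCookie()
    cookie[name] = value
    # output() renders one header line per cookie, e.g. "Set-Cookie: name=value"
    return cookie.output(header="Set-Cookie:")


def client_cookie_header(set_cookie_line: str) -> str:
    """Given a Set-Cookie line from a response, build the Cookie header
    the client would include in its next request to the same server."""
    # Drop the "Set-Cookie:" header name and keep only the header value.
    _, _, header_value = set_cookie_line.partition(":")
    jar = SimpleCookie()
    jar.load(header_value.strip())
    pairs = "; ".join(f"{key}={morsel.value}" for key, morsel in jar.items())
    return f"Cookie: {pairs}"


if __name__ == "__main__":
    # Hypothetical cookie name; "XYZ" is the placeholder value from the slide.
    response_header = server_set_cookie("session", "XYZ")
    print(response_header)
    print(client_cookie_header(response_header))
```

Note that the protocol itself stays stateless in this sketch: the server keeps nothing between requests, and the client carries the state and presents it each time.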
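As a rough sketch of the server-side decision in the conditional GET example, the function below compares a resource's last-modified time against an If-Modified-Since header value and picks between 304 and 200. The function name conditional_get_status is hypothetical, and the sketch assumes the server already knows the resource's modification time as a timezone-aware datetime. A real server would also handle other conditional headers, which the slides do not cover.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 if the resource is unchanged since the client's copy, else 200.

    last_modified must be timezone-aware (an assumption of this sketch).
    """
    if if_modified_since is None:
        return 200  # plain GET: always send the resource
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header and send the resource
    if since.tzinfo is None:
        # Treat a date without zone information as UTC.
        since = since.replace(tzinfo=timezone.utc)
    return 304 if last_modified <= since else 200


if __name__ == "__main__":
    header = "Sun, 27 Aug 2006 22:25:50 GMT"  # value from the slide's example
    older = datetime(2006, 8, 20, 12, 0, 0, tzinfo=timezone.utc)
    newer = datetime(2006, 9, 1, 0, 0, 0, tzinfo=timezone.utc)
    print(conditional_get_status(older, header))
    print(conditional_get_status(newer, header))
```

Falling back to 200 on a malformed date is a deliberately conservative choice here: sending the resource again costs bandwidth, whereas a wrong 304 would leave the client with stale content.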