The World Wide Web

EE 122: Intro to Communication Networks
Fall 2006 (MW 4-5:30 in Donner 155)
Vern Paxson
TAs: Dilip Antony Joseph and Sukun Kim
http://inst.eecs.berkeley.edu/~ee122/

Materials with thanks to Jennifer Rexford, Ion Stoica, and colleagues at Princeton and UC Berkeley

Announcements
• Project #2 out
  – Checkpoint due Weds Oct 18
  – Full project due Thurs Oct 26

Goals of Today’s Lecture
• (Finish Email - retrieving mail from the server)
• Main ingredients of the Web
  – URIs, HTML, HTTP
• Key properties of HTTP
  – Request-response, stateless, and resource meta-data
• Performance of HTTP
  – Parallel connections, persistent connections, pipelining
• Web components
  – Clients, proxies, and servers
  – Caching vs. replication

Retrieving E-Mail From the Server
• Server stores incoming e-mail by mailbox
  – Based on the recipient (“To”) address of the message
• Users need to retrieve e-mail
• Variety of ways to do this:
  – Directly by same-machine access to the mailbox
  – Via Internet Message Access Protocol (IMAP)
      Supports concurrent access by multiple clients, server-side searches, partial MIME fetches, multiple mailboxes
  – Via HTTP (Web)
      E.g., GMail
  – Via Post Office Protocol (POP)

POP3 Protocol
Authorization phase
• Client commands:
  – user: declare username
  – pass: password
• Server responses
  – +OK
  – -ERR
Transaction phase, client:
• list: list message numbers
• retr: retrieve message by number
• dele: delete
• quit

    S: +OK POP3 server ready
    C: user bob
    S: +OK
    C: pass hungry
    S: +OK user successfully logged on
    C: list
    S: 1 498
    S: 2 912
    S: .
    C: retr 1
    S: <message 1 contents>
    S: .
    C: dele 1
    C: retr 2
    S: <message 2 contents>
    S: .
    C: dele 2
    C: quit
    S: +OK POP3 server signing off

The World Wide Web

Main Components: URIs
• Uniform Resource Identifier (URI)
  – Denotes a resource
  – Could be its name
  – Could be its location
  – Which is better?
• Uniform Resource Name (URN)
  – Names a resource independent of how to get it
  – E.g., urn:ietf:rfc:2396 is a standard URN for RFC 2396
• Uniform Resource Locator (URL)
  – Specifies how to access a resource
  – E.g., ftp://ftp.rfc-editor.org/in-notes/rfc2396.txt

URL Syntax

    protocol://hostname[:port]/directorypath/resource

• Protocol might be http, ftp, https, smtp, rtsp, …
• In practice, hostname can instead be an IP address
  – What does your browser (maybe) show for http://2850372702/ ?
• Port defaults to the standard port associated w/ protocol
  – E.g., 80/tcp for http, 443/tcp for https
• Directory path is hierarchical, often reflecting file system
• Can extend resource to program executions as well…
  – http://us.f413.mail.yahoo.com/ym/ShowLetter?box=%40B%40Bulk&MsgId=2604_1744106_29699_1123_1261_0_28917_3552_1289957100&Search=&Nhead=f&YY=31454&order=down&sort=date&pos=0&view=a&head=b

Main Components: HTML
• HyperText Markup Language (HTML)
  – Representation of hypertext documents in ASCII format
  – Format text, reference images, embed hyperlinks (HREF)
  – Interpreted by Web browsers when rendering a page
• Straightforward to learn
  – Can basically start with a plain text file
      Easy to add formatting, references, bullets, etc.
  – Automatically generated by authoring programs
      Tools to aid users in creating HTML files
  – Your browser likely can show a page’s raw HTML
• Web page
  – Base HTML file plus referenced objects (e.g., images)
  – Each object has its own URL

Main Components: HTTP
• HyperText Transfer Protocol (HTTP)
  – Client-server protocol for transferring resources
• Important properties of HTTP
  – Request-response protocol
  – Reliance on a global URI namespace
  – Resource metadata
  – Stateless
  – ASCII format

    % telnet www.icir.org 80
    GET /vern/ HTTP/1.0
    <blank line, i.e., CRLF>

HTTP Request Message
• Request message
sent by a client
  – Request line: method, resource, and protocol version
  – Request headers: provide information or modify request
  – Body: optional data (e.g., to “POST” data to the server)

    GET /somedir/page.html HTTP/1.1
    Host: www.someschool.edu
    User-agent: Mozilla/4.0
    Connection: close
    Accept-language: fr
    <blank line>

  Callouts:
    request line (GET, POST, HEAD commands)
    header lines
    blank line: carriage return, line feed indicates end of message
    Not optional

HTTP Response Message
• Response message sent by a server
  – Status line: protocol version, status code, status phrase
  – Response headers: provide information
  – Body: optional data

    HTTP/1.1 200 OK
    Connection: close
    Date: Thu, 06 Aug 2006 12:00:15 GMT
    Server: Apache/1.3.0 (Unix)
    Last-Modified: Mon, 22 Jun 2006 …...
    Content-Length: 6821
    Content-Type: text/html
    <blank line>
    data data data data data ...

  Callouts:
    status line (protocol, status code, status phrase)
    header lines
    data, e.g., requested HTML file

Request Methods and Response Codes
• Request methods include
  – GET: return current value of resource, run program, …
  – HEAD: return the meta-data associated with a resource
  – POST: update a resource, provide input to a program, …
• Response code classes
  – 1xx: informational (e.g., “100 Continue”)
  – 2xx: success (e.g., “200 OK”)
  – 3xx: redirection (e.g., “304 Not Modified”)
  – 4xx: client error (e.g., “404 Not Found”)
  – 5xx: server error (e.g., “503 Service Unavailable”)
  – (Similar to other ASCII app.
protocols like SMTP, FTP)

HTTP Resource Meta-Data
• Meta-data
  – Information relating to a resource
  – … but not part of the resource itself
• Examples of meta-data
  – Size of a resource
  – Type of the content
  – Last modification time
• Typing of content borrowed from email
  – Multipurpose Internet Mail Extensions (MIME)
  – Data format classification (e.g., Content-Type: text/html)
  – Enables browsers to automatically launch a viewer

Example: Conditional GET Request
• Fetch resource only if it has changed at the server
• Server avoids wasting resources to send again
  – Server inspects the “last modified” time of the resource
  – … and compares to the “if-modified-since” time
  – Returns “304 Not Modified” if resource has not changed
  – … or a “200 OK” with the latest version otherwise

    GET /~ee122/fa06/ HTTP/1.1
    Host: inst.eecs.berkeley.edu
    User-Agent: Mozilla/4.03
    If-Modified-Since: Sun, 27 Aug 2006 22:25:50 GMT
    <CRLF>

Stateless Operation
• Stateless protocol
  – Each request-response exchange treated independently
  – Clients and servers not required to retain state
• Statelessness improves scalability
  – Avoid need for server to retain info across requests
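
The server-side decision in the Conditional GET slide (compare the resource's last-modified time with the If-Modified-Since time, then answer 304 or 200) can be sketched in Python. This is only an illustration under stated assumptions: the function name conditional_get_status and its parameters are hypothetical, not taken from the lecture or from any real server, and the sketch ignores other validators a real server may consult.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(last_modified, if_modified_since):
    """Return 304 or 200 for a conditional GET (hypothetical helper).

    last_modified: timezone-aware datetime of the resource on the server.
    if_modified_since: raw header value from the request, or None.
    """
    if if_modified_since is None:
        return 200  # plain GET: always send the resource
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: fall back to a full response
    if since.tzinfo is None:
        # Assumption: treat a date without a zone as GMT/UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Not Modified: client's copy is still current
    return 200      # resource changed: send the latest version


header = "Sun, 27 Aug 2006 22:25:50 GMT"
unchanged = datetime(2006, 8, 20, 12, 0, 0, tzinfo=timezone.utc)
changed = datetime(2006, 9, 1, 9, 30, 0, tzinfo=timezone.utc)
print(conditional_get_status(unchanged, header))  # 304
print(conditional_get_status(changed, header))    # 200
```

The header value used here is the one from the slide's example request; the two modification times are made up for the demonstration.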
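
The URL Syntax slide asks what a browser might show for http://2850372702/. One way to explore the question, assuming the client interprets a bare decimal hostname as a 32-bit IPv4 address (not every client does), is to convert the integer to dotted-quad form. The helper name dotted_quad_host is hypothetical and exists only for this sketch.

```python
import ipaddress
from urllib.parse import urlsplit


def dotted_quad_host(url):
    """Rewrite a decimal-integer hostname as a dotted-quad IPv4 address.

    Hypothetical helper: assumes the hostname is a plain decimal number
    that fits in 32 bits.
    """
    host = urlsplit(url).hostname          # "2850372702"
    return str(ipaddress.ip_address(int(host)))


print(dotted_quad_host("http://2850372702/"))  # prints 169.229.60.94
```

The same result follows by hand: 2850372702 = 169·2^24 + 229·2^16 + 60·2^8 + 94.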