1 EE 122: The World Wide Web
  Ion Stoica
  TAs: Junda Liu, DK Moon, David Zats
  http://inst.eecs.berkeley.edu/~ee122/
  (Materials with thanks to Vern Paxson, Jennifer Rexford, and colleagues at UC Berkeley)

2 Goals of Today’s Lecture
  Main ingredients of the Web
    URIs, HTML, HTTP
  Key properties of HTTP
    Request-response, stateless, and resource meta-data
  Performance of HTTP
    Parallel connections, persistent connections, pipelining
  Web components
    Clients, proxies, and servers
    Caching vs. replication

3 The Web – History (I)
  1945: Vannevar Bush, Memex:
    "a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility"
  [Images: Vannevar Bush (1890-1974); Memex]
  (See http://www.iath.virginia.edu/elab/hfl0051.html)

4 The Web – History (II)
  1967: Ted Nelson, Xanadu:
    A world-wide publishing network that would allow information to be stored not as separate files but as connected literature
    Owners of documents would be automatically paid via electronic means for the virtual copying of their documents
  Coined the term “Hypertext”
  [Image: Ted Nelson]

5 The Web – History (III)
  World Wide Web (WWW): a distributed database of “pages” linked through the Hypertext Transfer Protocol (HTTP)
    First HTTP implementation – 1990
      Tim Berners-Lee at CERN
    HTTP/0.9 – 1991
      Simple GET command for the Web
    HTTP/1.0 – 1992
      Client/Server information, simple caching
    HTTP/1.1 – 1996
  [Image: Tim Berners-Lee]

6 Web Components
  Content
    Objects
  Clients
    Send requests / Receive responses
  Servers
    Receive requests / Send responses
    Store or generate the responses
  Proxies
    Placed between clients and servers
      Act as a server for the client, and a client to the server
    Provide extra functions
      Caching, anonymization, logging, transcoding, filtering access
    Explicit or transparent (“interception”)

7 HTML (Content: How?)
  A Web page has several components
    Base HTML file
    Referenced objects (e.g., images)
  HyperText Markup Language (HTML)
    Representation of hypertext documents in ASCII format
    Web browsers interpret HTML when rendering a page
    Several functions:
      Format text, reference images, embed hyperlinks (HREF)
  Straightforward to learn
    Syntax easy to understand
    Authoring programs can auto-generate HTML
    Source almost always available

8 URI (Content: How?)
  Uniform Resource Identifier (URI)
  Uniform Resource Locator (URL)
    Provides a means to get the resource
    http://www.ietf.org/rfc/rfc3986.txt
  Uniform Resource Name (URN)
    Names a resource independent of how to get it
    urn:ietf:rfc:3986 is a standard URN for RFC 3986

9 URL Syntax (Content: How?)
  protocol://hostname[:port]/directorypath/resource
  (e.g., http://inst.eecs.berkeley.edu/~ee122/fa08/index.html)
  protocol
    http, ftp, https, smtp, rtsp, etc.
  hostname
    Fully Qualified Domain Name (FQDN), IP address
  port
    Defaults to protocol’s standard port
    e.g., http: 80/tcp, https: 443/tcp
  directory path
    Hierarchical, often reflecting file system
  resource
    Identifies the desired resource
    Can also extend to program executions:
    http://us.f413.mail.yahoo.com/ym/ShowLetter?box=%40B%40Bulk&MsgId=2604_1744106_29699_1123_1261_0_28917_3552_1289957100&Search=&Nhead=f&YY=31454&order=down&sort=date&pos=0&view=a&head=b

10 HTTP (Client-Server: How?)
  HyperText Transfer Protocol (HTTP)
    Client-server protocol for transferring resources
  Important properties:
    Request-response protocol
    Resource metadata
    Stateless
    ASCII format

  % telnet www.cs.berkeley.edu 80
  GET /istoica/ HTTP/1.0
  <blank line, i.e., CRLF>

11 HTTP Big Picture
  [Figure: timeline of messages exchanged between Client and Server]
    Request image 1
    Transfer image 1
    Request image 2
    Transfer image 2
    Request text
    Transfer text
    Finish display page

12 Client-to-Server Communication
  HTTP Request Message
    Request line: method, resource, and protocol version
    Request headers: provide information or modify request
    Body: optional data (e.g., to “POST” data to the server)

  GET /somedir/page.html HTTP/1.1      (request line)
  Host: www.someschool.edu             (header lines)
  User-agent: Mozilla/4.0
  Connection: close
  Accept-language: fr
  (blank line)                         (carriage return, line feed: indicates end of message)

13 Client-to-Server Communication (continued)
  HTTP Request Message
  Request methods include:
    GET: Return current value of resource, run program, …
    HEAD: Return the meta-data associated with a resource
    POST: Update resource, provide input to a program, …
  Headers include:
    Useful info for the server (e.g., desired language)

14 Server-to-Client Communication
  HTTP Response Message
    Status line: protocol version, status code, status phrase
    Response headers: provide information
    Body: optional data

  HTTP/1.1 200 OK
  Connection: close
  Date: Thu, 06 Aug 2006 12:00:15 GMT
  Server: Apache/1.3.0 (Unix)
  Last-Modified: Mon, 22 Jun 2006 ...
  Content-Length: 6821
  Content-Type: text/html
  (blank line)
  data data data data data ...
  Labels on the response example: status line (protocol, status code, status phrase); header lines; data (e.g., requested HTML file)

15 Server-to-Client Communication (continued)
  HTTP Response Message
  Response code classes
    Similar to other ASCII app. protocols like SMTP

    Code   Class           Example
    1xx    Informational   100 Continue
    2xx    Success         200 OK
    3xx    Redirection     304 Not Modified
    4xx    Client error    404 Not Found
    5xx    Server error    503 Service Unavailable

16 Web Server: Generating a Response
  Return a file
    URL matches a file (e.g., /www/index.html)
    Server returns file as the response
    Server generates appropriate response header
  Generate response dynamically
    URL triggers a program on the server
    Server runs program and sends output to client
  Return meta-data with no body
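
The three ways a server can generate a response (return a file, run a program, return meta-data only) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real server is implemented. The `DOCUMENTS` table, the `/echo` resource, and the function names `status_class`, `build_response`, and `handle_request` are all hypothetical and do not come from the lecture. The sketch builds the raw response bytes (status line, headers, blank line, optional body) and maps a status code to its class by its first digit.

```python
from typing import Dict

# Hypothetical document store standing in for files on disk (an assumption,
# not part of the lecture).
DOCUMENTS: Dict[str, bytes] = {
    "/index.html": b"<html><body>Hello</body></html>",
}

# Status phrases for the two codes this sketch emits.
REASONS = {200: "OK", 404: "Not Found"}

# Response code classes, keyed by the first digit of the status code.
CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client error", 5: "Server error"}


def status_class(code: int) -> str:
    """Map a three-digit status code to its class name."""
    return CLASSES[code // 100]


def build_response(code: int, body: bytes, content_type: str,
                   include_body: bool = True) -> bytes:
    """Assemble status line, headers, blank line, and (optionally) the body."""
    head = "\r\n".join([
        f"HTTP/1.1 {code} {REASONS[code]}",
        "Connection: close",
        f"Content-Length: {len(body)}",
        f"Content-Type: {content_type}",
    ])
    raw = head.encode("ascii") + b"\r\n\r\n"
    return raw + body if include_body else raw


def handle_request(method: str, path: str) -> bytes:
    """Pick one of the three cases: file, dynamic output, or meta-data only."""
    include_body = method != "HEAD"  # HEAD returns headers with no body
    resource, _, query = path.partition("?")
    if resource == "/echo":
        # Dynamic case: the URL triggers a (hypothetical) program whose
        # output becomes the response body.
        body = f"You sent: {query}".encode("utf-8")
        return build_response(200, body, "text/plain", include_body)
    if resource in DOCUMENTS:
        # Static case: the URL matches a stored document.
        return build_response(200, DOCUMENTS[resource], "text/html", include_body)
    return build_response(404, b"Not Found", "text/plain", include_body)


if __name__ == "__main__":
    print(handle_request("GET", "/index.html").decode("ascii"))
    print(status_class(404))
```

Calling `handle_request("HEAD", "/index.html")` yields the same status line and headers as the GET, but the message stops at the blank line, which is the "meta-data with no body" case.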