The World Wide Web

EE 122: Intro to Communication Networks
Fall 2007 (WF 4-5:30 in Cory 277)
Lisa Fowler / Vern Paxson
TAs: Lisa Fowler, Daniel Killebrew, Jorge Ortiz
http://inst.eecs.berkeley.edu/~ee122/
Materials with thanks to Vern Paxson, Jennifer Rexford, Ion Stoica, and colleagues at Princeton and UC Berkeley

Announcements
- Project 1, Milestone 1: due 11pm tonight. No slip days.
- Homework 2: out now. Hard deadline of 11PM Tues Oct 9.
- First midterm coming soon: Fri Oct 12.

Goals of Today's Lecture
- Main ingredients of the Web
  - URIs, HTML, HTTP
- Key properties of HTTP
  - Request-response, stateless, and resource meta-data
- Performance of HTTP
  - Parallel connections, persistent connections, pipelining
- Web components
  - Clients, proxies, and servers
  - Caching vs. replication

Web Components
- Content
  - Objects
- Clients
  - Send requests / receive responses
- Servers
  - Receive requests / send responses
  - Store or generate the responses
- Proxies
  - Placed between clients and servers
  - Act as a server for the client, and as a client to the server
  - Provide extra functions: caching, anonymization, logging, transcoding, filtering access
  - Explicit or transparent ("interception")

HTML (Content: How?)
- A Web page has several components
  - Base HTML file
  - Referenced objects (e.g., images)
- HyperText Markup Language (HTML)
  - Representation of hypertext documents in ASCII format
  - Web browsers interpret HTML when rendering a page
  - Several functions: format text, reference images, embed hyperlinks (HREF)
- Straightforward to learn
  - Syntax easy to understand
  - Authoring programs can auto-generate HTML
  - Source almost always available

URI (Content: How?)
- Uniform Resource Identifier (URI)
- Uniform Resource Locator (URL)
  - Provides a means to get the resource
  - http://www.ietf.org/rfc/rfc3986.txt
- Uniform Resource Name (URN)
  - Names a resource independent of how to get it
  - urn:ietf:rfc:3986 is a standard URN for RFC 3986

URL Syntax (Content: How?)

  protocol://hostname[:port]/directorypath/resource

- protocol: http, ftp, https, smtp, rtsp, etc.
- hostname: FQDN, IP address
- port: defaults to the protocol's standard port, e.g., http: 80/tcp, https: 443/tcp
- directory path: hierarchical, often reflecting the file system
- resource: identifies the desired resource
  - Can also extend to program executions:
    http://us.f413.mail.yahoo.com/ym/ShowLetter?box=%40B%40Bulk&MsgId=2604_1744106_29699_1123_1261_0_28917_3552_1289957100&Search=&Nhead=f&YY=31454&order=down&sort=date&pos=0&view=a&head=b

HTTP (Client-Server: How?)
- HyperText Transfer Protocol (HTTP)
  - Client-server protocol for transferring resources
- Important properties
  - Request-response protocol
  - Reliance on a global URI namespace
  - Resource metadata
  - Stateless
- ASCII format:

  telnet www.icir.org 80
  GET /vern/ HTTP/1.0
  <blank line, i.e., CRLF>

Client-to-Server Communication
- HTTP Request Message
  - Request line: method, resource, and protocol version
  - Request headers: provide information or modify request
  - Body: optional data (e.g., to "POST" data to the server)

  GET /somedir/page.html HTTP/1.1      (request line)
  Host: www.someschool.edu             (header lines)
  User-agent: Mozilla/4.0
  Connection: close
  Accept-language: fr
  <blank line>                         (not optional: carriage return, line feed indicates end of message)

Client-to-Server Communication (continued)
- Request methods include:
  - GET: return current value of resource, run program, ...
  - HEAD: return the meta-data associated with a resource
  - POST: update resource, provide input to a program, ...
- Headers include:
  - Useful info for the server (e.g., desired language)

Server-to-Client Communication
- HTTP Response Message
  - Status line: protocol version, status code, status phrase
  - Response headers: provide information
  - Body: optional data

  HTTP/1.1 200 OK                      (status line: protocol, status code, status phrase)
  Connection: close                    (header lines)
  Date: Thu, 06 Aug 2006 12:00:15 GMT
  Server: Apache/1.3.0 (Unix)
  Last-Modified: Mon, 22 Jun 2006 ...
  Content-Length: 6821
  Content-Type: text/html
  <blank line>
  data data data data data ...         (data, e.g., requested HTML file)

Server-to-Client Communication (continued)
- Response code classes
  - Similar to other ASCII application protocols like SMTP

  Code   Class           Example
  1xx    Informational   100 Continue
  2xx    Success         200 OK
  3xx    Redirection     304 Not Modified
  4xx    Client error    404 Not Found
  5xx    Server error    503 Service Unavailable

Web Server: Generating a Response
- Return a file
  - URL matches a file (e.g., /www/index.html)
  - Server returns file as the response
  - Server generates appropriate response header
- Generate response dynamically
  - URL triggers a program on the server
  - Server runs program and sends output to client
- Return meta-data with no body

HTTP Resource Meta-Data
- Meta-data
  - Info about a resource
  - A separate entity
- Examples:
  - Size of a resource
  - Last modification time
  - Type of the content
- Data format classification
  - E.g., Content-Type: text/html
  - Enables browser to automatically launch an appropriate viewer
  - Borrowed from e-mail's Multipurpose Internet Mail Extensions (MIME)
- Usage example: Conditional GET Request
  - Client requests object with "If-modified-since"
  - If object hasn't changed, server returns "HTTP/1.1 304 Not Modified"
  - No body in the server's response, only a header

HTTP is Stateless (Client-Server: How?)
- Stateless protocol
  - Each request-response exchange treated independently
  - Servers not required to retain state
- This is good!
  - Improves scalability on the server side
    - Don't have to retain info across requests
    - Can handle higher rate of requests
    - Order of requests doesn't matter
- This is bad!
  - Some applications need persistent state
    - Need to uniquely identify user or store temporary info
    - E.g., shopping cart, user preferences and profiles, usage tracking, ...

State in a Stateless Protocol: Cookies
- Client-side state maintenance
  - Client stores small state on behalf of server
  - Client sends state in future requests to the server
- Can provide authentication

  Client -> Server: Request
  Server -> Client: Response (Set-Cookie: XYZ)
  Client -> Server: Request (Cookie: XYZ)

State in a Stateless Protocol: HTTP Authentication
- Tool to limit access to server documents
- Basic HTTP Authentication
  - Client can add an Authorization header to GET request
    - Base64-encoded concatenation of username, a colon, and password
  - If client doesn't provide the header, server responds with a 401 Unauthorized and a WWW-Authenticate header
  - Server does not honor request until valid authorization received
- Stateless: must happen on each request
- Is this secure? Is this security?
  - No. Authentication is not security, but provides a piece.

Security
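
To make the Basic HTTP Authentication step concrete, here is a minimal Python sketch of how the Authorization header value is formed from "username, a colon, password" as described on the slide. The function names (basic_auth_header, decode_basic_auth) and the sample credentials are illustrative assumptions, not part of the lecture. It also shows why the slide answers "No" to "Is this secure?": Base64 is only an encoding, so anyone who sees the header can reverse it.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for Basic authentication.

    Hypothetical helper: joins username and password with a colon,
    then Base64-encodes the result.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value: str) -> Tuple[str, str]:
    """Recover (username, password) from a Basic Authorization header value.

    Hypothetical helper: no secret is needed to do this, which is
    why Basic authentication alone does not provide confidentiality.
    """
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only; the password may itself contain colons.
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    # Sample credentials are made up for illustration.
    value = basic_auth_header("Aladdin", "open sesame")
    print("Authorization:", value)
    print(decode_basic_auth(value))
```

Because HTTP is stateless, a client using this scheme would attach the same header to every request for a protected resource.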