15-441 Computer Networking
Caching, CDN, Consistent Hashing, P2P
15-441 S'10 Lecture 21: CDN/Hashing/P2P

Outline
• Web history
• Web history (cont)
• Typical Workload (Web Pages)
• Web Proxy Caches
• No Caching Example (1)
• No Caching Example (2)
• W/Caching Example (3)
• HTTP Caching
• Example Cache Check Request
• Example Cache Check Response
• Problems
• Caching Proxies – Sources for Misses
• Content Distribution Networks (CDNs)
• Content Distribution Networks & Server Selection
• Server Selection
• Application Based
• Naming Based
• How Akamai Works
• Akamai – Subsequent Requests
• Simple Hashing
• Consistent Hash
• Consistent Hash – Example
• Consistent Hashing
• Identifiers
• Consistent Hashing Example
• Consistent Hashing Properties
• Load Balance
• Consistent Hashing not just for CDN
• Chord: Design Goals
• Lookup strategies
• Reducing Lookups: Finger Tables
• Faster Lookups
• Summary of Performance Results
• Joining the Ring
• Join: Initialize New Node’s Finger Table
• Join: Update Fingers of Existing Nodes
• Join: Transfer Keys
• Handling Failures
• Joining/Leaving overhead
• Summary

Web history
• 1945: Vannevar Bush, “As we may think”, Atlantic Monthly, July 1945.
  • Describes the idea of a distributed hypertext system.
  • A “memex” that mimics the “web of trails” in our minds.
• 1989: Tim Berners-Lee (CERN) writes an internal proposal to develop a distributed hypertext system.
  • Connects “a web of notes with links”.
  • Intended to help CERN physicists in large projects share and manage information.
• 1990: Tim BL writes a graphical browser for NeXT machines.

Web history (cont)
• 1992
  • NCSA server released
  • 26 WWW servers worldwide
• 1993
  • Marc Andreessen releases first version of NCSA Mosaic
  • Mosaic versions released for Windows, Mac, and Unix
  • Web (port 80) traffic at 1% of NSFNET backbone traffic
  • Over 200 WWW servers worldwide
• 1994
  • Andreessen and colleagues leave NCSA to form "Mosaic Communications Corp" (Netscape).
Typical Workload (Web Pages)
• Multiple (typically small) objects per page
• File sizes
  • Heavy-tailed
    • Pareto distribution for tail
    • Lognormal for body of distribution
• Embedded references
  • Number of embedded objects also Pareto
• Pareto tail: $\Pr(X > x) = (x/x_m)^{-k}$ for $x \ge x_m$
• This plays havoc with performance. Why?
• Solutions?
• Lots of small objects over TCP means:
  • A 3-way handshake for each connection
  • Lots of slow starts
  • Extra connection state

Web Proxy Caches
• User configures browser: Web accesses via cache
• Browser sends all HTTP requests to cache
  • Object in cache: cache returns object
  • Else cache requests object from origin server, then returns object to client
[Figure: clients exchange HTTP requests and responses with a proxy server, which in turn exchanges HTTP requests and responses with origin servers]

No Caching Example (1)
Assumptions
• Average object size = 100,000 bits
• Avg. request rate from the institution’s browsers to origin servers = 15/sec
• Delay from institutional router to any origin server and back to router = 2 sec
Consequences
• Utilization on LAN = 15% (15/sec × 100,000 bits = 1.5 Mbps on a 10 Mbps LAN)
• Utilization on access link = 100% (1.5 Mbps on a 1.5 Mbps link)
• Total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + milliseconds
[Figure: origin servers on the public Internet; institutional network with a 10 Mbps LAN, connected by a 1.5 Mbps access link]

No Caching Example (2)
Possible solution
• Increase bandwidth of access link to, say, 10 Mbps
  • Often a costly upgrade
Consequences
• Utilization on LAN = 15%
• Utilization on access link = 15%
• Total delay = Internet delay + access delay + LAN delay = 2 sec + msecs + msecs
[Figure: same network with a 10 Mbps access link]

W/Caching Example (3)
Install cache
• Suppose hit rate is 0.4
Consequence
• 40% of requests will be satisfied almost immediately (say 10 msec)
• 60% of requests satisfied by origin server
• Utilization of access link reduced to 60%, resulting in negligible delays
• Weighted average of delays = 0.6 × 2 sec + 0.4 × 10 msec ≈ 1.2 sec < 1.3 sec
[Figure: same network with a 10 Mbps LAN, a 1.5 Mbps access link, and an institutional cache]

HTTP Caching
• Clients often cache documents
• Challenge: update of documents
  • If-Modified-Since requests to check
    • HTTP 0.9/1.0 used just the date
    • HTTP 1.1 also has an opaque “entity tag” (could be a file signature, etc.), sent back in If-None-Match
• When/how often should the original be checked for changes?
  • Check every time?
  • Check each session? Each day? Etc.?
  • Use Expires header
    • If no Expires, often use Last-Modified as an estimate

Example Cache Check Request
    GET / HTTP/1.1
    Accept: */*
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    If-Modified-Since: Mon, 29 Jan 2001 17:54:18 GMT
    If-None-Match: "7a11f-10ed-3a75ae4a"
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
    Host: www.intel-iris.net
    Connection: Keep-Alive

Example Cache Check Response
    HTTP/1.1 304 Not Modified
    Date: Tue, 27 Mar 2001 03:50:51 GMT
    Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24
    Connection: Keep-Alive
    Keep-Alive: timeout=15, max=100
    ETag: "7a11f-10ed-3a75ae4a"

Problems
• Over 50% of all HTTP objects are uncacheable. Why?
• Not easily solvable
  • Dynamic data: stock prices, scores, web cams
  • CGI scripts: results based on passed parameters
• Obvious fixes
  • SSL: encrypted data is not cacheable
    • Most web clients don’t handle mixed pages well, so many generic objects are transferred with SSL
  • Cookies: results may be based on passed data
  • Hit metering: owner wants to measure # of hits for revenue, etc.
• What will be the end result?

Caching Proxies – Sources for Misses
• Capacity
  • How large a cache is necessary, or equivalent to infinite?
  • On disk vs. in memory: typically on disk
• Compulsory
  • First-time access to document
  • Non-cacheable documents
    • CGI scripts
    • Personalized documents (cookies, etc.)
    • Encrypted data (SSL)
• Consistency
  • Document has been updated/expired before reuse
• Conflict
  • No such misses

Content Distribution Networks (CDNs)
• The content providers are the CDN customers.
Content replication
• CDN company installs hundreds of CDN servers throughout the Internet
  • Close to users
• CDN replicates its customers’ content in CDN servers. When the provider updates content, the CDN updates its servers.
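The Typical Workload slide claims that heavy-tailed object sizes "play havoc with performance". The Python sketch below makes the Pareto tail concrete. It evaluates $\Pr(X > x) = (x/x_m)^{-k}$ directly, draws sizes by inverse-transform sampling, and measures how much of the total byte count the largest objects carry. The scale $x_m$ = 1000 bytes, shape $k$ = 1.2, and the helper names (`pareto_ccdf`, `sample_pareto`, `top_share`) are illustrative assumptions, not measured Web parameters.

```python
import random


def pareto_ccdf(x: float, xm: float, k: float) -> float:
    """Pr(X > x) for a Pareto distribution with scale xm and shape k."""
    if x < xm:
        return 1.0  # all probability mass lies at or above xm
    return (x / xm) ** (-k)


def sample_pareto(xm: float, k: float, rng: random.Random) -> float:
    """Inverse-transform sampling: if U ~ Uniform(0, 1], xm * U**(-1/k) is Pareto."""
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
    return xm * u ** (-1.0 / k)


def top_share(sizes, fraction: float) -> float:
    """Share of total bytes contributed by the largest `fraction` of objects."""
    ordered = sorted(sizes, reverse=True)
    n_top = max(1, int(len(ordered) * fraction))
    return sum(ordered[:n_top]) / sum(ordered)


if __name__ == "__main__":
    rng = random.Random(441)
    # Hypothetical parameters: 1000-byte minimum object size, shape k = 1.2
    sizes = [sample_pareto(1000.0, 1.2, rng) for _ in range(100_000)]
    print(f"Pr(X > 10 KB) = {pareto_ccdf(10_000, 1000.0, 1.2):.3f}")
    print(f"largest 1% of objects carry {top_share(sizes, 0.01):.0%} of all bytes")
```

With a shape parameter this small, a handful of very large objects dominates the byte count, while most requests are for small objects. That combination is what makes per-request overheads such as TCP setup and slow start so costly.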
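The hit/miss behavior on the Web Proxy Caches slide fits in a few lines of Python. This is a minimal sketch under stated assumptions. `ProxyCache` and the `fetch_from_origin` callable are hypothetical stand-ins (no real HTTP is performed), and the cache deliberately ignores expiration, validation, and eviction.

```python
from typing import Callable, Dict


class ProxyCache:
    """Toy forward proxy cache: no expiration, validation, or eviction."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin     # hypothetical origin fetcher
        self._store: Dict[str, bytes] = {}  # url -> cached response body
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        """Handle one request that the browser sent to the proxy."""
        if url in self._store:              # object in cache: return it
            self.hits += 1
            return self._store[url]
        self.misses += 1                    # else request it from the origin server,
        body = self._fetch(url)             # keep a copy, and return it to the client
        self._store[url] = body
        return body


if __name__ == "__main__":
    def fake_origin(url: str) -> bytes:
        # Hypothetical origin server: echoes the URL instead of doing real HTTP
        return f"<html>{url}</html>".encode()

    proxy = ProxyCache(fake_origin)
    for u in ["http://a.example/", "http://b.example/", "http://a.example/"]:
        proxy.get(u)
    print(proxy.hits, proxy.misses)  # 1 2
```

A real proxy would also have to decide when a stored copy is stale, which is the subject of the HTTP Caching slide.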
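The utilization and delay figures in No Caching Examples (1) and (2) and W/Caching Example (3) follow from simple arithmetic. The offered load is 15 requests/sec × 100,000 bits = 1.5 Mbps. The Python check below makes the same simplifications as the slides: only cache misses cross the access link, hits cost about 10 msec, and misses cost about 2 sec of Internet delay. The function names are chosen for illustration.

```python
def link_utilization(request_rate: float, object_bits: float, link_bps: float) -> float:
    """Offered load divided by link capacity; >= 1.0 means the link is saturated."""
    return request_rate * object_bits / link_bps


def average_delay(hit_rate: float, hit_delay_s: float, miss_delay_s: float) -> float:
    """Expected delay when a fraction `hit_rate` of requests is served by the cache."""
    return hit_rate * hit_delay_s + (1.0 - hit_rate) * miss_delay_s


RATE = 15.0        # requests/sec from the institution's browsers
OBJECT = 100_000   # average object size in bits
HIT_RATE = 0.4     # cache hit rate assumed in example (3)

if __name__ == "__main__":
    # (1) No caching: 10 Mbps LAN, 1.5 Mbps access link
    print(f"LAN: {link_utilization(RATE, OBJECT, 10e6):.0%}")                    # 15%
    print(f"access link: {link_utilization(RATE, OBJECT, 1.5e6):.0%}")           # 100%
    # (2) No caching, access link upgraded to 10 Mbps
    print(f"upgraded access link: {link_utilization(RATE, OBJECT, 10e6):.0%}")   # 15%
    # (3) Cache installed: only misses cross the 1.5 Mbps access link
    misses_per_sec = (1.0 - HIT_RATE) * RATE
    print(f"access link with cache: {link_utilization(misses_per_sec, OBJECT, 1.5e6):.0%}")  # 60%
    print(f"average delay: {average_delay(HIT_RATE, 0.010, 2.0):.3f} s")         # 1.204 s
```

The queueing delay at 100% utilization ("minutes") is not modeled here. The sketch only shows why the link saturates and how the cache brings both the load and the average delay down.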
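The HTTP Caching slide's question "when should the original be checked?" can be answered with a freshness lifetime. Use Expires when it is present; otherwise estimate a lifetime from Last-Modified. In this Python sketch the constant `HEURISTIC_FRACTION` = 0.1 is an assumption. It is the typical value suggested by the HTTP/1.1 caching specification, but caches are free to choose another fraction. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime
from typing import Optional

# Assumption: fraction of an object's age (Date - Last-Modified) used as its lifetime.
HEURISTIC_FRACTION = 0.1


def freshness_lifetime(date: str, expires: Optional[str] = None,
                       last_modified: Optional[str] = None) -> timedelta:
    """How long a cached response may be reused without checking the origin."""
    served = parsedate_to_datetime(date)
    if expires is not None:                    # explicit expiration wins
        return max(parsedate_to_datetime(expires) - served, timedelta(0))
    if last_modified is not None:              # heuristic: old objects change rarely
        age = served - parsedate_to_datetime(last_modified)
        return max(age * HEURISTIC_FRACTION, timedelta(0))
    return timedelta(0)                        # no information: revalidate every time


def is_fresh(received_at: datetime, lifetime: timedelta, now: datetime) -> bool:
    """True if the cached copy can be served without an If-Modified-Since check."""
    return now - received_at < lifetime
```

Under this assumption, a page that was last modified 10 days before it was served would be treated as fresh for about one day. After that, the cache sends a conditional request like the one in the Example Cache Check Request.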
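On the server side, answering the Example Cache Check Request amounts to comparing validators and replying 304 Not Modified when the client's copy is still current. The Python sketch below rests on several simplifying assumptions:

- Header names arrive in canonical case in a plain dict.
- If-None-Match is split naively on commas.
- Weak (W/) tags are ignored.

As in HTTP/1.1, If-None-Match takes precedence over If-Modified-Since when both are present. `conditional_get_status` is a hypothetical name.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def conditional_get_status(request_headers: Mapping[str, str], current_etag: str,
                           last_modified: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Entity-tag check; If-Modified-Since is ignored when If-None-Match is present.
        tags = [t.strip() for t in if_none_match.split(",")]  # naive split
        return 304 if "*" in tags or current_etag in tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None and last_modified is not None:
        # Date-only check (HTTP/1.0 style): unchanged since the client's copy?
        if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200


if __name__ == "__main__":
    request = {
        "If-Modified-Since": "Mon, 29 Jan 2001 17:54:18 GMT",
        "If-None-Match": '"7a11f-10ed-3a75ae4a"',
    }
    print(conditional_get_status(request, '"7a11f-10ed-3a75ae4a"', None))  # 304
```

With the headers from the example request and an unchanged ETag, the sketch returns 304. That matches the Example Cache Check Response.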
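The categories on the Caching Proxies – Sources for Misses slide become concrete when you replay a request trace through a small cache and label each miss. The following are all illustrative assumptions:

- The trace format `(url, version, cacheable)`.
- The LRU replacement policy.
- A capacity measured in objects.
- The function name `classify_misses`.

Following the slide, non-cacheable requests are counted as compulsory misses. Because any object can occupy any slot (a fully associative cache), no conflict misses can arise.

```python
from collections import OrderedDict
from typing import Dict, Iterable, Tuple


def classify_misses(trace: Iterable[Tuple[str, int, bool]], capacity: int) -> Dict[str, int]:
    """Replay (url, version, cacheable) requests through an LRU cache of `capacity` objects."""
    cache: "OrderedDict[str, int]" = OrderedDict()  # url -> cached version, LRU first
    seen = set()                                     # urls that have ever been cached
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "consistency": 0}
    for url, version, cacheable in trace:
        if not cacheable:
            counts["compulsory"] += 1   # slide files non-cacheable documents here
            continue
        if url in cache:
            if cache[url] == version:
                counts["hit"] += 1
                cache.move_to_end(url)  # mark as most recently used
                continue
            counts["consistency"] += 1  # document updated before reuse
        elif url in seen:
            counts["capacity"] += 1     # was cached once but has been evicted
        else:
            counts["compulsory"] += 1   # first-time access
        seen.add(url)
        cache[url] = version
        cache.move_to_end(url)
        if len(cache) > capacity:
            cache.popitem(last=False)   # evict the least recently used object
    return counts


if __name__ == "__main__":
    trace = [("a", 1, True), ("b", 1, True), ("a", 1, True), ("c", 1, True),
             ("b", 1, True), ("c", 2, True), ("x", 1, False)]
    print(classify_misses(trace, capacity=2))
    # {'hit': 1, 'compulsory': 4, 'capacity': 1, 'consistency': 1}
```

Raising `capacity` in this sketch turns capacity misses into hits. Compulsory and consistency misses remain, which is why the slide asks how large a cache must be to behave like an infinite one.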