Duke CPS 212 - Web Caching and Content Delivery


School name Duke University

Course CPS 212 - Distributed Information Systems

Pages 22


Web Caching and Content Delivery

Slide titles:
Web Caching and Content Delivery
Caching for a Better Web
Proxy Caching
Issues for Web Caching
End-to-End Content Delivery
Proxy Cache Effectiveness
Web Traffic Characterization
Zipf
Zipf-like Reference Distributions
Importance of Traffic Models
The “Trickle-Down Effect”
A Look at the Miss Stream
Effect on Server Trace (ibm.com)
What’s Happening? (LRU)
Miss Stream Probability by Popularity
Object Hit Ratio by Popularity (1)
Object Hit Ratio by Popularity (2)
Limitations/Features of This Study
Proxy Deployment and Use
Interception Switches
Shouldn’t This Be Illegal?
Cache Effectiveness

Caching for a Better Web

Performance is a major concern in the Web.
Proxy caching is the most widely used method to improve Web performance.
• Duplicate requests to the same document served from cache
• Hits reduce latency, bandwidth demand, server load
• Misses increase latency (extra hops)

[Figure labels: Clients, Proxy Cache, Servers, Hits, Misses, Internet. Source: Geoff Voelker]

Proxy Caching

How should we build caching systems for the Web?
• Seminal paper [Chankhunthod96]
• Proxy caches [Duska97]
• Akamai DNS interposition [Karger99]
• Cooperative caching [Tewari99, Fan98, Wolman99]
• Popularity distributions [Breslau99]
• Proxy filtering and transcoding [Fox et al]
• Consistency [Tewari, Cao et al]
• Replica placement for CDNs [et al]
[Voelker]

Issues for Web Caching

• Binding clients to proxies, handling failover
  Manual configuration, router-based “transparent caching”, WPAD (Web Proxy Auto-Discovery)
• Proxy may confuse/obscure interactions between server and client.
• Consistency management
  At first approximation the Web is a wide-area read-only file service... but it is much more than that.
  caching responses vs. caching documents
  deltas [Mogul+Bala/Douglis/Misha/[email protected]]
• Prefetching, request routing, scale, performance
Web caching vs. content distribution (CDNs, e.g., Akamai)

End-to-End Content Delivery

[Figure labels: request stream, Internet, hosting network, request distributor, surrogate caches, CDN servers, proxies, server array + storage, upstream, downstream]

Proxy Cache Effectiveness

How to measure Web cache effectiveness (goals)?
• Hit ratio
• Savings in bandwidth or server load
• Reduction in perceived user latency
What factors determine/limit effectiveness?
• Capacity?
• User population?
• Proxy placement in the network?
• Updates and invalidations?

Web Traffic Characterization

Research question: how do goals and traffic behavior shape strategies for deploying and managing proxy caches?
• Replacement policy: what objects to retain in cache?
  Large vs. small, relative importance of popularity and stability
• Deployment: where to place the cache?
  Close to server or client?
• How many users per cache?
• Prefetching?
Since the Web is in active deployment on a large scale, Web traffic characterization is an empirical science.
• Science of mass behavior: observe and test hypotheses.

Zipf

[Breslau/Cao99] and others observed that Web accesses can be modeled using Zipf-like probability distributions.
• Rank objects by popularity: lower rank $i$ ==> more popular.
• The probability that any given reference is to the $i$th most popular object is $p_i$.
  Not to be confused with $p_c$, the percentage of cacheable objects.
Zipf says: “$p_i$ is proportional to $1/i^{\alpha}$, for some $\alpha$ with $0 < \alpha < 1$”.
• Higher $\alpha$ gives more skew: popular objects are way popular.
• Lower $\alpha$ gives a more heavy-tailed distribution.
• In the Web, $\alpha$ ranges from 0.6 to 0.8 [Breslau/Cao99].
• With $\alpha = 0.8$, 0.3% of the objects get 40% of requests.

Zipf-like Reference Distributions

Probability of access to the object with popularity rank $i$:
  $p_i \propto 1/i^{\alpha}$ such that $\sum_i p_i = 1$
(This is equivalent to a power-law or Pareto distribution.)

[Figure: $p_i$ plotted against popularity rank for $\alpha = 0.7$, with the head and the heavy tail marked. Sources: Zipf 49, Duska et al. 97, Breslau et al. 98]

Importance of Traffic Models

Analytical models like this help us to predict cache hit ratios (object hit ratio or byte hit ratio).
• E.g., get object hit ratio as a function of size by integrating under segments of the Zipf curve… assuming perfect LFU replacement
• Must consider update rate
  Do object update rates correlate with popularity?
• Must consider object size
  How does size correlate with popularity?
• Must consider proxy cache population
  What is the probability of object sharing?
• Enables construction of synthetic load generators
  SURGE [Barford and Crovella 99]

The “Trickle-Down Effect”

[Figure labels: clients, cache, to servers, flood, trickle]

What is the effect on “downstream” traffic?
What is the significance of this effect?
How does it impact design choices for components “behind” the caches?

A Look at the Miss Stream

[Figure: log-log plots of two traces.
 Synthetic trace: SURGE-generated, low locality, $\alpha = 0.6$. Head: flattened; midrange: tapers; tail: intact, Zipf-like.
 1998 ibm.com trace: high locality, fit Zipf $\alpha = 0.76$, skewed: 77% / 1%.]

Effect on Server Trace (ibm.com)

What’s Happening? (LRU)

Suppose the cache fills up in $R$ references.
(That’s a property of the trace and the cache size.)
Then a cache miss on object with rank $i$ occurs only if
  $i$ is referenced… probability $p_i$
  …and $i$ has not been referenced in the last $R$ requests: probability $(1 - p_i)^R$
Stack distance
P(a miss is to object $i$) is $q_i = p_i (1 - p_i)^R$

Miss Stream Probability by Popularity

[Figure: $q_i$ for $R = 10^4$, $\alpha = 0.7$; IBM 1998 (32 MB)]
Moderately popular objects now dominate.

Object Hit Ratio by Popularity (1)

[Figure: synthetic trace, $\alpha = 0.6$]

Object Hit Ratio by Popularity (2)

[Figure: IBM 1998 trace]

Limitations/Features of This Study

static (cacheable) objects
ignore misses caused by updates
• invalidation/expiration
LRU replacement
vary cache effectiveness by capacity
• cache intercepts all client traffic
ignore effect on downstream traffic volume

Proxy Deployment and Use

Where to put it?
How to direct user Web traffic through the proxy?
Request redirection
• Much more to come on this topic…
Must the server consent?
• Protected content
• Client identity
“Transparent” caching and the end-to-end principle
• Must the client consent?

Interception Switches

[Figure label: ISP cache]
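
As a rough illustration of the Zipf-like reference model on the Zipf slides, the sketch below builds the normalized probabilities (p_i proportional to 1/i^alpha, summing to 1) and measures how much of the reference stream goes to the most popular objects. The function names, the object count of 100,000, and the "top 1%" cutoff are assumptions chosen for illustration; they are not taken from the slides or the cited traces, and the printed shares depend heavily on the assumed object count.

```python
def zipf_probs(n, alpha):
    """Return [p_1, ..., p_n] with p_i proportional to 1 / i**alpha,
    normalized so the probabilities sum to 1."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_share(probs, fraction):
    """Share of all references that go to the most popular `fraction`
    of objects (probs must be ordered by popularity rank)."""
    k = max(1, int(len(probs) * fraction))
    return sum(probs[:k])


if __name__ == "__main__":
    # Hypothetical object population; real traces differ.
    n_objects = 100_000
    for alpha in (0.6, 0.7, 0.8):
        p = zipf_probs(n_objects, alpha)
        share = top_share(p, 0.01)
        print(f"alpha={alpha}: top 1% of objects receive {share:.1%} of references")
```

Raising alpha should increase the share captured by the head of the distribution, which is the "more skew" behavior the slides describe.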

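The LRU argument on the "What's Happening? (LRU)" slide can be tried numerically. The sketch below evaluates q_i = p_i (1 - p_i)^R over a Zipf-like popularity distribution and reports which rank contributes most to the miss stream. Two things here are assumptions rather than content from the slides: the step that rescales the q_i so they sum to 1 (which turns them into a distribution over misses), and the parameter values in the demo (50,000 objects, alpha = 0.7, R = 10,000).

```python
def zipf_probs(n, alpha):
    """Zipf-like reference probabilities: p_i proportional to 1 / i**alpha."""
    weights = [i ** -alpha for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def miss_weights(probs, R):
    """q_i = p_i * (1 - p_i)**R: object i is referenced now, and was not
    referenced during the previous R requests (so LRU has evicted it)."""
    return [p * (1.0 - p) ** R for p in probs]


def overall_miss_ratio(probs, R):
    """Probability that an arbitrary reference misses under this model."""
    return sum(miss_weights(probs, R))


def miss_stream(probs, R):
    """Assumed normalization: rescale the q_i to sum to 1, giving the
    popularity distribution seen downstream of the cache."""
    q = miss_weights(probs, R)
    total = sum(q)
    return [x / total for x in q]


if __name__ == "__main__":
    # Hypothetical parameters for illustration only.
    p = zipf_probs(50_000, 0.7)
    R = 10_000
    q = miss_stream(p, R)
    peak_rank = max(range(len(q)), key=q.__getitem__) + 1
    print(f"model miss ratio: {overall_miss_ratio(p, R):.3f}")
    print(f"rank contributing most misses: {peak_rank}")
```

In this model the most popular objects are almost always still in the cache, so the peak of the miss stream moves away from rank 1, consistent with the slide's remark that moderately popular objects dominate.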