Grand Convergence of Computing, Telecommunications & Multimedia

Open Issues in Content Distribution
Pablo Rodriguez
Systems and Networking Department
Microsoft Research, Cambridge
[email protected]

Problem
• The Internet has been growing very fast, with a growing number of users accessing a growing amount of content
  – Servers and network links are overloaded
  – Users get frustrated

Content Distribution History...
“With 25 years of Internet experience, we’ve learned exactly one way to deal with the exponential growth: Caching.” (Van Jacobson, 1997)
… and he was right, but with a different business model

Content Distribution History...
• Web Proxy Caching:
  – Caches save ISPs bandwidth, reduce client latency, and avoid flash crowds and bandwidth usage on the origin servers’ access link
  – Web caching gives good performance because very often
    » a single client repeatedly accesses the same document
    » a nearby client also accesses the same document
  – Cache hit ratio increases logarithmically with the number of users
[Figure: the Internet, servers, and an ISP with a Web cache]

Content Distribution History...
[Figure: servers, ISPs and a backbone ISP interconnected through the Internet]

What went wrong with Web Caches?
• Web protocols evolved extensively to accommodate caching, e.g., HTTP/1.1
• However, Web caching was developed with a strong ISP perspective, leaving content providers out of the picture
  – It is the ISP who places a cache and controls it
  – The ISP’s only interest in using Web caches is to reduce bandwidth
• In the USA, bandwidth was very cheap
  – ISPs had no interest in caching
• In Europe, there were many more Web caches
  – However, ISPs can arbitrarily tune Web caches to deliver stale content
• The European Union tried to ban Web caching.
  – Some US content providers started suing ISPs that used Web caching…

Content Provider’s Point of View
• Content providers care about
  – User-perceived latency
  – Content freshness
  – Avoiding flash crowds
  – Minimizing bandwidth usage on their access link
  – Accurate access statistics
• In an ideal world, all ISPs would use cooperative caches with enough capacity, delivering fresh content and reporting accurate access statistics
• In the real world, however, many ISPs did not implement caching, and the ones that did abused it
  – Content providers defeated caches (Pragma: no-cache) and started thinking about building infrastructures to deliver their content…

Content Distribution History...
• Some large content providers decided to use their own network of mirror servers
    » But content providers prefer to outsource the distribution to a third party...
    » Plus, it is more cost-effective (no need to dimension all systems for the peak)
• Content Distribution Networks (CDNs) build an overlay network of caches to provide fast, cost-effective, and reliable content delivery, while working tightly with content providers

Why are CDNs important?
• Content Distribution Networks:
  – Provide control over content
  – Bypass bottlenecks to reduce latency and provide more reliable performance
  – Offload servers from flash crowds
  – Provide economy of scale and reduce infrastructure and management cost (sharing)
  – Allow for more sophisticated Web content authoring
  – Eliminate the need to dimension all servers for the peak (multiplexing)
  – Shield servers from denial-of-service attacks
  – Provide application-level agreements
• CDNs are used to:
  – Reduce end-user latency for the most important Web sites (e.g., CNN, Yahoo)
  – Minimize the impact of flash crowd events (e.g., Olympics, US Open)
  – Provide significant bandwidth savings (e.g., 30-40%)
  – Distribute enterprise content (e.g., remote learning)

Content Distribution Middleware
[Figure: layered stack. Applications (Web, streaming, distance learning, file swapping, games, wireless, Grid) sit on content distribution, i.e. the CDN components (Web caching, cache-sharing, load balancing, multicast, satellite, overlays, P2P, content redirection, error correction techniques, content management), which sit on the TCP/IP layer and the MAC layer]

CDN Case Study: Akamai
• Akamai (AH kuh my) is Hawaiian for intelligent, clever and, informally, “cool”. Founded Apr 99 in Boston, MA by MIT students
• [Nasdaq: AKAM] had an explosive opening-day gain of 458.4% on October 29th, 1999
• Akamai can be considered the first CDN on the Internet (others at the time: Sandpiper, Digital Island)
• More than 1250 content providers use its network; 14000 servers in 40 countries
  – Still fewer countries than the UN…
• Delivers text/images as well as streaming of stored and live media. $2000 per Mbps/month; $300 for region-specific service

Akamai: How it works?
[Figure: client, origin server (www.foo.com, IP: 192.12.12.5), Akamai DNS and Akamai caches; the origin server pulls/pushes the in-lined images to the Akamai caches. Other addresses shown: 129.13.6.12, 115.13.5.24, and ag12.akamai.net at 211.123.25.10]
0) GET http://www.foo.com/index.html
1) HTTP/1.1 200 OK, index.html:
   <html>
   <img SRC="http://ag12.akamai.net/HASH/image1.jpg">
   <img SRC="http://ag12.akamai.net/HASH/image2.jpg">
   </html>
2) Process index.html; DNS query: Name: ag12.akamai.net, IP: ?
3) Akamai DNS response: Name: ag12.akamai.net, IP: 225.123.25.10, TTL: 10 sec; the in-lined images are then fetched from that Akamai cache

More Akamai information
• URL akamaization is becoming obsolete and is only supported for legacy reasons
  – Currently most content providers prefer to use DNS CNAME techniques to get all their content served from the Akamai servers
    » Content providers still need to run their origin servers
• Akamai evolution:
  – Files/streaming
  – Secure pages and whole pages
  – Dynamic page assembly at the edge (ESI)
  – Distributed applications
    » The first step is to replicate read-only databases

Akamai: How to avoid flash crowds?
[Figure, two builds: origin server (www.foo.com, IP: 192.12.12.5) serving clients through the Akamai caches; the second build arranges the caches for hierarchical caching]

CDN Challenges…
• Distributing Web content has proven to be an easier problem than expected
  – Bandwidth and servers are becoming cheaper and cheaper
  – P2P is becoming a serious alternative to using CDNs
• Multimedia content
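The URL rewriting ("akamaization") and DNS steps of the "Akamai: How it works?" slide can be sketched in Python. This is a minimal illustration, not Akamai's implementation: only the hostname ag12.akamai.net and the 10-second TTL come from the slide. The way the HASH path component is computed (here, the first 8 hex digits of an MD5 of the object), the names `akamaize`, `content_hash` and `ToyCdnDns`, and the region-to-cache table are all hypothetical.

```python
import hashlib
import re

# Hostname taken from the slide's example; everything else here is hypothetical.
CDN_HOST = "ag12.akamai.net"


def content_hash(data: bytes) -> str:
    """Assumed naming scheme: the first 8 hex digits of the object's MD5.
    The slide only shows a placeholder, HASH."""
    return hashlib.md5(data).hexdigest()[:8]


def akamaize(html: str, origin_host: str, objects: dict[str, bytes]) -> str:
    """Rewrite <img src="http://<origin_host>/<path>"> references so the
    browser fetches each embedded object from CDN_HOST instead of the origin."""
    pattern = re.compile(
        r'(<img\s+src=")http://' + re.escape(origin_host) + r'/([^"]+)(")',
        re.IGNORECASE,  # the slide writes the attribute as SRC
    )

    def rewrite(m: re.Match) -> str:
        path = m.group(2)
        digest = content_hash(objects[path])
        return f"{m.group(1)}http://{CDN_HOST}/{digest}/{path}{m.group(3)}"

    return pattern.sub(rewrite, html)


class ToyCdnDns:
    """Hypothetical stand-in for the CDN's DNS: answers queries for CDN_HOST
    with the address of a cache picked for the client's region, and a short
    TTL (10 s on the slide) so the mapping can be changed quickly."""

    def __init__(self, cache_by_region: dict[str, str], ttl: int = 10):
        self.cache_by_region = cache_by_region
        self.ttl = ttl

    def resolve(self, name: str, client_region: str) -> tuple[str, int]:
        if name != CDN_HOST:
            raise KeyError(f"not a CDN name: {name}")
        return self.cache_by_region[client_region], self.ttl


if __name__ == "__main__":
    page = ('<html><img SRC="http://www.foo.com/image1.jpg">'
            '<img SRC="http://www.foo.com/image2.jpg"></html>')
    objects = {"image1.jpg": b"jpeg bytes 1", "image2.jpg": b"jpeg bytes 2"}
    print(akamaize(page, "www.foo.com", objects))
    dns = ToyCdnDns({"eu": "225.123.25.10"})
    print(dns.resolve(CDN_HOST, "eu"))  # ('225.123.25.10', 10)
```

Because the page keeps pointing at one CDN name while the short TTL lets the DNS answer change every few seconds, the CDN can move clients between caches without touching the HTML again.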
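The hierarchical caching shown on the "Akamai: How to avoid flash crowds?" slides can be illustrated with a toy Python simulation. This is a minimal sketch under stated assumptions, not Akamai's design: a single parent cache above a set of edge caches, clients spread round-robin over the edges, unlimited cache space, no expiry, and requests handled one at a time (so concurrent misses during a real flash crowd are not modeled). The names `Origin`, `Cache` and `flash_crowd` are hypothetical.

```python
from typing import Callable


class Origin:
    """Origin server that counts how many requests actually reach it."""

    def __init__(self) -> None:
        self.requests = 0

    def fetch(self, url: str) -> bytes:
        self.requests += 1
        return f"content of {url}".encode()


class Cache:
    """A cache node: serve hits locally, send misses upstream (to a parent
    cache or to the origin) and keep a copy of the reply.
    Simplifications: unlimited storage, no expiry, one request at a time."""

    def __init__(self, upstream: Callable[[str], bytes]) -> None:
        self.upstream = upstream
        self.store: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            self.hits += 1
            return self.store[url]
        self.misses += 1
        body = self.upstream(url)
        self.store[url] = body
        return body


def flash_crowd(num_edges: int, num_clients: int, url: str,
                hierarchical: bool = True) -> tuple[int, int]:
    """Send num_clients requests for one URL, spread round-robin over the
    edge caches. Returns (requests reaching the origin,
    requests reaching the parent cache)."""
    origin = Origin()
    parent = Cache(origin.fetch)
    # Edges miss to the parent in the hierarchical setup, else straight to the origin.
    upstream = parent.get if hierarchical else origin.fetch
    edges = [Cache(upstream) for _ in range(num_edges)]
    for client in range(num_clients):
        edges[client % num_edges].get(url)
    return origin.requests, parent.hits + parent.misses


if __name__ == "__main__":
    url = "http://www.foo.com/index.html"
    print(flash_crowd(10, 1000, url))                      # (1, 10)
    print(flash_crowd(10, 1000, url, hierarchical=False))  # (10, 0)
```

With 10 edge caches and 1000 requests for the same page, the origin serves one request instead of ten, because the parent absorbs the first miss from every edge; without any caches it would serve all 1000.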