An Analysis of Internet Content Delivery Systems

Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy
Department of Computer Science & Engineering, University of Washington
{tzoompy, gummadi, rdunn, gribble, levy}@cs.washington.edu

Abstract

In the span of only a few years, the Internet has experienced an astronomical increase in the use of specialized content delivery systems, such as content delivery networks and peer-to-peer file sharing systems. Therefore, an understanding of content delivery on the Internet now requires a detailed understanding of how these systems are used in practice.

This paper examines content delivery from the point of view of four content delivery systems: HTTP web traffic, the Akamai content delivery network, and Kazaa and Gnutella peer-to-peer file sharing traffic. We collected a trace of all incoming and outgoing network traffic at the University of Washington, a large university with over 60,000 students, faculty, and staff. From this trace, we isolated and characterized traffic belonging to each of these four delivery classes. Our results (1) quantify the rapidly increasing importance of new content delivery systems, particularly peer-to-peer networks, (2) characterize the behavior of these systems from the perspectives of clients, objects, and servers, and (3) derive implications for caching in these systems.

1 Introduction

Few things compare with the growth of the Internet over the last decade, except perhaps its growth in the last several years. A key challenge for Internet infrastructure has been delivering increasingly complex data to a voracious and growing user population. The need to scale has led to the development of thousand-node clusters, global-scale content delivery networks, and more recently, self-managing peer-to-peer structures. These content delivery mechanisms are rapidly changing the nature of Internet content delivery and traffic; therefore, an understanding of the modern Internet requires a detailed understanding of these new mechanisms and the data they serve.

This paper examines content delivery by focusing on four content delivery systems: HTTP web traffic, the Akamai content delivery network, and the Kazaa and Gnutella peer-to-peer file sharing systems. To perform the study, we traced all incoming and outgoing Internet traffic at the University of Washington, a large university with over 60,000 students, faculty, and staff. For this paper, we analyze a nine-day trace that saw over 500 million transactions and over 20 terabytes of HTTP data. From this data, we provide a detailed characterization and comparison of content delivery systems, and in particular, the latest peer-to-peer workloads. Our results quantify: (1) the extent to which peer-to-peer traffic has overwhelmed web traffic as a leading consumer of Internet bandwidth, (2) the dramatic differences in the characteristics of objects being transferred as a result, (3) the impact of the two-way nature of peer-to-peer communication, and (4) the ways in which peer-to-peer systems are not scaling, despite their explicitly scalable design. For example, our measurements show that an average peer of the Kazaa peer-to-peer network consumes 90 times more bandwidth than an average web client in our environment. Overall, we present important implications for large organizations, service providers, network infrastructure, and general content delivery.

The paper is organized as follows.
Section 2 presents an overview of the content delivery systems examined in this paper, as well as related work. Section 3 describes the measurement methodology we used to collect and process our data. In Section 4 we give a high-level overview of the workload we have traced at the University of Washington. Section 5 provides a detailed analysis of our trace from the perspective of objects, clients, and servers, focusing in particular on a comparison of peer-to-peer and web traffic. Section 6 evaluates the potential for caching in content delivery networks and peer-to-peer networks, and Section 7 concludes and summarizes our results.

2 Overview of Content Delivery Systems

Three dominant content delivery systems exist today: the client/server-oriented world-wide web, content delivery networks, and peer-to-peer file sharing systems. At a high level, these systems serve the same role of distributing content to users. However, the architectures of these systems differ significantly, and the differences affect their performance, their workloads, and the role caching can play. In this section, we present the architectures of these systems and describe previous studies of their behavior.

2.1 The World-Wide Web (WWW)

The basic architecture of the web is simple: using the HTTP [16] protocol, web clients running on users' machines request objects from web servers. Previous studies have examined many aspects of the web, including web workloads [2, 8, 15, 29], characterizing web objects [3, 11], and even modeling the hyperlink structure of the web [6, 21]. These studies suggest that most web objects are small (5-10 KB), but the distribution of object sizes is heavy-tailed and very large objects exist. Web objects are accessed with a Zipf popularity distribution, as are web servers. The number of web objects is enormous (in the billions) and rapidly growing; most web objects are static, but an increasing number are generated dynamically.

The HTTP protocol includes provisions for consistency management. HTTP headers include caching pragmas that affect whether or not an object may be cached, and if so, for how long. Web caching helps to alleviate load on servers and backbone links, and can also serve to decrease object access latencies. Much research has focused on web proxy caching [4, 5, 7, 11, 12] and, more recently, on coordinating state among multiple, cooperating proxy caches [13, 30, 33]; some of these proposals aim to create global caching structures [27, 34]. The results of these studies generally indicate that cache hit rates of 40-50% are achievable, but that hit rate increases only logarithmically with client population [36] and is constrained by the increasing amount of ...
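To make the Zipf popularity claim above concrete, the following minimal sketch (illustrative only; the object count and skew parameter are assumptions, not measurements from this paper) shows how heavily a Zipf workload concentrates requests on a small set of popular objects:

import numpy as np

# Illustrative parameters (not from the paper): 1M objects, Zipf skew ~1.0.
num_objects = 1_000_000
alpha = 1.0

# Zipf popularity: P(rank i) is proportional to 1 / i^alpha.
ranks = np.arange(1, num_objects + 1)
weights = 1.0 / ranks ** alpha
probs = weights / weights.sum()

# Fraction of all requests absorbed by the 1,000 most popular objects.
top_1k_share = probs[:1000].sum()
print(f"Top 0.1% of objects receive {top_1k_share:.0%} of requests")

Under these assumed parameters the most popular 0.1% of objects receive roughly half of all requests. This skew also helps explain the logarithmic hit-rate growth cited above: once a cache holds the popular head of the distribution, additional clients contribute mostly requests for objects the cache already stores.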
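The caching pragmas mentioned above can likewise be illustrated. The sketch below is a simplified reading of standard HTTP freshness headers (Cache-Control, Expires, Date), not code or an algorithm from this paper, and it ignores many details of the real consistency rules:

from email.utils import parsedate_to_datetime  # stdlib RFC-date parser

def cacheable_for(headers: dict) -> int | None:
    """Return the freshness lifetime in seconds implied by standard
    HTTP caching headers, or None if the response must not be cached.
    A simplified sketch, not a complete implementation."""
    cc = headers.get("Cache-Control", "").lower()
    directives = [d.strip() for d in cc.split(",") if d.strip()]

    if "no-store" in directives:
        return None                         # may not be stored at all
    for d in directives:
        if d.startswith("max-age="):
            return int(d.split("=", 1)[1])  # explicit lifetime wins

    # Fall back to Expires relative to Date, if both are present.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return 0                                # no explicit freshness info

# Example: cacheable for one hour vs. not cacheable at all.
print(cacheable_for({"Cache-Control": "public, max-age=3600"}))  # -> 3600
print(cacheable_for({"Cache-Control": "no-store"}))              # -> None

A proxy cache applies logic of this shape to every response to decide whether storing the object can save future server and backbone traffic, which is the mechanism behind the 40-50% hit rates reported in the studies cited above.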

