CMU INI 14740 - liang2005 (25 pages)

School: Carnegie Mellon University
Course: INI 14740 - Fundamentals of Telecommunications Networks

The KaZaA Overlay: A Measurement Study

Jian Liang
Department of Computer and Information Science
Polytechnic University
Brooklyn, NY, USA 11201
Email: jliang@cis.poly.edu

Rakesh Kumar
Department of Electrical and Computer Engineering
Polytechnic University
Brooklyn, NY, USA 11201
Email: rkumar04@utopia.poly.edu

Keith W. Ross
Department of Computer and Information Science
Polytechnic University
Brooklyn, NY, USA 11201
Email: ross@poly.edu

September 15, 2004

Abstract

Both in terms of number of participating users and in traffic volume, KaZaA is one of the most important applications in the Internet today. Nevertheless, because KaZaA is proprietary and uses encryption, little is understood about KaZaA's overlay structure and dynamics, its messaging protocol, and its index management. We have built two measurement apparatus, the KaZaA Sniffing Platform and the KaZaA Probing Tool, to unravel many of the mysteries behind KaZaA. We deploy the apparatus to study KaZaA's overlay structure and dynamics, its neighbor selection, its use of dynamic port numbers to circumvent firewalls, and its index management. Although this study does not fully solve the KaZaA puzzle, it nevertheless leads to a coherent description of KaZaA and its overlay. Furthermore, we leverage the measurement results to set forth a number of key principles for the design of a successful unstructured P2P overlay. The measurement results and resulting design principles in this paper should be useful for future architects of P2P overlay networks as well as for engineers managing ISPs.

1 Introduction

On a typical day, KaZaA has more than 3 million active users sharing over 5,000 terabytes of content. On the University of Washington campus network in June 2002, KaZaA consumed approximately 37% of all TCP traffic, which was more than twice the Web traffic on the same campus at the same time [8]. With over 3 million satisfied users, KaZaA is significantly more popular than Napster or Gnutella ever was. Sandvine estimates that in the US 76% of P2P file-sharing traffic is KaZaA/FastTrack traffic and only 8% is Gnutella traffic [23]. Clearly, both in terms of number of participating users and in traffic volume, KaZaA is one of the most important applications ever carried by the Internet. In fact, it can be argued that KaZaA has been so successful that any new proposal for a P2P file-sharing system should be compared with the KaZaA benchmark.

However, largely because KaZaA is a proprietary protocol that encrypts its signalling messages, little has been known to date about the specifics of KaZaA's overlay, the maintenance of the overlay, and the KaZaA signalling protocol. In this paper we undertake a comprehensive measurement study of KaZaA's overlay structure and dynamics, its neighbor selection, its use of dynamic port numbers to circumvent firewalls, and its index management. Although this study does not fully solve the KaZaA puzzle, it nevertheless leads to a coherent description of KaZaA and its overlay, while providing many new insights about the details of KaZaA.

To unravel the mysteries of the KaZaA overlay, we developed two measurement apparatus: the KaZaA Sniffing Platform and the KaZaA Probing Tool. The KaZaA Sniffing Platform is a set of KaZaA nodes that are forced to interconnect with one another in a controlled manner, while one node is also connected to hundreds of platform-external KaZaA nodes. The KaZaA Sniffing Platform collects KaZaA signalling traffic, from which we can draw conclusions about the structure and dynamics of the KaZaA overlay. The KaZaA Probing Tool establishes a TCP connection with any supplied KaZaA node, handshakes with that node, and sends and receives arbitrary encrypted KaZaA messages with the node. It is used for analyzing node availabilities and KaZaA neighbor selection. Both of these apparatus consume limited resources. One of the contributions of this paper is to show how it is possible to obtain extensive overlay information about a large-scale overlay application with a low-cost measurement infrastructure.
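The Probing Tool's handshake and message exchange are proprietary and encrypted, so they are not reproduced here. As a minimal sketch of only the first step the tool performs, opening a TCP connection to a supplied node, the Python below tests whether an (IP address, port) pair accepts connections and turns a batch of such probes into an availability fraction. The function names `probe_node` and `availability` and the 3-second timeout are hypothetical choices for illustration, not details taken from the paper.

```python
import socket
from typing import Iterable, Tuple

# Hypothetical sketch: checks TCP reachability only. It does NOT perform the
# proprietary, encrypted KaZaA handshake or exchange any KaZaA messages.


def probe_node(ip: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (ip, port) is established in time."""
    try:
        # create_connection performs the TCP three-way handshake.
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts
        # (socket.timeout is a subclass of OSError).
        return False


def availability(nodes: Iterable[Tuple[str, int]], timeout: float = 3.0) -> float:
    """Fraction of probed (ip, port) pairs that accepted a TCP connection."""
    results = [probe_node(ip, port, timeout) for ip, port in nodes]
    if not results:
        return 0.0
    return sum(results) / len(results)
```

Repeating `availability` over the same node list at fixed intervals would give a crude per-node up/down record. A node behind a NAT that does not accept inbound connections would appear unreachable to such a probe even while online, so a plain TCP check is only an approximation of node availability.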
We use these tools to obtain insight into the following questions:

- It is well known that the KaZaA overlay is organized in a two-tier hierarchy consisting of Super Nodes (SNs) in the upper tier and Ordinary Nodes (ONs) in the lower tier. But how many children ONs does a typical SN support? What fraction of the peers in KaZaA are SNs? Are the SNs densely or sparsely interconnected?
- How long are ON-to-SN connections in the overlay? How long are SN-to-SN connections? What is the typical lifetime of an SN?
- How does an ON discover candidate SNs for parenting? Once it has a set of candidate SNs, how does it choose a particular parent among them? In choosing the parent, does it take locality or SN workload into account?
- Because peers (ONs and SNs) select their own server port numbers, KaZaA is more difficult to block with firewalls and NATs. How does KaZaA manage the server port numbers? What fraction of KaZaA nodes are behind NATs?
- What are the characteristics of the protocol that peers use to establish overlay links among themselves?
- How is the file index, which relates each file copy to an IP address and port number, organized among the SNs?

In addition to providing novel insights into a remarkably successful P2P system, we leverage our measurement results to set forth a number of key principles for the design of an unstructured P2P overlay. As we'll discuss in Section 5, these principles include distributed design, exploiting heterogeneity, load balancing, locality, connection shuffling, and firewall/NAT circumvention.

This paper should be of interest not only to P2P designers but also to engineers at upper- and lower-tier ISPs who want a thorough understanding of P2P overlays and traffic. Because P2P file-sharing systems can generate vast quantities of traffic, networking engineers who dimension the network and introduce content-distribution devices such as caches need a basic understanding of how major P2P file-sharing systems operate. Although there has been recent work analyzing the file-sharing workload in KaZaA [8], [18], to our knowledge we are the first to undertake a comprehensive study of a hierarchical unstructured overlay for a P2P system. The paper focuses on the KaZaA overlay network and index management. It addresses neither KaZaA's downloading protocol (for example, KaZaA's parallel downloading and request queuing) nor its incentive scheme for encouraging uploading. The paper is complementary to [8] and [18], which focus on KaZaA file