Measurements of Peer-to-Peer Systems
Pradnya Karbhari
Nov 25th, 2003
CS 8803: Network Measurements Seminar

Introduction to Peer-to-Peer (P2P) systems
- End-systems (or peers) can act as both clients and servers of data, hence the system is scalable and reliable
- Peer participation is voluntary and membership is dynamic, hence the topology keeps changing
- Most popularly used for file sharing, hence peer-to-peer systems have become synonymous with peer-to-peer file sharing networks

Classification of P2P systems
- P2P computation (e.g. seti@home)
- P2P communication (instant messaging)
- P2P file-sharing networks
  - Centralized (e.g. Napster)
  - Decentralized
    - Structured (e.g. Chord, CAN, Pastry, Tapestry)
    - Unstructured (e.g. Gnutella, Kazaa, Freenet, eDonkey, eMule, Direct Connect, …)

Popularity of unstructured decentralized P2P networks
[Figure: Gnutella host count, maintained by Limewire (http://www.limewire.com)]
- good scope for measurement studies because:
  - deployed and widely used
  - use a lot of bandwidth during data transfer, hence a concern for network operators
- quite a few measurement studies have been done on these systems, some of which we will discuss in this seminar

Outline
- Characterization of users of P2P systems
  - Saroiu, et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002.
- Effect of P2P traffic on the underlying network
  - Sen, et al., “Analyzing peer-to-peer traffic across large networks”, IMW’02
- Peer-to-Peer Topologies
  - Ripeanu, et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002.
- Searching on the P2P network
  - Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001
- Deciphering proprietary P2P systems (like Kazaa)
  - Leibowitz, et al., “Deconstructing the Kazaa Network”, WIAPP, 2003.

Gnutella protocol overview
- Connecting to the Gnutella network
  - bootstrap using the GWebCache system and a locally cached hostlist
  - Ping/Pong messages are exchanged with potential neighbors
- Searching on the network
  - Query messages are flooded on the network
  - QueryHit messages are received (back-propagated along the Query path) from peers having the requested content
- Downloading the content
  - peers download files directly from peers having the requested content

Characterization of Users of P2P systems
S. Saroiu, P. Gummadi and S. Gribble, “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN’02.
- first paper to characterize P2P file sharing systems
- Goal: to analyze the following user characteristics
  - latency
  - lifetime of peers
  - bottleneck bandwidth
  - number of files shared and downloaded
  - degree of cooperation
- methodology: active crawling
- systems studied: Napster and Gnutella
- data collection: May 2001
Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002

Measurement Methodology
- active crawling of the Napster and Gnutella systems
  - Napster: issued queries for popular content, and then queried the central server for peer information
  - Gnutella: used the protocol's ping/pong messages to get metadata about peers, then about their neighbors, and so on
- parallel measurement for:
  - peer lifetime: periodic probing of peers obtained from the crawlers
    - offline if no response to TCP SYN
    - inactive if the response to TCP SYN is a TCP RST
    - active if the peer accepts the incoming TCP connection on that port
  - latency: RTT measurements from one host
  - bottleneck link bandwidth: active probing using Sprobe, a tool they developed based on the packet-pair dispersion technique

Host Lifetime analysis
- 20% of peers in Napster and Gnutella have an IP-level uptime of 93% or more
- Napster peers have higher application uptimes than Gnutella peers
  - the best 20% of Napster peers have an uptime of 83% or more, and the best 20% of Gnutella peers have an uptime of 45% or more
- median session duration is 60 minutes for both Napster and Gnutella

Latency analysis (Gnutella)
- 20% of peers have a latency of at most 70 ms and 20% have a latency of at least 280 ms
- correlation between downstream bottleneck bandwidth and latency: two clusters, for modems (20-60 Kbps, 100-1000 ms) and broadband (1 Mbps, 60-300 ms)

Bottleneck Bandwidth Analysis (Gnutella)
- 92% of Gnutella peers have a downstream bottleneck bandwidth of at least 100 Kbps
- 22% of peers have an upstream bottleneck bandwidth of 100 Kbps or less
  - these peers are unsuitable to serve content

Downloads, Uploads and Shared Files
- relative number of downloads and uploads varies significantly across bandwidth classes
- clear client/server behavior of different classes

Shared files vs. Shared Data (Napster and Gnutella)
- Strong correlation between the number of files shared and the amount of shared data in MB
- slope of both lines is 3.7 MB, the size of a typical MP3 audio
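The peer-lifetime probing rule from the Measurement Methodology slide (offline / inactive / active, depending on how a peer answers a TCP SYN) can be sketched roughly as below. This is only an illustrative sketch and not the authors' tool: the function name `probe_peer`, the 3-second timeout, and the injectable `connect` parameter are assumptions made for this example, and any OS-level failure other than a refused connection (for example an unreachable host) is lumped together with "no response".

```python
import socket


def probe_peer(host, port, timeout=3.0, connect=socket.create_connection):
    """Classify a peer as 'active', 'inactive', or 'offline' with one TCP connect.

    Hypothetical helper: the name, the default timeout, and the `connect`
    hook (useful for testing without a network) are assumptions.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # The SYN was answered with a RST: the host is up, but the
        # application is not listening on that port.
        return "inactive"
    except OSError:
        # Timeout or another network error: treated here as no response
        # to the SYN (assumption: all such errors count as offline).
        return "offline"
    # The peer accepted the incoming TCP connection.
    conn.close()
    return "active"
```

Calling `probe_peer(address, port)` periodically for every peer returned by a crawler, and recording the timestamped result, would give the raw series from which uptime and session duration could then be derived.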