DOC PREVIEW
U of I CS 525 - LECTURE NOTES

This preview shows page 1-2-3 out of 8 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1CS 525 Advanced Topics in Distributed SystemsSpring 07Indranil GuptaCharacteristics of P2P systemsMarch 29, 2007Napster SSSPPPPPPPeersnapster.com ServersStore their ownfilesStore peer pointers for all files3. Response1. Query2. All servers search their lists (ternary tree algo.)4. ping candidates5. download from best hostGnutella PPPPPPPWho has PennyLane.mp3?Query’s flooded out, ttl-restricted, forwarded only onceKazaaPPPPPeersSSSupernodesPWhat are the Characteristics of these Systems in Real-life Settings?• Collect traces• Tabulate them• Papers contain plenty of information on how data was collected, the caveats, ifs and buts of the interpretation, etc.– These are important, but we will ignore them for this lecture and concentrate on the raw data and conclusions• We’ll focus only on Gummadi et al and Chu et alMeasurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing WorkloadGummadi, Saroui et alDepartment of Computer ScienceUniversity of Washington2Three-tiered approach1. Analyze 200-day trace of Kazaa traffic Considered only traffic going from U. Washington to the outside2. Develop a model of multimedia workloads Analyze and confirm hypotheses3. Understand locality-awareness in KazaaContributions• Obtained some useful characterizations of Kazaa’s traffic• Showed that Kazaa’s workload is notZipf– Showed that other workloads (multimedia) may not be Zipf either• Presented a model of P2P file-sharing workloads based on their trace results– Validated the model through simulations that yielded results very similar to those from traces• Proved the usefulness of exploiting locality-aware request routingMeasurement results• Users are patient• Users slow down as they age• Kazaa is not one workload• Kazaa clients fetch objects at-most-once• Popularity of objects is often short-lived• Kazaa is not ZipfUser characteristics (1)• Users are patientUser characteristics (2)• Users slow down as they age– clients “die”– older clients ask for less each time they use systemUser characteristics (3)• Client activity– Tracing used could only detect users when their clients transfer data– Thus, they only report statistics on client activity, which is a lower bound on availability– Avg session lengths are typically small (median: 2.4 mins)• Many transactions fail• Periods of inactivity may occur during a request if client cannot find an available server with the object3Object characteristics (1)• Kazaa is notone workload•This does notaccount forconnection overheadObject characteristics (2)• Kazaa object dynamics– Kazaa clients fetch objects at most once– Popularity of objects is often short-lived– Most popular objects tend to be recently born objects– Most requests are for old objects• 72% old – 28% new for large objects• 52% old – 48% new for small objectsObject characteristics (3)• Kazaa is not Zipf• Zipf’s law: popularity of ith-most popular object is proportional to i-α, (α: Zipf coefficient)• Web access patterns are Zipf: small number of objects are extremely popular, but there is a long tail of unpopular requests.• (Zipf) looks linear on log-log scaleCaveat: what is an “object”in Kazaa?Model of P2P file-sharing workloads• On average, a client requests 2 objects/day• P(x): probability that a user requests an object of popularity rank x  Zipf(1)– Adjusted so that objects are requested at most once• A(x): probability that a newly arrived object is inserted at popularity rank x  Zipf(1)• All objects are assumed to have same size• Use caching to observe performance changes (effectiveness  hit rate)Model – Simulation results• File-sharing effectiveness diminishes with client age– System evolves towards one with no locality and objects chosen at random from large space• New object arrivals improve performance– Arrivals replenish supply of popular objects• New clients cannot stabilize performance– Cannot compensate for increasing number of old clients– Overall bandwidth increases in proportion to population sizeModel validation• By tweaking the arrival rate of of new objects, were able to match trace results (with 5475 new arrivals per year)4Exploring locality-awareness• Currently organizations shape or filter P2P traffic• Alternative strategy: exploit locality in file-sharing workload– Caching; or– Use content available within organization to substantially decrease external bandwidth usage– Result: 86% of externally downloaded bytes could be avoided by using an organizational proxySome Questions for You• Most requests for old objects, while most popular objects are new ones – is there a contradiction?• “Unique object” : When do we say two objects A and B are “different”?– When they have different file names• fogonthetyne.mp3 and fogtyne.mp3– When they have exactly same content• 2 mp3 copies of same song, one at 64 kbps and the other at 128 kbps– When A (and not B) is returned by a keyword search, and vice versa– …?• Based on this, does “caching” have a limit? Does caching need to look into file content? Is there a limit to such intelligent caching then?• Should there be separate overlays for small objects and large objects? For new objects and old objects?• Or should there be separate caching strategies?Questions?Availability and Locality Measurements of Peer-to-Peer SystemsChu, Labonte, and LevineDepartment of Computer ScienceUniversity of Massachusetts, AmherstGoals• Measurement study of peer-to-peer (P2P) file sharing applications– Napster (January 2001)– Gnutella (March 2002)• Study collected data to analyze– Locality of files– Distribution of file size and types– AvailabilityExperiment Methodology• Initially discover a large set of users (nodes)• Ping nodes periodically• If node is available, gather list of shared files• Save timestamp of each list• File-ListNew– File-ListOld= File-ListDownloaded5Problem #1: Filenames• Replicas of same exact files have different names. Example:– Smashing Pumpkins - Tonight, Tonight.– Smashing Pumpkins - Tonight– Smashing Pumpkins Tonight Tonight– Tonight Tonight Smashing PumpkinsSolution #1: Filenames• Drop stop words, e.g. “and” and “the”• Take out repeating letters (collins -> colins)• Drop vowels (colins -> clns)• Change non-alphanumeric characters to space• Drop leading white space• Sort the


View Full Document

U of I CS 525 - LECTURE NOTES

Documents in this Course
Epidemics

Epidemics

12 pages

LECTURE

LECTURE

7 pages

LECTURE

LECTURE

39 pages

LECTURE

LECTURE

41 pages

P2P Apps

P2P Apps

49 pages

Lecture

Lecture

48 pages

Epidemics

Epidemics

69 pages

GRIFFIN

GRIFFIN

25 pages

Load more
Download LECTURE NOTES
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view LECTURE NOTES and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view LECTURE NOTES 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?