Berkeley COMPSCI 186 - Peer-to-Peer (p2p) Querying - D2924297

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI 186> Peer-to-Peer (p2p) Querying

DOC PREVIEW

Berkeley COMPSCI 186 - Peer-to-Peer (p2p) Querying

School name University of California, Berkeley

Course Compsci 186- Introduction to Database Systems

Pages 11

This preview shows page 1-2-3-4 out of 11 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

1Peer-to-Peer (p2p) QueryingJoe HellersteinCS186 Fall 2005Note• These slides are based on a tutorial given atVLDB 2004– http://db.cs.berkeley.edu/jmh/talks/vldb04-p2ptut-final.ppt– http://db.cs.berkeley.edu/jmh/talks/vldb04-p2ptut-2upbw.pdf• These slides were made on a Mac– May not display correctly in PPT for Windows• Animation is often a portability problem for PPT• PPT’s Compatibility Check finds 185 issues!Outline• What is p2p?• Querying in early p2p systems– Napster– Gnutella– KaZaA, Gnutella with Ultrapeers• Some problems with queries in Gnutella• Distributed Hash Tables (DHTs)– Chord– Keyword search over DHTs• More fun– Towards full-service p2p querying– Get involved!– DB ideas infecting the network more deeplyp2p• Distributed applications without servers– Scale– Peers– Churn– Self-admin• People tend to think of “filestealing”– Respect the musicians, don’t steal music.– Also used for, e.g., swapping biological data, open-sourcesoftware, etc.– Lots of potential applications of the technology• Go make some up!• My favorite: “Public Health for the Internet”– P2P is an inherently democratic architecturep2p, cont• p2p is “organic”– Start the next phenomenon in your dorm room– No need for a hosted server, administrator, etc.– Hence no need for Venture Capital– Hence no need to worry if it will take off• Infrastructure “right-sizes” itselfOutline• What is p2p?• Querying in early p2p systems– Napster– Gnutella– KaZaA, Gnutella with Ultrapeers• Some problems with queries in Gnutella• Distributed Hash Tables (DHTs)– Chord– Keyword search over DHTs• More fun– Towards full-service p2p querying– Get involved!– DB ideas infecting the network more deeply2Early P2P I: Client-Server• Napsterxyz.mp3 ?xyz.mp3Early P2P I: Client-Server• Napster– Client-Server searchxyz.mp3Early P2P I: Client-Server• Napster– Client-Server searchxyz.mp3 ?xyz.mp3Early P2P I: Client-Server• Napster– Client-Server search– “pt2pt” file xferxyz.mp3 ?xyz.mp3Early P2P I: Client-Server• Napster– Client-Server search– “pt2pt” file xferxyz.mp3 ?xyz.mp3Early P2P II: Flooding on Overlaysxyz.mp3 ?xyz.mp3An overlay network. “Unstructured”.3Early P2P II: Flooding on Overlaysxyz.mp3 ?xyz.mp3FloodingEarly P2P II: Flooding on Overlaysxyz.mp3 ?xyz.mp3FloodingEarly P2P II: Flooding on Overlaysxyz.mp3Early P2P II.v: “Ultrapeers”• Ultrapeers can be installed (KaZaA) or self-promoted (Gnutella)Gnutella Network• Popular open-source file-sharingnetwork– ~450,000 users as of 2003– ~2,000,000 today• Ultrapeer-based Topology– Queries flooded among ultrapeers– Leaf nodes shielded from querytraffic– Based on multiple crawlers from 30vantage points on PlanetLabUltrapeer nodesLeaf nodes>100 Files0-100 Files0 FilesOct 2003 CrawlPlanetLab• PlanetLab: Open, globally distributed platform fordeploying planetary-scale network services• 631 nodes at 299 sites, 5 continents• URL: http://www.planet-lab.org4Gnutella Measurements• “Quality” of Searches:– Recall (% of all relevant items retrieved)– Distinct Recall (% of all relevant distinct itemsretrieved)– Response Time (Latency) to 1st result• Software utilized:– Modified the LimeWire Gnutella Client• Run as leaf or ultrapeer• Monitor Gnutella traffic• Inject queries and gather resultsGnutella Search Quality• Log of Gnutella queries from LimeWire clients• Reissued Gnutella queries– 700 randomly chosen queries– 30 LimeWire Ultrapeers on PlanetLab– 3 different times• Computed Query “Recall”– Each query issued simultaneously from 30 ultrapeers– Union of results from 30 ultrapeers• “Union-of-30” is our approximation of perfect answerQueries with Small Result SetsSmall result set → items have few copies in networkResult Size CDFSingle Query“Union-of-30”QueryLarge fraction of queries return few orno results even when they existQuery LatencyMany resultsQueries that return few results havepoor response timesFew ResultsSummary of Measurements• Searching on Flood-based networks– Highly effective for popular (highly replicated) items– Less effective for rare items– Significant opportunity to do better:• Large fraction of queries return few or no results even when theyexist• Bad response times for queries on rare items• Aggressive flooding is not the solution– Diminishing returns with flooding– Does not improve response times5Outline• What is p2p?• Querying in early p2p systems– Napster– Gnutella– KaZaA, Gnutella with Ultrapeers• Some problems with queries in Gnutella• Distributed Hash Tables (DHTs)– Chord– Keyword search over DHTs• More fun– Towards full-service p2p querying– Get involved!– DB ideas infecting the network more deeplyHigh-Level Idea: Indirection• Indirection in space– Logical (content-based) IDs, routing to those IDs• “Content-addressable” network– Tolerant of churn• nodes joining and leaving the network• Indirection in time– Want some scheme to temporally decouple send and receive– Persistence required. Typical Internet solution: soft state• Combo of persistence via storage and via retry– “Publisher” requests TTL on storage– Republishes as needed• Metaphor: Distributed Hash Tableto hzh=zWhat is a DHT?• Hash Table– data structure that maps “keys” to “values”– essential building block in software systems• Distributed Hash Table (DHT)– similar, but spread across the Internet• Interface– insert(key, value)– lookup(key)How?Every DHT node supports a single operation:– Given key as input; route messages towardnode holding keyK VK VK VK VK VK VK VK VK VK VK VDHT in actionK VK VK VK VK VK VK VK VK VK VK VDHT in action6K VK VK VK VK VK VK VK VK VK VK VDHT in actionOperation: take key as input; route messages to node holding keyK VK VK VK VK VK VK VK VK VK VK VDHT in action: put()insert(K1,V1)Operation: take key as input; route messages to node holding keyK VK VK VK VK VK VK VK VK VK VK VDHT in action: put()Operation: take key as input; route messages to node holding keyinsert(K1,V1)(K1,V1)K VK VK VK VK VK VK VK VK VK VK VDHT in action: put()Operation: take key as input; route messages to node holding keyretrieve (K1)K VK VK VK VK VK VK VK VK VK VK VDHT in action: get()Operation: take key as

View Full Document