LOOKING UP DATA IN P2P SYSTEMS

Home> Academic Documents> LOOKING UP DATA IN P2P SYSTEMS

DOC PREVIEW

This preview shows page 1-2 out of 5 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

LOOKING UP DATA IN P2P SYSTEMSHari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica∗MIT Laboratory for Computer Science1. IntroductionThe recent success of some widely deployed peer-to-peer (P2P)file sharing applications has sparked new research in this area. Weare interested in the P2P systems that have no centralized controlor hierarchical organization, where the software running at eachnode is equivalent in functionality. Because these completely de-centralized systems have the potential to significantly change theway large-scale distributed systems are built in the future, it seemstimely to review some of this recent research.The main challenge in P2P computing is to design and imple-ment a robust distributed system composed of inexpensive com-puters in unrelated administrative domains. The participants in atypical P2P system might be home computers with cable modemor DSL links to the Internet, as well as computers in enterprises.Some current P2P systems have reported tens of thousands of si-multaneously active participants, with half a million participatingmachines over a week-long period.P2P systems are popular and interesting for a variety of reasons:1. The barriers to starting and growing such systems are low,since they usually don’t require any special administrative orfinancial arrangements, unlike with centralized facilities.2. P2P systems suggest a way to aggregate and make use ofthe tremendous computation and storage resources that oth-erwise just sit idle on computers across the Internet whenthey are not in use.3. The decentralized and distributed nature of P2P systemsgives them the potential to be robust to faults or intentionalattacks, making them ideal for long-term storage as well asfor lengthy computations.P2P computing raises many interesting research problems in dis-tributed systems. In this short paper we will look at one of them,the lookup problem: How do you find any given data item in a largeP2P system in a scalable manner, without any centralized serversor hierarchy? This problem is at the heart of any P2P system. It isnot addressed well by most systems in popular use, and it provides∗University of California, Berkeley.a good example of how the challenges of designing P2P systemscan be addressed.The recent algorithms developed by several research groups forthe lookup problem present a simple and general interface, a dis-tributed hash table (DHT). Data items are inserted in a DHT andfound by specifying a unique key for that data. To implement aDHT, the underlying algorithm must be able to determine whichnode is responsible for storing the data associated with any givenkey. To solve this problem, each node maintains information (e.g.,the IP address) of a small number of other nodes (“neighbors”) inthe system, forming an overlay network and routing messages inthe overlay to store and retrieve keys.One might believe from recent news items that P2P systems aregood for illegal music swapping and little else, but this would bea rather hasty conclusion. The distributed hash table, for exam-ple, is increasingly finding uses in the design of robust, large-scaledistributed applications. It appears to provide a general-purposeinterface for location-independent naming upon which a variety ofapplications can be built. Furthermore, distributed applications thatmake use of such an infrastructure inherit robustness, ease of op-eration, and scaling properties. A significant amount of researcheffort is now being devoted to investigating these ideas.2. The lookup problemThe lookup problem is simple to state. Some publisher insertsan item X, say a file, in the system. At a later point in time, someconsumer wants to retrieve X, in general when the publisher isn’tonline and connected to the system. How does the consumer findthe location of a server that has a replica of X?One approach is to maintain a central database that maps afile name to the locations of servers that store the song. Napster(http://www.napster.com/) adopts this approach for songtitles, but it has inherent reliability and scalability problems thatmake it vulnerable to attacks on the database.Another approach, at the other end of the spectrum, is for theconsumer to broadcast a message to all its neighbors with a requestfor X. When a node receives such a request, it checks its localdatabase. If it has X, it responds with the item. Otherwise, it for-wards the request to its neighbors, which execute the same protocol.Gnutella (http://gnutella.wego.com/) has a protocol inthis style with some mechanisms to avoid request loops. However,this “broadcast” approach doesn’t scale either [7], because of thebandwidth consumed by broadcast messages and the compute cy-cles consumed by the many nodes that must handle these messages.In fact, the day after Napster was shut down, reports indicate thatthe Gnutella network collapsed under its own load, created when alarge number of users migrated to using it for sharing music.To reduce the cost of broadcast messages, one can organize thenodes in the network into a hierarchy, like the Internet’s DomainName System (DNS) does. Searches start at the top of the hierarchyand, by following forwarding references from node to node, tra-verse a single path down to the node that contains the desired data.Directed traversal of a single path consumes fewer resources thana broadcast. Many of the current popular systems, such KaZaA,Grokster, and MusicCity Morpheus, which are all based on Fast-Track’s P2P platform (http://www.fasttrack.nu/), adoptthis approach. The disadvantage of the hierarchical approach is thatthe nodes higher in the tree take a larger fraction of the load thanthe leaf nodes, and therefore require more expensive hardware andmore careful management. The failure or removal of the tree rootor a node sufficiently high in the hierarchy can be catastrophic.Symmetric distributed lookup algorithms avoid the drawbacks ofthe previous approaches. Searches are carried out by following ref-erences from node to node until the appropriate node containingthe data is found. Unlike the hierarchy, no node plays a specialrole—a search can start at any node, and each node is involved inonly a small fraction of the search paths in the system. As a re-sult, no node consumes an excessive amount of resources whilesupporting searches. These new algorithms are designed to scalewell—they require each node to only maintain information abouta small


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 5 pages.

Please select your school