DOC PREVIEW
CORNELL CS 614 - Study Notes

This preview shows page 1-2-3-4 out of 13 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Storage management and caching in PAST, a large-scale,persistent peer-to-peer storage utilityAntony RowstronMicrosoft ResearchSt. George House, 1 Guildhall StreetCambridge, CB2 3NH, United [email protected] DruschelRice University6100 Main Street, MS 132Houston, TX 77005-1892, [email protected] paper presents and evaluates the storage managementand caching in PAST, a large-scale peer-to-peer persistentstorage utility. PAST is based on a self-organizing, Internet-based overlay network of storage nodes that cooperativelyroute file queries, store multiple replicas of files, and cacheadditional copies of popular files.In the PAST system, storage nodes and files are each as-signed uniformly distributed identifiers, and replicas of a fileare stored at nodes whose identifier matches most closely thefile’s identifier. This statistical assignment of files to storagenodes approximately balances the number of files stored oneach node. However, non-uniform storage node capacitiesand file sizes require more explicit storage load balancingto permit graceful behavior under high global storage uti-lization; likewise, non-uniform popularity of files requirescaching to minimize fetch distance and to balance the queryload.We present and evaluate PAST, with an emphasis on itsstorage management and caching system. Extensive trace-driven experiments show that the system minimizes fetchdistance, that it balances the query load for popular files,and that it displays graceful degradation of performance asthe global storage utilization increases beyond 95%.1. INTRODUCTIONPeer-to-peer Internet applications have recently been pop-ularized through file sharing applications such as Napster,Gnutella and FreeNet [1, 2, 13]. While much of the atten-tion has been focused on the copyright issues raised by theseparticular applications, peer-to-peer systems have many in-teresting technical aspects like decentralized control, self-organization, adaptation and scalability. Peer-to-peer sys-tems can be characterized as distributed systems in whichall nodes have identical capabilities and responsibilities andall communication is symmetric.There are currently many projects aimed at constructingPermission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for pro£t or commercial advantage and that copiesbear this notice and the full citation on the £rst page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior speci£cpermission and/or a fee.SOSP-18 11/2001 CanadaCopyright 2001 ACM 0-12345-67-8/90/01 ...$5.00.peer-to-peer applications and understanding more of the is-sues and requirements of such applications and systems [1, 2,8, 13, 15, 20]. We are developing PAST, an Internet-based,peer-to-peer global storage utility, which aims to providestrong persistence, high availability, scalability and security.The PAST system is composed of nodes connected to theInternet, where each node is capable of initiating and rout-ing client requests to insert or retrieve files. Optionally,nodes may also contribute storage to the system. The PASTnodes form a self-organizing overlay network. Inserted filesare replicated across multiple nodes for availability. Withhigh probability, the set of nodes over which a file is repli-cated is diverse in terms of geographic location, ownership,administration, network connectivity, rule of law, etc.A storage utility like PAST is attractive for several rea-sons. First, it exploits the multitude and diversity (in ge-ography, ownership, administration, jurisdiction, etc.) ofnodes in the Internet to achieve strong persistence and highavailability. This obviates the need for physical transport ofstorage media to protect backup and archival data; likewise,it obviates the need for explicit mirroring to ensure highavailability and throughput for shared data. A global stor-age utility also facilitates the sharing of storage and band-width, thus permitting a group of nodes to jointly store orpublish content that would exceed the capacity or band-width of any individual node.While PAST offers persistent storage services, its seman-tics differ from that of a conventional filesystem. Files storedin PAST are associated with a quasi-unique fileId that is gen-erated at the time of the file’s insertion into PAST. There-fore, files stored in PAST are immutable since a file cannotbe inserted multiple times with the same fileId. Files canbe shared at the owner’s discretion by distributing the fileId(potentially anonymously) and, if necessary, a decryptionkey.An efficient routing scheme called Pastry [27] ensures thatclient requests are reliably routed to the appropriate nodes.Client requests to retrieve a file are routed, with high prob-ability, to a node that is “close in the network” to the clientthat issued the request1, among the live nodes that storethe requested file. The number of PAST nodes traversed, aswell as the number of messages exchanged while routing aclient request, is logarithmic in the total number of PASTnodes in the system under normal operation.1Network proximity is based on a scalar metric such as thenumber of IP routing hops, bandwidth, geographic distance,etc.To retrieve a file in PAST, a client must know its fileIdand, if necessary, its decryption key. PAST does not pro-vide facilities for searching, directory lookup, or key distri-bution. Layering such facilities on top of Pastry, the samepeer-to-peer substrate that PAST is based on, is the sub-ject of current research. Finally, PAST is intended as anarchival storage and content distribution utility and not asa general-purpose filesystem. It is assumed that users inter-act primarily with a conventional filesystem, which acts asa local cache for files stored in PAST.In this paper, we focus on the storage management andcaching in PAST. In Section 2, an overview of the PASTarchitecture is given and we briefly describe Pastry, PAST’scontent location and routing scheme. Section 3 describesthe storage management and Section 4 the mechanisms andpolicies for caching in PAST. Results of an experimentalevaluation of PAST are presented in Section 5. Relatedwork is discussed in Section 6 and we conclude in Section 7.2. PAST OVERVIEWAny host connected to the Internet can act as a PASTnode by installing the appropriate software.


View Full Document

CORNELL CS 614 - Study Notes

Documents in this Course
Load more
Download Study Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Study Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Study Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?