DOC PREVIEW
UW-Madison CS 739 - Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility

This preview shows page 1-2-3-4-5 out of 14 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Storage management and caching in PAST, a large-scale,persistent peer-to-peer storage utilityAntony RowstronMicrosoft ResearchSt. George House, 1 Guildhall StreetCambridge, CB2 3NH, United [email protected] DruschelRice University6100 Main Street, MS 132Houston, TX 77005-1892, [email protected] pap er presents and evaluates the storage managementand caching in PAST, a large-scale p eer-to-peer persistentstorage utility.PAST is based on a self-organizing, Internet-based overlay network of storage nodes that coop erativelyroute le queries, store multiple replicas of les, and cacheadditional copies of p opular les.In the PAST system, storage no des and les are each as-signed uniformly distributed identiers, and replicas of a leare stored at nodes whose identier matches most closely thele's identier. This statistical assignment of les to storagenodes approximately balances the numb er of les stored oneach node. However, non-uniform storage node capacitiesand le sizes require more explicit storage load balancingto permit graceful behavior under high global storage uti-lization; likewise, non-uniform p opularity of les requirescaching to minimize fetch distance and to balance the queryload.We present and evaluate PAST, with an emphasis on itsstorage management and caching system. Extensive trace-driven experiments show that the system minimizes fetchdistance, that it balances the query load for popular les,and that it displays graceful degradation of p erformance asthe global storage utilization increases beyond 95%.1. INTRODUCTIONPeer-to-peer Internet applications have recently b een pop-ularized through le sharing applications suchasNapster,Gnutella and FreeNet [1, 2, 13]. While much of the atten-tion has b een fo cused on the copyright issues raised by theseparticular applications, p eer-to-p eer systems havemanyin-teresting technical aspects like decentralized control, self-organization, adaptation and scalability. Peer-to-peer sys-tems can be characterized as distributed systems in whichall no des haveidentical capabilities and responsibilities andall communication is symmetric.There are currently many pro jects aimed at constructingpeer-to-p eer applications and understanding more of the is-sues and requirements of such applications and systems [1, 2,8, 13, 15, 20]. We are developing PAST, an Internet-based,peer-to-p eer global storage utility, which aims to providestrong p ersistence, high availability, scalability and security.The PAST system is comp osed of nodes connected to theInternet, where each node is capable of initiating and rout-ing client requests to insert or retrieve les. Optionally,nodes may also contribute storage to the system. The PASTnodes form a self-organizing overlaynetwork. Inserted lesare replicated across multiple no des for availability. Withhigh probability, the set of nodes over which a le is repli-catedisdiverse in terms of geographic lo cation, ownership,administration, network connectivity, rule of law, etc.A storage utilitylikePAST is attractive for several reasons.First, it exploits the multitude and diversity (in geography,ownership, administration, jurisdiction, etc.) of no des in theInternet to achieve strong p ersistence and high availability.This obviates the need for physical transport of storage me-dia to protect backup and archival data; likewise, it obviatesthe need for explicit mirroring to ensure high availabilityand throughput for shared data. A global storage utilityalso facilitates the sharing of storage and bandwidth, thuspermitting a group of no des to jointly store or publish con-tent that would exceed the capacity or bandwidth of anyindividual no de.While PAST oers p ersistent storage services, its semanticsdier from that of a conventional lesystem. Files stored inPAST are associated with a quasi-uniqueleIdthat is gener-ated at the time of the le's insertion into PAST. Therefore,les stored in PAST areimmutablesince a le cannot beinserted multiple times with the same leId. Files can b eshared at the owner's discretion by distributing the leId(potentially anonymously) and, if necessary, a decryptionkey.An eÆcient routing scheme calledPastry[27] ensures thatclient requests are reliably routed to the appropriate no des.Client requests toretrievea le are routed, with high prob-ability, to a no de that is \close in the network" to the clientthat issued the request1, among the live no des that storethe requested le. The number of PAST no des traversed, aswell as the numb er of messages exchanged while routing aclient request, is logarithmic in the total number of PASTnodes in the system under normal op eration.To retrieve a le in PAST, a client must know its leIdand, if necessary, its decryption key. PAST does not pro-vide facilities for searching, directory lo okup, or key distri-bution. Layering such facilities on top of Pastry, the samepeer-to-p eer substrate that PAST is based on, is the sub-ject of current research. Finally, PAST is intended as anarchival storage and content distribution utility and not asa general-purp ose lesystem. It is assumed that users inter-act primarily with a conventional lesystem, whichactsasa lo cal cache for les stored in PAST.In this pap er, we focus on the storage management andcaching in PAST. In Section 2, an overview of the PASTarchitecture is given and we briey describe Pastry,PAST'scontent location and routing scheme. Section 3 describesthe storage management and Section 4 the mechanisms andpolicies for caching in PAST. Results of an exp erimentalevaluation of PAST are presented in Section 5. Relatedwork is discussed in Section 6 and we conclude in Section 7.2. PAST OVERVIEWAny host connected to the Internet can act as a PAST no deby installing the appropriate software. The collection ofPAST nodes forms an overlaynetworkintheInternet. Min-imally,aPAST no de acts as an access point for a user. Op-tionally,aPAST node may also contribute storage to PASTand participate in the routing of requests within the PASTnetwork.The PAST system exp orts the following set of op erations toits clients:fileId = Insert(name, owner-credentials, k, file)stores a le at a user-specied numberkof diverse nodeswithin the PAST network. The op eration pro duces a 160-bitidentier (leId) that can b e used subsequently to identifythe le. The leId is computed as the


View Full Document

UW-Madison CS 739 - Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility

Download Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?