Unformatted text preview:

Storage management and caching in PAST a large scale persistent peer to peer storage utility Antony Rowstron Peter Druschel Microsoft Research St George House 1 Guildhall Street Cambridge CB2 3NH United Kingdom Rice University 6100 Main Street MS 132 Houston TX 77005 1892 USA antr microsoft com druschel cs rice edu ABSTRACT This paper presents and evaluates the storage management and caching in PAST a large scale peer to peer persistent storage utility PAST is based on a self organizing Internetbased overlay network of storage nodes that cooperatively route le queries store multiple replicas of les and cache additional copies of popular les In the PAST system storage nodes and les are each assigned uniformly distributed identi ers and replicas of a le are stored at nodes whose identi er matches most closely the le s identi er This statistical assignment of les to storage nodes approximately balances the number of les stored on each node However non uniform storage node capacities and le sizes require more explicit storage load balancing to permit graceful behavior under high global storage utilization likewise non uniform popularity of les requires caching to minimize fetch distance and to balance the query load We present and evaluate PAST with an emphasis on its storage management and caching system Extensive tracedriven experiments show that the system minimizes fetch distance that it balances the query load for popular les and that it displays graceful degradation of performance as the global storage utilization increases beyond 95 1 INTRODUCTION Peer to peer Internet applications have recently been popularized through le sharing applications such as Napster Gnutella and FreeNet 1 2 13 While much of the attention has been focused on the copyright issues raised by these particular applications peer to peer systems have many interesting technical aspects like decentralized control selforganization adaptation and scalability Peer to peer systems can be characterized as distributed systems in which all nodes have identical capabilities and responsibilities and all communication is symmetric There are currently many projects aimed at constructing peer to peer applications and understanding more of the issues and requirements of such applications and systems 1 2 8 13 15 20 We are developing PAST an Internet based peer to peer global storage utility which aims to provide strong persistence high availability scalability and security The PAST system is composed of nodes connected to the Internet where each node is capable of initiating and routing client requests to insert or retrieve les Optionally nodes may also contribute storage to the system The PAST nodes form a self organizing overlay network Inserted les are replicated across multiple nodes for availability With high probability the set of nodes over which a le is replicated is diverse in terms of geographic location ownership administration network connectivity rule of law etc A storage utility like PAST is attractive for several reasons First it exploits the multitude and diversity in geography ownership administration jurisdiction etc of nodes in the Internet to achieve strong persistence and high availability This obviates the need for physical transport of storage media to protect backup and archival data likewise it obviates the need for explicit mirroring to ensure high availability and throughput for shared data A global storage utility also facilitates the sharing of storage and bandwidth thus permitting a group of nodes to jointly store or publish content that would exceed the capacity or bandwidth of any individual node While PAST o ers persistent storage services its semantics di er from that of a conventional lesystem Files stored in PAST are associated with a quasi unique leId that is generated at the time of the le s insertion into PAST Therefore les stored in PAST are immutable since a le cannot be inserted multiple times with the same leId Files can be shared at the owner s discretion by distributing the leId potentially anonymously and if necessary a decryption key An e cient routing scheme called Pastry 27 ensures that client requests are reliably routed to the appropriate nodes Client requests to retrieve a le are routed with high probability to a node that is close in the network to the client that issued the request1 among the live nodes that store the requested le The number of PAST nodes traversed as well as the number of messages exchanged while routing a client request is logarithmic in the total number of PAST nodes in the system under normal operation To retrieve a le in PAST a client must know its leId and if necessary its decryption key PAST does not provide facilities for searching directory lookup or key distribution Layering such facilities on top of Pastry the same peer to peer substrate that PAST is based on is the subject of current research Finally PAST is intended as an archival storage and content distribution utility and not as a general purpose lesystem It is assumed that users interact primarily with a conventional lesystem which acts as a local cache for les stored in PAST In this paper we focus on the storage management and caching in PAST In Section 2 an overview of the PAST architecture is given and we brie y describe Pastry PAST s content location and routing scheme Section 3 describes the storage management and Section 4 the mechanisms and policies for caching in PAST Results of an experimental evaluation of PAST are presented in Section 5 Related work is discussed in Section 6 and we conclude in Section 7 2 PAST OVERVIEW Any host connected to the Internet can act as a PAST node by installing the appropriate software The collection of PAST nodes forms an overlay network in the Internet Minimally a PAST node acts as an access point for a user Optionally a PAST node may also contribute storage to PAST and participate in the routing of requests within the PAST network The PAST system exports the following set of operations to its clients fileId Insert name owner credentials k file stores a le at a user speci ed number k of diverse nodes within the PAST network The operation produces a 160 bit identi er leId that can be used subsequently to identify the le The leId is computed as the secure hash SHA 1 of the le s name the owner s public key and a randomly chosen salt This choice ensures with very high probability that leIds are unique Rare leId collisions are detected and lead to


View Full Document

UW-Madison CS 739 - Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility

Loading Unlocking...
Login

Join to view Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?