DOC PREVIEW
Two-Level, Self-Verifying Data for Peer-to-Peer Storage

This preview shows page 1-2-3-4 out of 11 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Two-Level, Self-Verifying Data for Peer-to-Peer StoragePatrick Eaton, Hakim Weatherspoon, and John KubiatowiczUniversity of California, BerkeleyReport No. UCB/CSD-05-1401June 2005Computer Science Division (EECS)University of CaliforniaBerkeley, California 94720Two-Level, Self-Verifying Data for Peer-to-Peer StoragePatrick Eaton, Hakim Weatherspoon, and John KubiatowiczUniversity of California, BerkeleyJune 2005AbstractFirst-generation peer-to-peer storage systems unneces-sarily couple the unit of client data access to the unit ofinfrastructure data management. Designs that require allpeers to operate on data at a fixed granularity lead to inef-ficiencies such as high query load and high per-blockstor-age overheads. To provide variable granularity accessand support more efficient peer-to-peer storage systems,we introduce two-level naming of self-verifying data. Wedescribe how to implement two-level naming and advo-cate an extension to the traditional API used by peer-to-peer storage systems to support two-level naming.1 IntroductionSelf-verifying data is a fundamental building block ofpeer-to-peer storage systems. It allows clients to vali-date the integrity of data and the infrastructure to iden-tify corrupt data. Because many peer-to-peer applica-tions are targeted to run on mutually distrusting machinesspread across the wide-area, the ability to verify data isvital. Self-verifying data guards against errors introducedby faulty peers or transmission through public networks.It also protects clients from malicious or compromisedservers that attempt to deceive users by returning modi-fied data.Examining a variety of first-generation peer-to-peerstorage systems, we observed that they share a number ofdesign decisions with respect to self-verifying data. Forexample, most clients create and access many small datablocks that are linked into larger data structures. Aftercreating data, clients store each block in the storage in-frastructure as an independentobjects. The storage infras-tructure then manages, indexes, tracks, and repairs eachsmall block individually.Such a design effectively couples the infrastructure’sunit of data management to the client’s unit of data cre-ation and access. This coupling engenders a challenge forcreating efficient systems. For clients at the edges of thenetwork, it is natural to work with data divided into smallblocks. However, a storage infrastructure that managessmall blocks sees higher indexing cost (because the in-frastructure must track all replicas of each smaller blockindividually)anda largerquery load (because clients mustuse the infrastructure to locate each small block). On theother hand, a storage infrastructure can reduce the over-head of tracking data and resolving queries by amortizingthe costs over larger blocks of data. However, requiringclients to work with large blocks can waste precious band-width at the edges of the network and require significantdata buffering that limits data durability. Most existingsystems have elected to use relatively small data blocks,resulting in systems with high query load on the infras-tructure, high storage overhead, and poor communicationpatterns.To improve the efficiency of peer-to-peer storage sys-tems, we introduce two-level naming for self-verifyingdata. Two-level naming decouples the unit of manage-ment from the unit of access by packing many small,application-level data blocks into larger containers calledextents while maintaining the self-verifying properties.This solution allows different components to access dataat different granularities—clients can access exactly thedata they desire by referencing data at a fine granularitywhile peers in the infrastructure can operate on larger con-tainers to amortize the cost of indexing and querying data.To support two-level naming, we advocate an extension to1the traditional API used by peer-to-peer storage systems.In Section 2, we review the concepts of self-verifyingdata and other related work. In Section 3, we detail theprevailing design decisions of popular, first-generationpeer-to-peer storage systems and describe the conse-quences of these designs. In Section 4, we present two-level naming and describe an implementation based onextending the API used by traditional peer-to-peer stor-age systems. Section 5 shows how to use that API to builda versioning backup application. Finally, Section 6 con-cludes.2 Background and Related WorkData is said to be self-verifyingif it is named in a way thatallows any peer to validate the integrity of data against thename by which it was retrieved. The self-verifying prop-erty enables clients to request data from any peer in thenetwork without concern of data corruption or substitu-tion attack. A malicious peer cannot deceive a client withcorrupt data—its attack is limited to denying a block’s ex-istence.Data can be made self-verifying via two techniques:hashing and embedded signatures. Hash-verified data isnamed by a secure hash of its content. A client can verifyhash-verified data by recomputing the hash of the data.Key-verified data is named by the hash of a public keythat verifies a signature over a secure hash of the data.To verify the data, a client uses the public key that corre-sponds to the data’s name to verify the signature over thedata, recomputes the hash of the data, and compares thecomputed hash with the signed hash. These techniqueswere made popular by the Self-certifying Read-only FileSystem [4].Many systems employ Merkle’s chaining tech-nique [10] with hash-verified data to combine blocks intolarger, self-verifyingdata structures. Such systems embedself-verifying names into other data blocks as secure, un-forgeable pointers. To bootstrap the process, systems of-ten embed secure pointers in key-verified blocks, provid-ing an immutable name for mutable data. To update data,then, a client replaces a key-verified block with a blockthat references the new data. See, for example, CFS [2],Ivy [12], and Venti [13].Additionally, Weatherspoon et al. extended the hash-based approach to name erasure code fragments in a self-verifyingmanner [17]. Clients can verify either individualerasure code fragments or the full block of data by thesame name. Distillation codes [7] can be considered ageneralization of this scheme.The classical file systems literature provides an inspir-ing precedent of improving system efficiency by adaptingthe granularity of data access. Many file systems man-age data at


Two-Level, Self-Verifying Data for Peer-to-Peer Storage

Download Two-Level, Self-Verifying Data for Peer-to-Peer Storage
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Two-Level, Self-Verifying Data for Peer-to-Peer Storage and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Two-Level, Self-Verifying Data for Peer-to-Peer Storage 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?