Design Tradeoffs in Applying Content Addressable Storage to Enterprise-scale Systems Based on Virtual Machines

Partho Nath†, Michael A. Kozuch*, David R. O’Hallaron‡, Jan Harkes‡, M. Satyanarayanan‡, Niraj Tolia‡, and Matt Toups‡
†Penn State University, *Intel Research Pittsburgh, and ‡Carnegie Mellon University

Abstract

This paper analyzes the usage data from a live deployment of an enterprise client management system based on virtual machine (VM) technology. Over a period of seven months, twenty-three volunteers used VM-based computing environments hosted by the system and created over 800 checkpoints of VM state, where each checkpoint included the virtual memory and disk states. Using this data, we study the design tradeoffs in applying content addressable storage (CAS) to such VM-based systems. In particular, we explore the impact on storage requirements and network load of different privacy properties and data granularities in the design of the underlying CAS system. The study clearly demonstrates that relaxing privacy can reduce the resource requirements of the system, and identifies designs that provide reasonable compromises between privacy and resource demands.

1 Introduction

The systems literature of recent years bears witness to a significantly increased interest in virtual machine (VM) technology. Two aspects of this technology, namely platform independence and natural state encapsulation, have enabled the application of this technology in systems designed to improve scalability [6, 14, 16, 32, 40, 49], security [15, 21, 47], reliability [1, 4, 8, 25, 44], and client management [7, 5, 20].

The benefits derived from platform independence and state encapsulation, however, often come with an associated cost, namely the management of significant data volume. For example, enterprise client management systems [7, 20] may require the storage of tens of gigabytes of data per user.
For each user, these systems store an image of the user’s entire VM state, which includes not only the state of the virtual processor and platform devices, but the memory and disk states as well.

While this cost is initially daunting, we would expect a collection of VM state images to have significant data redundancy because many of the users will employ the same operating systems and applications. Content addressable storage (CAS) [3, 27, 30, 36, 44, 48] is an emerging mechanism that can reduce the costs associated with this volume of data by eliminating such redundancy. Essentially, CAS uses cryptographic hashing techniques to identify data by its content rather than by name. Consequently, a CAS-based system will identify sets of identical objects and only store or transmit a single copy even if higher-level logic maintains multiple copies with different names.

To date, however, the benefit of CAS in the context of enterprise-scale systems based on VMs has not been quantified. In this paper, we analyze data obtained from a seven-month, multi-user pilot deployment of a VM-based enterprise client management system called Internet Suspend/Resume (ISR) [19, 37]. Our analysis aims to answer two basic questions:

Q1: By how much can the application of CAS reduce the system’s storage requirements?
Q2: By how much can the application of CAS reduce the system’s network traffic?

The performance of CAS depends upon several system parameters. The answers to Q1 and Q2, therefore, are analyzed in the context of the two most important of these design criteria:

C1: The privacy policy, and
C2: the object granularity.

The storage efficiency of a CAS system, or the extent to which redundant data is eliminated, depends upon the degree to which that system is able to identify redundant data. Hence, the highest storage efficiency requires users to expose cryptographic digests to the system and potentially to other users. As we shall see, the effects of this exposure can be reduced but not eliminated.
Consequently, criterion C1 represents a trade-off between storage efficiency and privacy.

Object granularity, in contrast, is a parameter that dictates how finely the managed data is subdivided. Because CAS systems exploit redundancy at the object level, large objects (like disk images) are often represented as a sequence of smaller objects. For example, a multi-gigabyte disk image may be represented as a sequence of 128 KB objects (or chunks). A finer granularity (smaller chunksize) will often expose more redundancy than a coarser granularity. However, finer granularities will also require more meta-data to track the correspondingly larger number of objects. Hence, criterion C2 represents the trade-off between efficiency and meta-data overhead.

The results obtained from the ISR pilot deployment indicate that the application of CAS to VM-based management systems is more effective in reducing storage and network resource demands than applying traditional compression technology such as the Lempel-Ziv compression [50] used in gzip. This result is especially significant given the non-zero runtime costs of compressing and uncompressing data. In addition, combining CAS and traditional compression reduces the storage and network resource demands by a factor of two beyond the reductions obtained by using traditional compression technology alone.

Further, using this real-world data, we are able to determine that enforcing a strict privacy policy requires approximately 1.5 times the storage resources required by a system with a less strict privacy policy. Finally, we have determined that the efficiency improvements derived from finer object granularity typically outweigh the meta-data overhead. Consequently, the disk image chunksize should be between 4 and 16 KB.

Sections 4 and 5 will elaborate on these results from the pilot deployment.
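
To make the granularity trade-off of criterion C2 concrete, the following is a minimal sketch and not the ISR implementation. It assumes fixed-size chunks, SHA-1 as the cryptographic hash, and a 20-byte digest per chunk as the only meta-data; the function names and the two synthetic "disk images" (identical except for one 4 KB block) are hypothetical and exist only for illustration.

```python
import hashlib


def cas_footprint(images, chunk_size, digest_size=20):
    """Return (unique_data_bytes, metadata_bytes) for a list of byte strings.

    Each image is split into fixed-size chunks. A chunk is stored once per
    distinct SHA-1 digest, and every image keeps one digest per chunk as
    its meta-data. The 20-byte digest size is an assumption of this sketch.
    """
    store = {}  # digest -> chunk length (the content-addressed store)
    metadata_bytes = 0
    for image in images:
        for offset in range(0, len(image), chunk_size):
            chunk = image[offset:offset + chunk_size]
            digest = hashlib.sha1(chunk).digest()
            store.setdefault(digest, len(chunk))  # keep a single copy
            metadata_bytes += digest_size
    return sum(store.values()), metadata_bytes


def block(i):
    """Hypothetical 4 KB block whose content is determined by i."""
    return hashlib.sha256(str(i).encode()).digest() * 128  # 32 * 128 = 4096


# Two synthetic 256 KB images that differ in a single 4 KB block.
image_a = b"".join(block(i) for i in range(64))
image_b = b"".join(block(1000 if i == 10 else i) for i in range(64))

for chunk_size in (128 * 1024, 4 * 1024):
    data, meta = cas_footprint([image_a, image_b], chunk_size)
    print(chunk_size, data, meta, data + meta)
```

In this toy input the 4 KB chunksize stores one extra 4 KB chunk for the second image, while the 128 KB chunksize must store a whole extra 128 KB chunk; the finer granularity pays for this with 32 times as many digests. Real VM images will behave differently, so the numbers only illustrate the direction of the trade-off.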
But first, we provide some background on ISR, content addressable storage, and the methodology used in the study.

2 Background

2.1 Internet Suspend/Resume

Internet Suspend/Resume (ISR) is an enterprise client management system that allows users to access their personal computing environments from different physical machines. The system is based on a combination of VM technology and distributed storage. User computing environments are encapsulated by VM instances, and the state of such a VM instance, when idle, is captured by system software and stored on a carefully-managed server. There are a couple of motivations for this idea. First, decoupling the computing environment from the hardware allows clients to migrate across

