Unformatted text preview:

Structures for Efficient File System-Scale Partial PersistenceDan R. K. Ports and Austin T. Clements{drkp,amdragon}@mit.eduMay 12, 2005AbstractA persistent file system stores every previousstate of each file, allowing convenient access tothe full state of the file system as it appearedat any point in the past. Achieving this conve-nient feature presents a challenging data struc-tural problem because the amount of data in-volved is so large: it must use as little spaceas possible, and provide efficient operations formodifying the current state and accessing bothcurrent and past states. We formalize persistentfile systems as a problem in data structures, andanalyze it in the context of the external memorymodel. We begin by considering the design of ourinitial solution to this problem from the PersiFS1file system, which is based on a log of metadatachanges and an indirection layer for storing filedata. These “systems” data structures supportthe desired operations, but are not asymptoticallyefficient. Applying more advanced data struc-tures, we refine the design into the next version,PersiFS2. We use B+-trees for file content in-dexing in order to improve the space efficiencyof the system, and we present a novel partially-persistent B+-tree design, which can be used totrack changes to files with logarithmic modifica-tion and query cost. PersiFS2has been imple-mented as a working file system with these datastructures, and our measurements indicate thatthe new file system data structure provides dra-matically improved access time for previous revi-sions with a small increase in cost for modifica-tions.1 OverviewThe data-structural notion of persistence — theability to access previous states — is powerfulin many situations. We consider persistence inthe context of file systems, where it provides ac-cess to previous versions of data that may havebeen modified or deleted. This capability hasgreat practical utility, but finding an appropri-ate representation for the state of the file systempresents several challenges. The volume of datainvolved necessitates a compact data structurethat supports efficient queries and modifications.In order to address these challenges, wepresent a novel set of data structures optimizedfor the requirements of a persistent file system:applying modifications to the content of files andreading the contents of any revision. We developa partially-persistent B+-tree for storing the his-tory of the metadata state of the file system,and a block store that uses cryptographic finger-prints and content-sensitive chunking to leveragefile similarity for compactness.The outline of this report is as follows: we be-gin by motivating the need for a persistent filesystem data structure in Section 2, and describethe high-level design of our application, the Per-siFS persistent network file system. In Section 3,we discuss the requirements and challenges forfile system data structures. We first present aninitial design in Section 4, based on simple datastructures that provides the necessary function-ality but limited efficiency. We then discuss inSection 5 how to refine the representation us-ing more advanced data-structural techniques,1including a new persistent variant of B+-trees.Finally, Section 6 discusses the implementationof these data structures and provides measure-ments comparing the two designs, and Section 7provides concluding remarks.2 BackgroundThe data structures described in this paper werecreated as a part of our implementation of a per-sistent (i.e. version-controlled) network file sys-tem called PersiFS [4]. In this section, we exam-ine the features, challenges, and high-level designof PersiFS in order to understand the context forour data structures, and how they interact withthe other components of PersiFS.2.1 MotivationUsers frequently wish to undo changes they havemade, whether intentionally or inadvertently, tothe file system — for example, inadvertent dele-tions, restoring files corrupted by applicationbugs, or simply reverting to an earlier revisionof a document. Regular file systems do not sup-port this operation natively, and periodic back-ups, “trash can” interfaces, and “undelete” toolsprovide only a partial, inadequate solution. Byallowing access to any previous state of the filesystem, PersiFS makes these tools unnecessary,since files are never truly deleted from the filesystem. Fears of accidental overwrite, rename ordeletion are unnecessary.Backup systems that archive snapshots of thestate of the filesystem are a common solution.Along the same lines, there exist snapshot-basedversion-controlled file systems [11, 7] that are thenatural extension of this idea, essentially makingautomated backups that are readily available (of-ten with some space optimizations). These sys-tems, however, cannot track changes that occurin the interval between snapshots. PersiFS im-proves on these systems by using continuous ver-sioning: each change to the file system is storedas a new revision, which solves this problem.However, this makes an efficient representationall the more critical.2.2 PersiFS DesignPersiFS is a networked file system that pro-vides access to previous revisions through anovel, convenient file system interface. Aclient system can mount a PersiFS volume viaa standard NFS interface, say as /persifs,and then the current version of the filesystem is available for read/write access at/persifs/now. Moreover, previous versions ofthe file system can be accessed simply by spec-ifying any time stamp instead. For example,/persifs/2005-05-12-12-00-00 is a read-onlysnapshot of the entire file system as it appearedat noon on May 12th, 2005. The interface is en-tirely through the file system: unlike some otherversion-controlled file systems, no special toolsare required to access previous revisions.The implementation of PersiFS is based on acentralized user-mode NFS server. A translationlayer converts NFS requests to operations on thefile system data structure, described below, thatis the core of the system.3 File System Data StructuresIn order to analyze and develop data structuresfor PersiFS, we will formalize the requirementsin the form of a file system data structure.We view our file system as a collection of files,each of which can be represented simply as astring of data. In reality, files are made up ofboth data and metadata such as last-modifiedtime, and as a practical matter these are oftenstored separately, but from an abstract data


View Full Document
Download Study Guide
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Study Guide and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Study Guide 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?