Zebra: A Striped Network File System

John H. Hartman
John K. Ousterhout

Computer Science Division
Electrical Engineering and Computer Sciences
University of California
Berkeley, CA 94720

Abstract

This paper presents the design of Zebra, a striped network file system. Zebra applies ideas from log-structured file system (LFS) and RAID research to network file systems, resulting in a network file system that has scalable performance, uses its servers efficiently even when its applications are using small files, and provides high availability. Zebra stripes file data across multiple servers, so that the file transfer rate is not limited by the performance of a single server. High availability is achieved by maintaining parity information for the file system. If a server fails its contents can be reconstructed using the contents of the remaining servers and the parity information. Zebra differs from existing striped file systems in the way it stripes file data: Zebra does not stripe on a per-file basis; instead it stripes the stream of bytes written by each client. Clients write to the servers in units called stripe fragments, which are analogous to segments in an LFS. Stripe fragments contain file blocks that were written recently, without regard to which file they belong. This method of striping has numerous advantages over per-file striping, including increased server efficiency, efficient parity computation, and elimination of parity update.

This paper will appear in the proceedings of the USENIX Workshop on File Systems, May 1992.

This work was supported in part by the National Science Foundation under grant CCR-8900029, the National Aeronautics and Space Administration and the Defense Advanced Research Projects Agency under contract NAG2-591.

Zebra April 28, 1992

1 Introduction

Zebra is a network file system architecture designed to provide both high performance and high availability.
This is accomplished by incorporating ideas from log-structured file systems, such as Sprite LFS [Rosenblum91], and redundant arrays of inexpensive disks (RAID) [Patterson88] into a network file system. From log-structured file systems Zebra borrows the idea that small, independent writes to the storage subsystem can be batched together into large sequential writes, thus improving the storage subsystem’s write performance. RAID research has focused on using striping and parity to obtain high performance and high availability from arrays of relatively low-performance disks. Zebra uses striping and parity as well, resulting in a network file system that stripes data across multiple storage servers, uses parity to provide high availability, and transfers file data between the clients and the storage servers in large units. The notable features of Zebra can be characterized as follows:

Scalable performance. A file in Zebra may be striped across several storage servers, allowing its contents to be transferred in parallel. Thus the aggregate file transfer bandwidth can exceed the bandwidth capabilities of a single server.

High server efficiency. Storage servers are most efficient handling large data transfers because small transfers have high overheads. Large transfers are relatively simple to achieve for large files, but small files pose a problem. Client file caches are effective at reducing server accesses for small file reads, but they aren’t as effective at filtering out small file writes [Baker91]. Zebra clients use the storage servers efficiently by writing to them in large transfers, even if their applications are writing small files.

High availability (footnote 1). Zebra can tolerate the loss of any single machine in the system, including a storage server. Zebra makes file data highly available by maintaining the parity of the file system contents. If a server crashes its contents can be reconstructed using the parity information.
The use of parity allows Zebra to provide the availability of a system that maintains redundant copies of its files while requiring only a fraction of the storage overhead.

Uniform server loads. File striping causes the load incurred by a heavily used (hot) file to be shared by all of the storage servers that store the file. In a traditional network file system a hot file only affects the performance of the server that stores it, requiring that hot files be carefully distributed among all of the servers to balance the load.

Zebra is currently only a paper design, although a prototype is being implemented in the Sprite operating system [Ousterhout88]. This paper describes the design of Zebra, not the prototype implementation. The rest of this paper is organized as follows. Section 2 discusses striping and its application to a network file system, Section 3 discusses the use of parity to provide high availability, Section 4 gives an overview of the Zebra architecture, and Section 5 describes the Zebra design in more detail. Section 6 covers Zebra’s status and future work, and Section 7 is a conclusion.

1. The distinction between availability and reliability, while it is important, is not particularly relevant to this paper. The arguments made here regarding the availability of Zebra also apply to its reliability.

2 Why Stripe?

Traditional network file systems confine each file to a single file server. Unfortunately this means that the rate at which a file can be transferred between the server and a client is limited by the performance characteristics of that one server, such as its CPU power, its memory bandwidth and the performance of its I/O controllers. This makes it difficult to improve the performance of the network file system without improving or replacing the server. Striping a file over several servers allows those servers to transfer the file in parallel, so that the aggregate transfer rate can be much higher than that of any one server.
The file transfer performance of the file system can be improved simply by adding more servers.

A striped network file system has several economic advantages over a traditional network file system. First, the storage servers do not need to be high-performance, nor do they need to be constructed out of special-purpose hardware. Servers in a traditional network file system are often among the more expensive and high-performance machines in the system. In contrast, storage servers in a striped network file system can be relatively modest machines, thereby improving their cost/performance and reducing the fraction of the total system cost that they represent. Second, a
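
The striping and parity scheme outlined in the abstract and Section 1 can be illustrated with a small sketch. It is not taken from the Zebra design: the names FRAGMENT_SIZE, build_stripe and reconstruct are hypothetical, the 8-byte fragment size is chosen only to keep the example readable, a single stripe is assumed, and the parity fragment is simply placed last. The sketch batches one client's small writes to several files into a single stream, cuts that stream into stripe fragments without regard to file boundaries, computes one XOR parity fragment, and rebuilds the fragment of a lost server from the survivors.

```python
from functools import reduce
from typing import List

FRAGMENT_SIZE = 8  # bytes per stripe fragment; hypothetical and tiny, for illustration only


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def build_stripe(log: bytes, num_data_servers: int) -> List[bytes]:
    """Cut one client's write stream into data fragments plus one parity fragment.

    Assumption: the whole stream fits in a single stripe and the parity
    fragment goes last; the paper excerpt does not specify either detail.
    """
    stripe_size = FRAGMENT_SIZE * num_data_servers
    if len(log) > stripe_size:
        raise ValueError("write stream does not fit in one stripe")
    padded = log.ljust(stripe_size, b"\0")  # zero-pad a partially filled stripe
    fragments = [padded[i:i + FRAGMENT_SIZE]
                 for i in range(0, stripe_size, FRAGMENT_SIZE)]
    parity = reduce(xor_bytes, fragments)  # XOR of all data fragments
    return fragments + [parity]


def reconstruct(stripe: List[bytes], lost: int) -> bytes:
    """Rebuild the fragment at index `lost` by XOR-ing all surviving fragments."""
    survivors = [frag for i, frag in enumerate(stripe) if i != lost]
    return reduce(xor_bytes, survivors)


# Small writes to three different files, batched in the order the client issued them.
writes = [("a.txt", b"hello"), ("b.txt", b"zebra"), ("c.txt", b"stripes!")]
log = b"".join(data for _, data in writes)

stripe = build_stripe(log, num_data_servers=3)  # 3 data fragments + 1 parity fragment
lost_index = 1                                  # pretend the second server failed
rebuilt = reconstruct(stripe, lost_index)
recovered_ok = rebuilt == stripe[lost_index]
```

Because the fragments are cut from the client's write stream rather than from individual files, the second fragment here holds bytes from both b.txt and c.txt, and the parity is computed once over data the client already has in hand. The sketch omits the bookkeeping a real system would need to map file blocks to their positions in the stream.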

