Replication in the Harp File System

Barbara Liskov, Sanjay Ghemawat, Robert Gruber, Paul Johnson, Liuba Shrira, Michael Williams

Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, MA 02139

Abstract

This paper describes the design and implementation of the Harp file system. Harp is a replicated Unix file system accessible via the VFS interface. It provides highly available and reliable storage for files and guarantees that file operations are executed atomically in spite of concurrency and failures. It uses a novel variation of the primary copy replication technique that provides good performance because it allows us to trade disk accesses for network communication. Harp is intended to be used within a file service in a distributed network; in our current implementation, it is accessed via NFS. Preliminary performance results indicate that Harp provides response time and system capacity equal to or better than those of an unreplicated implementation of NFS that uses Unix files directly.

1 Introduction

This paper describes the replication technique used in the Harp file system. (Harp is a Highly Available, Reliable, Persistent file system.) Harp provides highly available and reliable storage for files: with very high probability, information in files will not be lost or corrupted and will be accessible when needed, in spite of failures such as node and media crashes and network partitions. All modifications to a file are reliably recorded at several server nodes; the number of nodes depends on how many failures the file is intended to survive. We take advantage of replication to provide strong semantics for file operations: each operation is performed atomically in spite of concurrency and failures.

This research was supported in part by the Advanced Research Projects Agency of the Department of Defense, monitored by the Office of Naval Research under contract N00014-89-J-1988, and in part by the National Science Foundation under grant CCR-8822158. Sanjay Ghemawat and Robert Gruber were supported in part by National Science Foundation Graduate Fellowships. The Digital Equipment Corporation provided support under an external research grant.

Harp uses the primary copy replication technique [1, 26, 27]. In this method, client calls are directed to a single primary server, which communicates with other backup servers and waits for them to respond before replying to the client. The system masks failures by performing a failover algorithm in which an inaccessible server is removed from service. When a primary performs an operation, it must inform enough backups to guarantee that the effects of that operation will survive all subsequent failovers.

Harp is one of the first implementations of a primary copy scheme that runs on conventional hardware. It has some novel features that allow it to perform well. The key performance issues are how to provide quick response for user operations and how to provide good system capacity (roughly, the number of operations the system can handle in some time period while still providing good response time). Harp achieves good performance by recording the effects of modification operations in a log that resides in volatile memory; operations in the log are applied to the file system in the background. Essentially, it removes disk accesses from the critical path, replacing them with communication from the primary to the backups, which is substantially faster if the servers are reasonably close together.

In using the log to record recent modifications, Harp is relying on a write-behind strategy, but the strategy is safe because log entries are not lost in failures. We equip each server with a small uninterruptible power supply (UPS) that allows it to run for a short while (e.g., a few minutes) after a power failure; the server uses this time to copy information in the log to disk. The combination of the volatile log and the UPS is one of the novel features of Harp.

Harp provides reliable storage for information. Information survives individual node failures because it exists in volatile memory at several nodes. It survives a power failure because of the UPS's. Also, Harp attempts to preserve information in the face of simultaneous software failures by techniques explained later in the paper.

Harp supports the virtual file system (VFS) [19] interface. It guarantees that operations have really happened when they return, i.e., their effects will not be lost in subsequent failovers. In fact, all operations in Harp are implemented atomically: an operation either completes entirely or has no effect, in spite of concurrency and failures.

Harp is intended to be used within a file service in a distributed network, such as NFS [31, 35] or AFS [17]. The idea is that users continue to use the file service just as they always did. However, the server code of the file service calls Harp via the VFS interface and achieves higher reliability and availability as a result. Harp makes calls to low-level Unix file system operations. Thus, the Harp code is just a small layer in the overall system, as illustrated in Figure 1-1.

[Figure 1-1: Harp System Structure. Labels recovered from the figure: client requests; network; server; NFS; VFS interface; Unix file system interface; Harp; file system interface; low-level Unix file system code.]

In the current implementation, users use Harp via NFS. We guarantee that the combination of the NFS code and Harp appears to the user to behave like an unreplicated NFS system; as discussed in Section 4.5, this requires a little more work than just implementing the VFS calls correctly. Harp can be used with any VFS-based NFS server implementation and should be portable to most Unix systems. We believe it can also be used by other network file systems that use VFS or similar systems, such as the ULTRIX generic file system [29], but we have not yet investigated such a use.

This paper describes how replication works in Harp and provides some preliminary information on system performance. The portion of Harp that handles processing of user operations has been implemented; we are working now on the failover code. The performance data indicate that Harp will perform well: in the experiments, Harp performs as well as or better than an unreplicated implementation of NFS that uses Unix files directly, both in terms of response time to users and in overall system capacity. The results show that high availability can be achieved without degradation of performance by using a small amount of additional hardware (extra disks to hold the extra file copies, and UPS's).

The remainder of the paper is organized
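
As an illustration of the write path described in the introduction (a log kept in volatile memory, acknowledgement from backups before the primary replies, and application of logged operations to the file system in the background), the following is a minimal Python sketch. It is not Harp's code. The names Backup, Primary, acks_needed, and apply_pending are hypothetical, the acknowledgement threshold is an assumed parameter, and neither the failover algorithm nor the UPS-driven copy of the log to disk is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup server: holds log entries in volatile memory."""
    name: str
    reachable: bool = True
    log: list = field(default_factory=list)

    def record(self, entry):
        # An unreachable backup never acknowledges.
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


@dataclass
class Primary:
    """Hypothetical primary: replies only after enough backups acknowledge."""
    backups: list
    acks_needed: int  # assumption: threshold chosen by whoever configures the group
    log: list = field(default_factory=list)      # in-memory log of modifications
    files: dict = field(default_factory=dict)    # stands in for the on-disk file system
    applied: int = 0  # number of log entries already applied to `files`

    def write(self, path, data):
        entry = (path, data)
        acked = [b for b in self.backups if b.record(entry)]
        if len(acked) < self.acks_needed:
            # Toy rollback so the operation has no effect anywhere;
            # the paper's failover algorithm is not modeled here.
            for b in acked:
                b.log.pop()
            raise RuntimeError("not enough backups acknowledged")
        # No disk access on the critical path: the entry lives only in memory.
        self.log.append(entry)
        return "ok"

    def apply_pending(self):
        # Background step: apply logged operations to the file system.
        for path, data in self.log[self.applied:]:
            self.files[path] = data
        self.applied = len(self.log)


primary = Primary(backups=[Backup("b1"), Backup("b2", reachable=False)], acks_needed=1)
reply = primary.write("/a", b"hello")   # returns once one backup holds the entry
primary.apply_pending()                 # later, the entry reaches the file system
```

In this sketch the reply to the client depends only on memory writes and messages to backups, which is the trade of disk accesses for network communication that the introduction describes; apply_pending stands in for the background application of the log.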