Lightweight Recoverable Virtual Memory

M. Satyanarayanan, Henry H. Mashburn, Puneet Kumar, David C. Steere, James J. Kistler
School of Computer Science
Carnegie Mellon University

Abstract

Recoverable virtual memory refers to regions of a virtual address space on which transactional guarantees are offered. This paper describes RVM, an efficient, portable, and easily used implementation of recoverable virtual memory for Unix environments. A unique characteristic of RVM is that it allows independent control over the transactional properties of atomicity, permanence, and serializability. This leads to considerable flexibility in the use of RVM, potentially enlarging the range of applications that can benefit from transactions. It also simplifies the layering of functionality such as nesting and distribution. The paper shows that RVM performs well over its intended range of usage even though it does not benefit from specialized operating system support. It also demonstrates the importance of intra- and inter-transaction optimizations.

1 Introduction

How simple can a transactional facility be, while remaining a potent tool for fault tolerance? Our answer, as elaborated in this paper, is a user-level library with minimal programming constraints, implemented in about 10K lines of mainline code and no more intrusive than a typical runtime library for input-output. This transactional facility, called RVM, is implemented without specialized operating system support, and has been in use for over two years on a wide range of hardware, from laptops to servers.

RVM is intended for Unix applications with persistent data structures that must be updated in a fault-tolerant manner. The total size of those data structures should be a small fraction of disk capacity, and their working set size must easily fit within main memory.

This work was sponsored by the Avionics Laboratory, Wright Research and Development Center, Aeronautical Systems Division (AFSC), U.S. Air Force, Wright-Patterson AFB, Ohio 45433-6543, under Contract
F33615-90-C-1465, ARPA Order No. 7597. James Kistler is now affiliated with the DEC Systems Research Center, Palo Alto, CA. This paper appeared in ACM Transactions on Computer Systems, 12(1), Feb. 1994, and in the Proceedings of the 14th ACM Symposium on Operating Systems Principles, Dec. 1993.

This combination of circumstances is most likely to be found in situations involving the meta-data of storage repositories. Thus RVM can benefit a wide range of applications, from distributed file systems and databases to object-oriented repositories, CAD tools, and CASE tools. RVM can also provide runtime support for persistent programming languages. Since RVM allows independent control over the basic transactional properties of atomicity, permanence, and serializability, applications have considerable flexibility in how they use transactions.

It may often be tempting, and sometimes unavoidable, to use a mechanism that is richer in functionality or better integrated with the operating system. But our experience has been that such sophistication comes at the cost of portability, ease of use, and more onerous programming constraints. Thus RVM represents a balance between the system-level concerns of functionality and performance, and the software engineering concerns of usability and maintenance. Alternatively, one can view RVM as an exercise in minimalism. Our design challenge lay not in conjuring up features to add, but in determining what could be omitted without crippling RVM.

We begin this paper by describing our experience with Camelot [10], a predecessor of RVM. This experience, and our understanding of the fault-tolerance requirements of Coda [16, 30] and Venari [24, 37], were the dominant influences on our design. The description of RVM follows in three parts: rationale, architecture, and implementation. Wherever appropriate, we point out ways in which usage experience influenced our design. We conclude with an evaluation of RVM, a discussion of its use as a building block, and a summary of related work.

2 Lessons from Camelot

2.1 Overview

Camelot is a transactional facility built to validate the thesis that general-purpose transactional support would simplify and encourage the construction of reliable distributed systems [33]. It supports local and distributed nested transactions, and provides considerable flexibility in the choice of logging, synchronization, and transaction commitment strategies. Camelot relies heavily on the external page management and interprocess communication facilities of the Mach operating system [2], which is binary compatible with the 4.3BSD Unix operating system [20]. Figure 1 shows the overall structure of a Camelot node. Each module is implemented as a Mach task, and communication between modules is via Mach's interprocess communication facility (IPC).

[Figure 1: Structure of a Camelot node. Labeled components: Application, Data Server, Library, Recoverable Processes, Master Control, Camelot, Node Server, Recovery Manager, Transaction Manager, Disk Manager, Com, Log, Camelot System Components, Mach Kernel.]

This figure shows the internal structure of Camelot as well as its relationship to application code. Camelot is composed of several Mach tasks: Master Control, Camelot, and Node Server, as well as the Recovery, Transaction, and Disk Managers. Camelot provides recoverable virtual [...]

[...] Camelot would be something of an overkill. Yet we persisted, because it would give us first-hand experience in the use of transactions, and because it would contribute towards the validation of the Camelot thesis.

We placed data structures pertaining to Coda meta-data in recoverable memory¹ on servers. The meta-data included Coda directories as well as persistent data for replica control and internal housekeeping. The contents of each Coda file were kept in a Unix file on a server's local file system. Server recovery consisted of Camelot restoring recoverable memory to the last committed state, followed by a Coda salvager which ensured mutual consistency between meta-data and data.

2.3 Experience

The most valuable lesson we learned by using Camelot was that recoverable virtual memory was indeed a convenient and practically useful programming abstraction for systems like Coda. Crash recovery was simplified because data structures were restored in situ by Camelot. Directory operations were merely manipulations of in-memory data structures. The Coda salvager was simple because the range of error states it had to handle was small. Overall, the encapsulation of messy crash recovery details into Camelot considerably simplified Coda server code.
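To make "restored in situ to the last committed state" concrete, here is a minimal toy model of a recoverable memory region, written in Python purely for illustration. It is not Camelot's or RVM's interface or mechanism: the names `RecoverableRegion`, `Transaction`, `begin`, `set`, `delete`, `commit`, and `abort` are hypothetical, a dict stands in for volatile virtual memory, a Python list stands in for a redo log on stable storage, and a single list append stands in for an atomic, synchronous log force. Directory operations are ordinary in-memory updates; a simulated crash discards memory, and recovery replays only committed log records.

```python
_ABSENT = object()  # marks "key did not exist" in a transaction's undo record


class RecoverableRegion:
    """Toy recoverable memory region (hypothetical; not Camelot's or RVM's API).

    `memory` plays the role of volatile virtual memory; `log` plays the role
    of a redo log on stable storage that survives a crash.
    """

    def __init__(self, log):
        self.log = log
        self.memory = {}
        # Recovery: rebuild the region in situ by replaying committed records only.
        for record in self.log:
            for key, value in record:
                if value is None:  # None encodes a deletion in the log
                    self.memory.pop(key, None)
                else:
                    self.memory[key] = value

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, region):
        self.region = region
        self.undo = {}   # first old value of each key touched, used by abort
        self.redo = []   # new values, appended to the log at commit
        self.done = False

    def _remember(self, key):
        if key not in self.undo:
            self.undo[key] = self.region.memory.get(key, _ABSENT)

    def set(self, key, value):
        if value is None:
            raise ValueError("None is reserved to encode deletions")
        self._remember(key)
        self.region.memory[key] = value  # update memory in place
        self.redo.append((key, value))

    def delete(self, key):
        self._remember(key)
        self.region.memory.pop(key, None)
        self.redo.append((key, None))

    def commit(self):
        self._finish()
        # One append stands in for an atomic, synchronous log force.
        self.region.log.append(tuple(self.redo))

    def abort(self):
        self._finish()
        # Roll memory back using the saved old values; nothing is logged.
        for key, old in self.undo.items():
            if old is _ABSENT:
                self.region.memory.pop(key, None)
            else:
                self.region.memory[key] = old

    def _finish(self):
        if self.done:
            raise RuntimeError("transaction already ended")
        self.done = True


# Usage: a directory kept as an in-memory map from names to file ids.
log = []                         # survives the simulated crash
region = RecoverableRegion(log)

t = region.begin()
t.set("/coda/a", 101)
t.set("/coda/b", 102)
t.commit()

t = region.begin()
t.set("/coda/a", 999)
t.delete("/coda/b")
t.abort()
print(region.memory)   # {'/coda/a': 101, '/coda/b': 102}

t = region.begin()
t.set("/coda/c", 103)  # never committed

region = RecoverableRegion(log)  # simulated crash and restart
print(region.memory)   # {'/coda/a': 101, '/coda/b': 102}
```

The toy offers atomicity and (simulated) permanence but no concurrency control, so two interleaved transactions on the same region would not be serialized; a real facility must also make the log force durable and handle partially written log records.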