Abstract Replicated Distributed Programs Eric C. Cooper Computer Science Division--EECS University of California Berkeley, California 94720 A troupe is a set of replicas of a module, executing on machines that have independent failure modes. Troupes are the building blocks of replicated distributed programs and the key to achieving high availability. Individual members of a troupe do not communicate among themselves, and axe unaware of one another's existence; this property is what distinguishes troupes from other software architectures for fault tolerance. Replicated procedure call is introduced to handle the many-to-many pattern of conmmnication between troupes. The semantics of replicated procedure call can be summa- rized as exactly-once execution at all replicas. An implementation of troupes and replicated procedure call is described, and its performance is measured. The problem of concurrency control for troupes is examined, and a commit protocol for replicated atomic transactions is presented. Binding and reconfiguration mechanisms for replicated distributed programs are described. 1 Introduction This paper addresses the problem of constructing highly available distributed programs. (The adjectives highly available, fault-tolerant, and nonstop will be used synony- mously to describe a system that continues to operate de- spite failures of some of its components.) The goal is to construct programs that automatically tolerate crashes of the underlying hardware. The problems posed by incorrect software or by hardware failures other than crashes are only addressed briefly. The key to tolerating component failures is replication; Author's present address: Department of Computer Science, Carnegie.. Mellon University, Pittsburgh, Pennsylvania 15213. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1985 ACM-0-89791- 174- 1- 12/85-0063 $00.75 this approach was proposed by yon Neumann thirty years ago [29]. The idea is to replicate each component to such a degree that the probability of all replicas failing becomes acceptably small. The advent of inexpensive distributed computing systems (consisting of computers connected to- gether by a network) makes replication an attractive and practical means of tolerating hardware crashes. The ability to vary replication on a per-module basis is desirable because it allows software systems to adapt grace- fully to changing characteristics of the underlying hard- ware. Even if perfectly reliable hardware were possible, there would still be periods during which hardware would be unavailable: scheduled down-time for preventive main- tenance or reconfiguration, for example. The mechanisms described in this paper permit distributed programs to be reconfigured, while they axe executing, so that their services remain available during such periods. Incorporating replication on a per-module basis is more flexible than previous approaches, such as providing fault tolerance in hardware or writing it into the application software. The first method is too expensive because it uses reliable hardware everywhere, not just for critical modules. The second approach burdens the programmer with the complexity of a non-transparent mechanism. The fundamental mechanisms presented in this paper are: • troupes, or replicated modules, and • replicated procedure call, a generalization of remote procedure call for many-to-many communication be- tween troupes. The following important property is what distinguishes troupes and replicated procedure call from previous soft- ware architectures for fault tolerance: individual members of a troupe do not communicate among themselves, and axe unaware of one another's existence. This property is also what gives these mechanisms their flexibility and power: since each troupe member behaves as if it had no replicas, the degree of replication of a. troupe can be varied dynam- ically, with no recompilation or relinking. 63Previous papers presented the author's initial ideas about replicated procedure calls [10] and a description of the Circus system [11]. This paper presents a portion of the author's Ph.D. dissertation [12]. 2 Background and Related Work The idea of achieving fault tolerance by using replica- tion to mask the failures of individual components dates back to yon Neumann [29]. The two architectures for fault- tolerant software are primary-standby systems and modular redundancy. In a primary-standby scheme, only a single component functions normally; the remaining replicas are on standby in case the primary fails. With modular redun- dancy, each component performs the same function; there is some form of voting on the outputs to mask failures. A classic primary-standby architecture is the method of process pairs in Tandem's Guardian operating system [1]. The processes in a process pair execute on different proces- sors. One process is designated as the primary, the other as the standby. Before each request is processed, the primary sends information about its internal state to the standby, in the form of a checkpoint. The checkpoint enables the standby to complete the request if the primary fails. The Auragen architecture combines a primary-standby scheme with automatic logging of messages [6]. If a primary crashes, the log is used to replay the appropriate messages to a standby. The Isis project at Cornell uses a primary-standby architecture for replicated objects [3]. In each interaction with a replicated object in Isis, one replica plays the role of coordinator, and only it performs the operation. The coordinator then uses a two-phase commit protocol to update the other replicas. The mechanisms used in primary-standby schemes to allow a standby to take over after the primary crashes are isomorphic to crash recovery mechanisms based on stable storage. Under this isomorphism, a standby corresponds to stable storage
View Full Document