Unformatted text preview:

Chapter 7 Replication Management using the State Machine Approach Fred B Schneider Department of Computer Science Cornell University Ithaca New York 14853 U S A This chapter reprints my paper Implementing Fault tolerant Services using the State Machine Approach A Tutorial which orginally appeared in ACM Computing Surveys 22 Dec 1990 The paper has been reformatted but otherwise remains unchanged Most distributed systems employ replicated services in one form or another By replicating a service we can support fault tolerance as well as improving overall throughput by placing server replicas at sites where the service is needed Protocols for replication management can be divided into two general classes The first called the state machine approach or active replication has no centralized control This class is the subject of this chapter The second class of protocols is called the primary backup approach and it is discussed in Chapter 8 FS et al primary backup The state machine approach ties together a number of the fundamental problems that have been discussed in previous chapters Chapter 2 FS models should be consulted to put in perspective the distributed systems models of this chapter Chapter 4 Ozalp Keith discusses the logical clocks used here to order client requests And Chapter 5 Toueg Hadzlacos discusses semantics for various communications primitives that support the state machine approach This material is based on work supported in part by the Office of Naval Research under contract N00014 91 J 1219 the National Science Foundation under Grant No CCR 8701103 and DARPA NSF Grant No CCR 9014363 Any opinions findings and conclusions or recommendations expressed in this publication are those of the author and do not reflect the views of these agencies 1 1 Introduction Distributed software is often structured in terms of clients and services Each service comprises one or more servers and exports operations which clients invoke by making requests Although using a single centralized server is the simplest way to implement a service the resulting service can only be as fault tolerant as the processor executing that server If this level of fault tolerance is unacceptable then multiple servers that fail independently must be employed Usually replicas of a single server are executed on separate processors of a distributed system and protocols are employed to coordinate client interactions with these replicas The physical and electrical isolation of processors in a distributed system ensures that server failures are independent as required The state machine approach is a general method for implementing a fault tolerant service by replicating servers and coordinating client interactions with server replicas 1 The approach also provides a framework for understanding and designing replication management protocols Many protocols that involve replication of data or software be it for masking failures or simply to facilitate cooperation without centralized control can be derived using the state machine approach Although few of these protocols actually were obtained in this manner viewing them in terms of state machines helps in understanding how and why they work This paper is a tutorial on the state machine approach It describes the approach and its implementation for two representative environments Small examples suffice to illustrate the points However the approach has been successfully applied to larger examples some of these are mentioned in 9 Section 2 describes how a system can be viewed in terms of a state machine clients and output devices Coping with failures is the subject of 3 through 6 An important class of optimizations based on the use of time is discussed in 7 Section 8 describes dynamic reconfiguration The history of the approach and related work is discussed in 9 2 State Machines Services servers and most programming language structures for supporting modularity define state machines A state machine consists of state variables which encode its state and commands which transform its state Each command is implemented by a deterministic program execution of the command is atomic with respect to other commands and modifies the state variables and or produces some output A client of the state machine makes a request to execute a command The request names a state machine names the command to be performed and contains any information needed by the command Output from request processing can be to an actuator e g in a processcontrol system to some other peripheral device e g a disk or terminal or to clients awaiting responses from prior requests 1 The term state machine is a poor one but nevertheless is the one used in the literature 1 In this tutorial we will describe a state machine simply by listing its state variables and commands As an example state machine memory of Figure 2 1 implements a time varying mapping from locations to values A read command permits a client to determine the value currently associated with a location and a write command associates a new value with a location For generality our descriptions of state machines deliberately do not specify how command invocation is implemented Commands might be implemented using a collection of procedures that share data and are invoked by a call as in a monitor using a single process that awaits messages containing requests and performs the actions they specify as in a server or using a collection of interrupt handlers in which case a request is made by causing an interrupt as in an operating system kernel Disabling interrupts permits each command to be executed to completion before the next is started For example the state machine of Figure 2 2 implements commands to ensure that at all times at most one client has been granted access to some resource In it x y denotes the result of appending y to the end of list x head x denotes the first element of list x and tail x denotes the list obtained by deleting the first element of list x This state machine would probably be implemented as part of the supervisor call handler of an operating system kernel Requests are processed by a state machine one at a time in an order that is consistent with potential causality Therefore clients of a state machine can make the following assumptions about memory state machine var store array 0 n of word read command loc 0 n send store loc to client end read write command loc 0 n value word store loc value end write end memory Figure 2 1 A


View Full Document

MIT 6 824 - Study Notes

Documents in this Course
Logging

Logging

4 pages

Load more
Loading Unlocking...
Login

Join to view Study Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Study Notes and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?