Unformatted text preview:

6.826—Principles of Computer Systems 1999 Handout 18. Consensus 1 18. Consensus Consensus (sometimes called ‘reliable broadcast’ or ‘atomic broadcast’) is a fundamental building block for distributed systems. Informally, we say that several processes achieve consensus if they all agree on some value. Three obvious applications are: Distributed transactions, where all the processes need to agree on whether a transaction commits or aborts. Each transaction needs a new consensus on its outcome. Membership, where a set of processes cooperating to provide a highly available service need to agree on which processes are currently functioning as members of the set. Every time a process fails or starts working again there must be a new consensus. Electing a leader of a group of processes. State machines There is a much more general way to use consensus, as the mechanism for implementing a highly available state machine. The way to get availability is to have either perfect components or redundancy. The simplest form of redundancy is replication: have several copies of each component, and make all the non-faulty components do the same thing. Since any computation can be expressed as a state machine, a replicated state machine can make any computation highly available. Recall the basic idea of a replicated state machine: If the transition relation is deterministic (in other words, is a function from (state, input) to (new state, output)), then several copies of the state machine that start in the same state and see the same sequence of inputs will do the same thing, that is, end up in the same state and produce the same outputs. So if several processes are implementing the same state machine and achieve consensus on the values and order of the inputs, they will do the same thing. In this way it’s possible to replicate an arbitrary computation and thus make it highly available. Of course we can make the order a part of the value of the input by defining some total order on the set of possible inputs.1 We have already seen one application of this replicated state machine idea, in the implementation of transactions; there the replication takes the form of redoing a sequence of actions that is remembered in a log. In many applications the inputs are requests from clients to the replicated service. Typically different clients generate their requests independently, so it’s necessary to agree not only on what the requests are, but also on the order in which to serve them. The simplest way to do this is to 1 This approach was first proposed in a classic paper by Leslie Lamport: Time, clocks, and the ordering of events in a distributed system, Comm. ACM 21, 7, July 1978, pp 558-565. This paper is better known for its analysis of the partial ordering of events in a distributed system, which is too bad. 6.826—Principles of Computer Systems 1999 Handout 18. Consensus 2 number them with consecutive integers, starting at 1. This is what is done in ‘primary copy’ replication, since it’s easy for one place (the primary) to assign consecutive numbers. The literature contains many other schemes for achieving consensus on the order of requests when their total order is not derived from consecutive integers. These schemes label each input with some label from a totally ordered set (for instance, (client UID, timestamp) pairs) and then devise some way to be certain that you have seen all the inputs that can ever exist with labels smaller than a given value. They are complicated, and of doubtful utility.2 The section on leases at the end of this handout explains practical methods for minimizing the number of times you need to use consensus in implementing a reliable state machine. Specification for consensus Here is the specification for consensus; we have seen it already in handout 8 on history and prophecy variables. There is an outcome variable initialized to nil, and an action Allow(v) that can be invoked any number of times. There is also an action Outcome to read the outcome variable; it must return either nil or a v which was the argument of some Allow action, and it must always return the same v. More precisely, we have two requirements: Agreement: Every non-nil result of Outcome is the same. Validity: A non-nil outcome equals some allowed value. Validity means that the outcome can’t be any arbitrary value, but must be a value that was allowed. Consensus is reached by choosing some allowed value and assigning it to outcome. This spec makes the choice on the fly as the allowed values arrive. MODULE Consensus [V] EXPORT Allow, Outcome = % data Value to agree on VAR outcome : (V + Null) := nil APROC Allow(v) = << outcome = nil => outcome := v [] SKIP >> APROC Outcome() -> (V+Null) = << RET outcome [] RET nil >> END Consensus Note that Outcome is allowed to return nil even after the choice has been made. This reflects the fact that in an implementation with several replicas, Outcome is often implemented by talking to just one of the replicas, and that replica may not yet have learned about the choice. If only one Allow action occurs, there’s no need to choose a v, and the implementation’s only problem is to ensure termination. An algorithm that does so is said to implement ‘reliable’ or ‘atomic’ broadcast; there is only one sender, and either everyone or no one gets the message. The 2 For details, see F. Schneider, Implementing fault-tolerant services using the state-machine approach: A tutorial, ACM Computing Surveys 22 (Dec 1990). This paper is reprinted in the book Distributed Systems, 2nd edition, ed. S. Mullender, Addison-Wesley, 1993, pp 169-197.6.826—Principles of Computer Systems 1999 Handout 18. Consensus 3 single Allow might not set outcome, which corresponds to failure of the sender of the broadcast message; in this case no one gets the message. Here is a slightly more complicated, but perhaps more intuitive, spec. It accumulates the allowed values and then chooses one of them in the internal action Agree. MODULE LateConsensus [V] EXPORT Allow, Outcome = VAR outcome : (V + Null) := nil allowed : SET V := {} APROC Allow(v) = << allowed := allowed + {v} >> APROC Outcome() -> (V+Null) = << RET outcome [] RET nil >> % Only outcome is visible APROC Agree() = << VAR v :IN allowed | outcome = nil => outcome := v >> END LateConsensus It should be fairly clear that LateConsensus implements Consensus. An abstraction function to prove


View Full Document

MIT 6 826 - Consensus

Documents in this Course
Consensus

Consensus

10 pages

Load more
Download Consensus
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Consensus and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Consensus 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?