MIT 6 826 - Consensus - D530920

Home> Schools> Massachusetts Institute of Technology> Electrical Engineering and Computer Science (6) > 6 826> Consensus

MIT 6 826 - Consensus

School name Massachusetts Institute of Technology

Course 6 826- Principles of Computer Systems

Pages 9

Download Save

Unformatted text preview:

6.826—Principles of Computer Systems 1999 Handout 18. Consensus 1 18. Consensus Consensus (sometimes called ‘reliable broadcast’ or ‘atomic broadcast’) is a fundamental building block for distributed systems. Informally, we say that several processes achieve consensus if they all agree on some value. Three obvious applications are: Distributed transactions, where all the processes need to agree on whether a transaction commits or aborts. Each transaction needs a new consensus on its outcome. Membership, where a set of processes cooperating to provide a highly available service need to agree on which processes are currently functioning as members of the set. Every time a process fails or starts working again there must be a new consensus. Electing a leader of a group of processes. State machines There is a much more general way to use consensus, as the mechanism for implementing a highly available state machine. The way to get availability is to have either perfect components or redundancy. The simplest form of redundancy is replication: have several copies of each component, and make all the non-faulty components do the same thing. Since any computation can be expressed as a state machine, a replicated state machine can make any computation highly available. Recall the basic idea of a replicated state machine: If the transition relation is deterministic (in other words, is a function from (state, input) to (new state, output)), then several copies of the state machine that start in the same state and see the same sequence of inputs will do the same thing, that is, end up in the same state and produce the same outputs. So if several processes are implementing the same state machine and achieve consensus on the values and order of the inputs, they will do the same thing. In this way it’s possible to replicate an arbitrary computation and thus make it highly available. Of course we can make the order a part of the value of the input by defining some total order on the set of possible inputs.1 We have already seen one application of this replicated state machine idea, in the implementation of transactions; there the replication takes the form of redoing a sequence of actions that is remembered in a log. In many applications the inputs are requests from clients to the replicated service. Typically different clients generate their requests independently, so it’s necessary to agree not only on what the requests are, but also on the order in which to serve them. The simplest way to do this is to 1 This approach was first proposed in a classic paper by Leslie Lamport: Time, clocks, and the ordering of events in a distributed system, Comm. ACM 21, 7, July 1978, pp 558-565. This paper is better known for its analysis of the partial ordering of events in a distributed system, which is too bad. 6.826—Principles of Computer Systems 1999 Handout 18. Consensus 2 number them with consecutive integers, starting at 1. This is what is done in ‘primary copy’ replication, since it’s easy for one place (the primary) to assign consecutive numbers. The literature contains many other schemes for achieving consensus on the order of requests when their total order is not derived from consecutive integers. These schemes label each input with some label from a totally ordered set (for instance, (client UID, timestamp) pairs) and then devise some way to be certain that you have seen all the inputs that can ever exist with labels smaller than a given value. They are complicated, and of doubtful utility.2 The section on leases at the end of this handout explains practical methods for minimizing the number of times you need to use consensus in implementing a reliable state machine. Specification for consensus Here is the specification for consensus; we have seen it already in handout 8 on history and prophecy variables. There is an outcome variable initialized to nil, and an action Allow(v) that can be invoked any number of times. There is also an action Outcome to read the outcome variable; it must return either nil or a v which was the argument of some Allow action, and it must always return the same v. More precisely, we have two requirements: Agreement: Every non-nil result of Outcome is the same. Validity: A non-nil outcome equals some allowed value. Validity means that the outcome can’t be any arbitrary value, but must be a value that was allowed. Consensus is reached by choosing some allowed value and assigning it to outcome. This spec makes the choice on the fly as the allowed values arrive. MODULE Consensus [V] EXPORT Allow, Outcome = % data Value to agree on VAR outcome : (V + Null) := nil APROC Allow(v) = << outcome = nil => outcome := v [] SKIP >> APROC Outcome() -> (V+Null) = << RET outcome [] RET nil >> END Consensus Note that Outcome is allowed to return nil even after the choice has been made. This reflects the fact that in an implementation with several replicas, Outcome is often implemented by talking to just one of the replicas, and that replica may not yet have learned about the choice. If only one Allow action occurs, there’s no need to choose a v, and the implementation’s only problem is to ensure termination. An algorithm that does so is said to implement ‘reliable’ or ‘atomic’ broadcast; there is only one sender, and either everyone or no one gets the message. The 2 For details, see F. Schneider, Implementing fault-tolerant services using the state-machine approach: A tutorial, ACM Computing Surveys 22 (Dec 1990). This paper is reprinted in the book Distributed Systems, 2nd edition, ed. S. Mullender, Addison-Wesley, 1993, pp 169-197.6.826—Principles of Computer Systems 1999 Handout 18. Consensus 3 single Allow might not set outcome, which corresponds to failure of the sender of the broadcast message; in this case no one gets the message. Here is a slightly more complicated, but perhaps more intuitive, spec. It accumulates the allowed values and then chooses one of them in the internal action Agree. MODULE LateConsensus [V] EXPORT Allow, Outcome = VAR outcome : (V + Null) := nil allowed : SET V := {} APROC Allow(v) = << allowed := allowed + {v} >> APROC Outcome() -> (V+Null) = << RET outcome [] RET nil >> % Only outcome is visible APROC Agree() = << VAR v :IN allowed | outcome = nil => outcome := v >> END LateConsensus It should be fairly clear that LateConsensus implements Consensus. An abstraction function to prove

View Full Document


School:
Email:
New Password:
Confirm Password:

MIT 6 826 - Consensus

Sign up for free to view:

Please select your school