Unformatted text preview:

On the Minimal Synchronism Needed for Distributed Consensus DANNY DOLEV Hebrew University, Jerusalem, Israel CYNTHIA DWORK Cornell Univer.sity, Ithaca, New York AND LARRY STOCKMEYER IBM Ahnaden Research Center, San Jose, California Abstract. Reaching agreement is a primitive of distributed computing. Whereas this poses no problem in an ideal, failure-free environment, it imposes certain constraints on the capabilities of an actual system: A system is viable only if it permits the existence of consensus protocols tolerant to some number of failures. Fischer et al. have shown that in a completely asynchronous model, even one failure cannot be tolerated. In this paper their work is extended: Several critical system parameters, including various synchrony conditions, are identified and how varying these affects the number of faults that can be tolerated is examined. The proofs expose general heuristic principles that explain why consensus is possible in certain models but not possible in others. Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems- distributed applications; distributed databases; network operating systems; C.4 [Performance of Systems]: reliability, availability, and serviceability; F. I .2 [Computation by Abstract Devices]: Modes of Computation-parallelism; H.2.4 [Database Management]: Systems-distributed systems General Terms: Algorithms, Reliability, Theory, Verification Additional Key Words and Phrases: Agreement problem, asynchronous system, Byzantine Generals problem, commit problem, consensus problem, distributed computing, fault tolerance, reliability 1. Introduction The problem of reaching agreement among separated processors is a fundamental problem of both practical and theoretical importance in the area of distributed systems; (see, e.g., [I], [7], [8], [ 1 l] and [ 121). We consider a system of N processors A preliminary version of this paper appears in the Proceedings of the 24th Annual Symposium on Foundations of Computer Science, November 7-9, 1983, Tucson, Ariz., pp. 393-402. 0 1983 IEEE. Any portions of this paper that appeared in the original version are reprinted with permission. The work of D. Dolev was performed in part at Stanford University, supported in part by DARPA under grant MDA903-80-C-0107 and in part at the IBM Research Laboratory, San Jose, Calif. The work of C. Dwork was supported in part by National Science Foundation grant MCS 8 I-O 1220. Authors’ present addresses: D. Dolev, Computer Science Department, Hebrew University, Jerusalem, Israel; C. Dwork, Department K53/802, IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120; L. Stockmeyer, Department K53/802, IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95 120. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 0 1987 ACM 0004-5411/87/0100-0077 $00.75 Journal ofthe Association for Computing Machinery, Vol. 34, No. I, January 1987, pp. 77-97.78 D. DOLEV, ET AL. PI,... , PN (N 2 2) that communicate by sending messages to one another. Initially, each pi has a binary value x;. At some point during its computation, a processor can irreversibly decide on a binary value V. Each processor follows a deterministic protocol involving the receipt and sending of messages. Even though the individual processor protocols are deterministic, there are three potential sources of nonde- terminism in the system. Processors might run at varying speeds, it might take varying amounts of time for messages to be delivered, and messages might be received in an order different from the order in which they were sent. A protocol solves the (nontrivial) consensus problem if (i) no matter how the system runs, every nonfaulty processor makes a decision after a finite number of steps; (ii) no matter how the system runs, two different nonfaulty processors never decide on different values; (iii) 0 and 1 are both possible decision values for (possibly different) assignments of initial values. (This condition is needed to avoid the trivial solution where each processor decides 1 regardless of its initial value.) If the processors and the communication system are completely reliable, the existence of consensus protocols is trivial. The problem becomes interesting when the protocol must operate correctly when some processors can be faulty. The failure mode studied in this paper is fail-stop, in which a failed processor neither sends nor receives messages. A consensus protocol is t-resilient if it operates correctly when at most t processors fail. The existence of N-resilient consensus protocols is easily established if the processors and the communication system are both syn- chronous. Intuitively, synchronous processors means that the internal clocks of the processors are synchronized to within some fixed rate of drift. Synchronous communication means that there is a fixed upper bound on the time for a message to be delivered. These two types of synchrony are assumed in much of the research on “Byzantine Agreement,” (e.g., [7] and [ 111). Our point of departure and motivation for this paper was the interesting recent result of Fischer et al. [lo], which states that in a completely asynchronous system no consensus protocol is l-resilient, that is, even one failure cannot be tolerated. In reading the proof of this result, one sees that three different types of asynchrony are used: (i) Processor asynchrony allows processors to “go to sleep” for arbitrarily long finite amounts of time while other processors continue to run. (ii) Communication asynchrony precludes an a priori bound on message delivery time. (iii) Message order asynchrony allows messages to be


View Full Document
Download Study Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Study Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Study Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?