CORNELL CS 614 - Study Notes (27 pages)

Previewing pages 1, 2, 3, 25, 26, and 27 of the 27-page document.

Lecture Notes


Pages: 27
School: Cornell University
Course: CS 614 - Advanced Systems

Unformatted text preview:

Using Time Instead of Timeout for Fault-Tolerant Distributed Systems

LESLIE LAMPORT
SRI International

A general method is described for implementing a distributed system with any desired degree of fault-tolerance. Instead of relying upon explicit timeouts, processes execute a simple clock-driven algorithm. Reliable clock synchronization and a solution to the Byzantine Generals Problem are assumed.

Categories and Subject Descriptors: C.2.4 [Computer Communications Networks]: Distributed Systems - network operating systems; D.1.3 [Programming Techniques]: Concurrent Programming; D.4.1 [Operating Systems]: Process Management - synchronization; D.4.3 [Operating Systems]: File Systems Management - distributed file systems; D.4.5 [Operating Systems]: Reliability - fault-tolerance; D.4.7 [Operating Systems]: Organization and Design - distributed systems, real-time systems

General Terms: Design, Reliability

Additional Key Words and Phrases: Clocks, transaction commit, timestamps, interactive consistency, Byzantine Generals Problem

1. INTRODUCTION

In programming asynchronous multiprocess systems, the customary approach has been to make process synchronization independent of the execution rates of any components. This requires synchronization algorithms in which one process must wait for another to do something before it can proceed. In distributed systems, this means waiting for a message from the other process. These time-independent algorithms cannot be fault-tolerant, because a process could fail by doing nothing, and such a failure manifests itself only as a reduction of the process's execution rate [5].

The usual method of obtaining fault-tolerant synchronization in distributed systems is to add timeouts to time-independent algorithms. A process sets a timer whenever it begins waiting for another process, and a failure is assumed to have occurred if a certain period of time elapses without a response from the other

This work was supported in part by the National Science Foundation under Grant No. MCS 7816783 and in part by the
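
The timeout method that the introduction describes (set a timer when the wait begins, assume a failure if the timer expires first) can be illustrated with a minimal Python sketch. This is only an illustration of that conventional approach, not code from the paper and not the paper's clock-driven method. The names wait_for_reply, responder, and run_trial are hypothetical, and an in-process queue stands in for the network.

```python
import queue
import threading
import time


def wait_for_reply(inbox, timeout_s):
    """Wait for one message on inbox; return (message, failure_assumed).

    The timer starts when the wait begins. If it expires before a
    message arrives, a failure of the other process is assumed.
    """
    try:
        return inbox.get(timeout=timeout_s), False
    except queue.Empty:
        return None, True


def responder(inbox, delay_s):
    """Hypothetical peer process: replies after delay_s seconds."""
    time.sleep(delay_s)
    inbox.put("ack")


def run_trial(delay_s, timeout_s):
    """Start a peer with the given delay and wait for its reply."""
    inbox = queue.Queue()
    peer = threading.Thread(target=responder, args=(inbox, delay_s), daemon=True)
    peer.start()
    return wait_for_reply(inbox, timeout_s)


if __name__ == "__main__":
    print(run_trial(delay_s=0.0, timeout_s=1.0))   # prompt peer
    print(run_trial(delay_s=1.0, timeout_s=0.05))  # slow peer
```

In the second trial the peer has not failed; it is only slow. The waiting process cannot tell the difference, which matches the introduction's remark that a failure by doing nothing shows up only as a reduced execution rate.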


