DOC PREVIEW
UMass Amherst CS 677 - Weak Consistency

This preview shows page 1-2 out of 7 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1CS677: Distributed OSComputer ScienceLecture 17, page 1Last Class: Weak Consistency• Eventual Consistency and epidemic protocols• Implementing consistency techniques– Primary-based– Replicated writes-based• Quorum protocolsCS677: Distributed OSComputer ScienceLecture 17, page 2Today: Fault Tolerance• Basic concepts in fault tolerance• Masking failure by redundancy• Process resilience2CS677: Distributed OSComputer ScienceLecture 17, page 3Motivation• Single machine systems– Failures are all or nothing• OS crash, disk failures• Distributed systems: multiple independent nodes– Partial failures are also possible (some nodes fail)• Question: Can we automatically recover from partialfailures?– Important issue since probability of failure grows with numberof independent components (nodes) in the systems– Prob(failure) = Prob(Any one component fails)=1-P(no failure)CS677: Distributed OSComputer ScienceLecture 17, page 4A Perspective• Computing systems are not very reliable– OS crashes frequently (Windows), buggy software, unreliablehardware, software/hardware incompatibilities– Until recently: computer users were “tech savvy”• Could depend on users to reboot, troubleshoot problems– Growing popularity of Internet/World Wide Web• “Novice” users• Need to build more reliable/dependable systems– Example: what is your TV (or car) broke down every day?• Users don’t want to “restart” TV or fix it (by opening it up)• Need to make computing systems more reliable3CS677: Distributed OSComputer ScienceLecture 17, page 5Basic Concepts• Need to build dependable systems• Requirements for dependable systems– Availability: system should be available for use at any giventime• 99.999 % availability (five 9s) => very small down times– Reliability: system should run continuously without failure– Safety: temporary failures should not result in a catastrophic• Example: computing systems controlling an airplane,nuclear reactor– Maintainability: a failed system should be easy to repairCS677: Distributed OSComputer ScienceLecture 17, page 6Basic Concepts (contd)• Fault tolerance: system should provide services despitefaults– Transient faults– Intermittent faults– Permanent faults4CS677: Distributed OSComputer ScienceLecture 17, page 7Failure Models• Different types of failures.A server may produce arbitrary responses at arbitrary timesArbitrary failureThe server's response is incorrectThe value of the response is wrongThe server deviates from the correct flow of controlResponse failure Value failure State transition failureA server's response lies outside the specified time intervalTiming failureA server fails to respond to incoming requestsA server fails to receive incoming messagesA server fails to send messagesOmission failure Receive omission Send omissionA server halts, but is working correctly until it haltsCrash failureDescriptionType of failureCS677: Distributed OSComputer ScienceLecture 17, page 8Failure Masking by Redundancy• Triple modular redundancy.5CS677: Distributed OSComputer ScienceLecture 17, page 9Process Resilience• Handling faulty processes: organize several processesinto a group– All processes perform same computation– All messages are sent to all members of the group– Majority need to agree on results of a computation– Ideally want multiple, independent implementations of theapplication (to prevent identical bugs)• Use process groups to organize such processesCS677: Distributed OSComputer ScienceLecture 17, page 10Flat Groups versus HierarchicalGroupsAdvantages and disadvantages?6CS677: Distributed OSComputer ScienceLecture 17, page 11Agreement in Faulty Systems• How should processes agree on results of a computation?• K-fault tolerant: system can survive k faults and yetfunction• Assume processes fail silently– Need (k+1) redundancy to tolerant k faults• Byzantine failures: processes run even if sick– Produce erroneous, random or malicious replies• Byzantine failures are most difficult to deal with– Need ? Redundancy to handle Byzantine faultsCS677: Distributed OSComputer ScienceLecture 17, page 12Byzantine Faults• Simplified scenario: two perfect processes with unreliable channel– Need to reach agreement on a 1 bit message• Two army problem: Two armies waiting to attack– Each army coordinates with a messenger– Messenger can be captured by the hostile army– Can generals reach agreement?– Property: Two perfect process can never reach agreement in presence ofunreliable channel• Byzantine generals problem: Can N generals reach agreementwith a perfect channel?– M generals out of N may be traitors7CS677: Distributed OSComputer ScienceLecture 17, page 13Byzantine Generals Problem• Recursive algorithm by Lamport• The Byzantine generals problem for 3 loyal generals and 1 traitor.a) The generals announce their troop strengths (in units of 1 kilosoldiers).b) The vectors that each general assembles based on (a)c) The vectors that each general receives in step 3.CS677: Distributed OSComputer ScienceLecture 17, page 14Byzantine Generals Problem Example• The same as in previous slide, except now with 2 loyal generals and one traitor.• Property: With m faulty processes, agreement is possible only if 2m+1 processesfunction correctly [Lamport 82]– Need more than two-thirds processes to function


View Full Document

UMass Amherst CS 677 - Weak Consistency

Download Weak Consistency
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Weak Consistency and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Weak Consistency 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?