UMass Amherst CS 677 - Consistency Semantics

CS677: Distributed OS, Lecture 16

Last Class: Consistency Semantics
• Consistency models
  – Data-centric consistency models
  – Client-centric consistency models
• Eventual consistency and epidemic protocols

Today: More on Consistency
• Consistency protocols
  – Primary-based
  – Replicated-write
• Putting it all together: final thoughts
• Fault tolerance: introduction
• Project 2

Implementation Issues
• Two techniques to implement consistency models:
  – Primary-based protocols
    • Assume a primary replica for each data item
    • The primary is responsible for coordinating all writes
  – Replicated-write protocols
    • No primary is assumed for a data item
    • Writes can take place at any replica

Remote-Write Protocols
• Traditionally used in client-server systems

Remote-Write Protocols (2)
• Primary-backup protocol
  – Allows local reads; writes are sent to the primary
  – A write blocks until all replicas have been notified
  – Implements sequential consistency

Local-Write Protocols (1)
• Primary-based local-write protocol in which a single copy migrates between processes
  – Limitation: need to track the primary for each data item

Local-Write Protocols (2)
• Primary-backup protocol in which the primary migrates to the process wanting to perform an update

Replicated-Write Protocols
• Relax the assumption of a single primary
  – No primary; any replica is allowed to update
  – Consistency is more complex to achieve
• Quorum-based protocols
  – Use voting to request/acquire permissions from replicas
  – Consider a file replicated on N servers
  – Update: contact at least (N/2 + 1) servers and get them to agree to do the update (associate a version number with the file)
  – Read: contact a majority of the servers and obtain the version number
    • If a majority of the servers agree on a version number, perform the read

Gifford's Quorum-Based Protocol
• Three examples of the voting algorithm:
  a) A correct choice of read and write set
  b) A choice that may lead to write-write conflicts
  c) A correct choice, known as ROWA (read one, write all)

Final Thoughts
• Replication and caching improve performance in distributed systems
• Consistency of replicated data is crucial
• Many consistency semantics (models) are possible
  – Need to pick an appropriate model depending on the application
  – Example: for web caching, weak consistency is OK since humans are tolerant of stale information (they can reload the page)
  – Implementation overhead and complexity grow as stronger guarantees are required

Fault Tolerance
• Single-machine systems
  – Failures are all or nothing (OS crash, disk failure)
• Distributed systems: multiple independent nodes
  – Partial failures are also possible (some nodes fail)
• Question: can we automatically recover from partial failures?
  – Important, since the probability of failure grows with the number of independent components (nodes) in the system
  – P(failure) = P(any one component fails) = 1 - P(no component fails) = 1 - (1 - p)^n for n independent components, each failing with probability p
  – Example: with n = 100 nodes and p = 0.001, P(failure) = 1 - 0.999^100 ≈ 0.095, even though each node is 99.9% reliable

A Perspective
• Computing systems are not very reliable
  – OSes crash frequently, software is buggy, hardware is unreliable, and software/hardware incompatibilities abound
  – Until recently, computer users were "tech savvy": they could be relied on to reboot and troubleshoot problems
  – The growing popularity of the Internet/World Wide Web brings "novice" users
• Need to build more reliable/dependable systems
  – Example: what if your TV (or car) broke down every day? Users don't want to "restart" a TV or fix it (by opening it up)
  – Need to make computing systems more reliable
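The quorum rules above generalize to Gifford's two conditions: a read quorum Nr and write quorum Nw are valid when Nr + Nw > N (every read quorum overlaps every write quorum, preventing read-write conflicts) and Nw > N/2 (any two write quorums overlap, preventing write-write conflicts). A minimal sketch of the check, in Python; the quorum sizes in the examples are chosen here for illustration, not taken from the slides:

```python
def valid_quorum(n, n_read, n_write):
    """Check Gifford's quorum conditions for n replicas.

    n_read + n_write > n  prevents read-write conflicts:
        every read quorum overlaps every write quorum.
    n_write > n / 2       prevents write-write conflicts:
        any two write quorums share at least one replica.
    """
    return n_read + n_write > n and 2 * n_write > n

# Illustrative choices for N = 12 replicas, mirroring the slide's three cases:
print(valid_quorum(12, 3, 10))   # a correct choice of read/write sets -> True
print(valid_quorum(12, 7, 6))    # Nw is not a majority: write-write conflicts -> False
print(valid_quorum(12, 1, 12))   # ROWA: read one, write all -> True
```

Note that ROWA makes reads as cheap as possible at the cost of contacting every replica on each write; shifting weight between Nr and Nw trades read cost against write cost while keeping both conditions satisfied.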
Basic Concepts
• Need to build dependable systems
• Requirements for dependable systems:
  – Availability: the system should be available for use at any given time
    • 99.999% availability ("five nines") implies very small downtime
  – Reliability: the system should run continuously without failure
  – Safety: temporary failures should not result in a catastrophe
    • Example: computing systems controlling an airplane or a nuclear reactor
  – Maintainability: a failed system should be easy to repair

Basic Concepts (contd.)
• Fault tolerance: the system should provide services despite faults
  – Transient faults
  – Intermittent faults
  – Permanent faults

Failure Models
• Different types of failures:

  Type of failure             Description
  Crash failure               A server halts, but is working correctly until it halts
  Omission failure            A server fails to respond to incoming requests
    Receive omission            A server fails to receive incoming messages
    Send omission               A server fails to send messages
  Timing failure              A server's response lies outside the specified time interval
  Response failure            The server's response is incorrect
    Value failure               The value of the response is wrong
    State-transition failure    The server deviates from the correct flow of control
  Arbitrary failure           A server may produce arbitrary responses at arbitrary times

Failure Masking by Redundancy
• Triple modular redundancy

Project 2
• Online banking using a distributed database
• Database distributed across 3 machines
  – Set of accounts split across the three disks
• Replication: each account is also replicated on a second machine

Project 2 (contd.)
• Load balancer/request redirector
  – All client requests arrive at this component
  – Forwards each request to an "appropriate" server
  – Two policies: per-account round-robin, least-loaded
• Distributed locks: Ricart and Agrawala algorithm
  – Can use logical clocks, or simplify by using physical clocks
• Consistency: strict/release; before releasing a lock, propagate changes to the replicas
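The triple modular redundancy idea above masks a single faulty component by running three replicas of a stage and taking a majority vote over their outputs. A minimal sketch of the voter, assuming the replica outputs have already been collected (the function name and error handling are illustrative, not Project 2 code):

```python
from collections import Counter

def tmr_vote(outputs):
    """Majority vote over three replica outputs.

    Masks at most one faulty replica: if two of the three
    outputs agree, their common value is returned.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one replica failed")
    return value

# One replica produces a wrong value; the voter masks it.
print(tmr_vote([42, 42, 17]))   # -> 42
```

With independent per-replica failure probability p, the voted stage fails only when two or more replicas fail, so TMR improves reliability as long as p is small, at three times the hardware cost.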

