CORNELL CS 514 - Lecture Notes


CS514: Intermediate Course in Operating Systems
Professor Ken Birman
Ben Atkin: TA
Lecture 9: Sept. 21

Slide titles: Conclusion?; Other options; Server replication; Primary/backup; Issues?; Split brain: reminder; Implication?; Real systems; How does hardware help?; Reconciliation; Summary; Replication and High Availability; Steps to a solution; Non-blocking Commit; Definition of problem; Non-triviality; Typical protocol; Commit protocol illustrated; Failure issues; Failure model impacts costs!; Commit with simpler failure model; Example of a hard scenario; Skeen: Three-phase commit; Three phase commit protocol illustrated; Observations about 3PC; Assumptions about failures; Problems with 3PC; Situation in practical systems?; Process groups; Failure detection; Architecture; Issues; GMP design; Reading ahead?

Conclusion?
• We set out to replicate data for increased availability
• And concluded that
  – Quorum scheme works for updates
  – But commit is required
  – And represents a vulnerability
• Other options?

Other options
• We mentioned primary-backup schemes
• These are a second way to solve the problem
• Based on the log at the data manager

Server replication
• Suppose the primary sends the log to the backup server
• It replays the log and applies committed transactions to its replicated state
• If primary crashes, the backup soon catches up and can take over

Primary/backup
(Figure: primary, backup, log)
Clients initially connected to primary, which keeps backup up to date. Backup tracks log.

Primary/backup
(Figure: primary, backup)
Primary crashes. Backup sees the channel break, applies committed updates. But it may have missed the last few updates!

Primary/backup
(Figure: primary, backup)
Clients detect the failure and reconnect to backup. But some clients may have “gone away”. Backup state could be slightly stale. New transactions might suffer from this.

Issues?
• Under what conditions should backup take over?
  – Revisits the consistency problem seen earlier with clients and servers
  – Could end up with a “split brain”
• Also notice that this still needs 2PC to ensure that primary and backup stay in the same state!

Split brain: reminder
(Figure: primary, backup, log)
Clients initially connected to primary, which keeps backup up to date. Backup follows log.

Split brain: reminder
(Figure: primary, backup)
Transient problem causes some links to break but not all. Backup thinks it is now primary, primary thinks backup is down.

Split brain: reminder
(Figure: primary, backup)
Some clients still connected to primary, but one has switched to backup and one is completely disconnected from both.

Implication?
• A strict interpretation of ACID leads to the conclusion that
  – There are no ACID replication schemes that provide high availability
• Most real systems solve this by weakening ACID

Real systems
• They use primary-backup with logging
• But they simply omit the 2PC
  – Server might take over in the wrong state (may lag state of primary)
  – Can use hardware to reduce or eliminate split brain problem

How does hardware help?
• Idea is that primary and backup share a disk
• Hardware is configured so only one can write the disk
• If server takes over it grabs the “token”
• Token loss causes primary to shut down (if it hasn’t actually crashed)

Reconciliation
• This is the problem of fixing the transactions impacted by lack of 2PC
• Usually just a handful of transactions
  – They committed but backup doesn’t know because it never saw the commit record
  – Later, the server recovers and we discover the problem
• Need to apply the missing ones
• Also causes cascaded rollback
• Worst case may require human intervention

Summary
• Reliability can be understood in terms of
  – Availability: system keeps running during a crash
  – Recoverability: system can recover automatically
• Transactions are best for the latter
• Some systems need both sorts of mechanisms, but there are “deep” tradeoffs involved

Replication and High Availability
• All is not lost!
• Suppose we move away from the transactional model
• Can we replicate data at lower cost and with high availability?
  – Leads to “virtual synchrony” model
  – Treats data as the “state” of a group of participating processes
  – Replicated update: done with multicast

Steps to a solution
• First look more closely at 2PC, 3PC, failure detection
  – 2PC and 3PC both “block” in real settings
  – But we can replace failure detection by consensus on membership
  – Then these protocols become non-blocking (although solving a slightly different problem)
• Generalized approach leads to ordered atomic multicast in dynamic process groups

Non-blocking Commit
• Goal: a protocol that allows all operational processes to terminate the protocol even if some subset crash
• Needed if we are to build high availability transactional systems (or systems that use quorum replication)

Definition of problem
• Given a set of processes, one of which wants to initiate an action
• Participants may vote for or against the action
• Originator will perform the action only if all vote in favor; if any vote against (or don’t vote), we will “abort” the protocol and not take the action
• Goal is all-or-nothing outcome

Non-triviality
• Want to avoid solutions that do nothing (trivial case of “all or none”)
• Would like to say that if all vote for commit, protocol will commit... but in distributed systems we can’t be sure votes will reach the coordinator!
  – Any “live” protocol risks making a mistake and counting a live process that voted to commit as a failed process, leading to an abort
• Hence, non-triviality condition is hard to capture

Typical protocol
• Coordinator asks all processes if they can take the action
• Processes decide if they can and send back “ok” or “abort”
• Coordinator collects all the answers (or times out)
• Coordinator computes outcome and sends it back

Commit protocol illustrated
(Figure: coordinator sends “ok to commit?”)

Commit protocol illustrated
(Figure: coordinator sends “ok to commit?”; participants reply “ok with us”)

Commit protocol illustrated
(Figure: coordinator sends “ok to commit?”; participants reply “ok with us”; coordinator sends “commit”)
Note: garbage collection protocol not shown here

Failure issues
• So far, have implicitly assumed that processes fail by halting (and hence not voting)
• In real systems a process could fail in arbitrary ways, even maliciously
• This has led to work on the “Byzantine generals” problem, which is a variation on commit set in a “synchronous” model with malicious failures

Failure model impacts costs!
• Byzantine model is very costly: 3t+1 processes needed to overcome t failures, protocol runs in t+1 rounds
• This
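
The voting round described on the “Typical protocol” slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the lecture's own code: messages are modeled as direct method calls, processes fail only by halting, and a timeout at the coordinator is represented by a vote of None. The names Participant, run_commit, can_commit, and crashed are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

OK, ABORT, COMMIT = "ok", "abort", "commit"


@dataclass
class Participant:
    """A process taking part in the vote (hypothetical model)."""
    name: str
    can_commit: bool = True        # would this process answer "ok"?
    crashed: bool = False          # a halted process never answers
    outcome: Optional[str] = None  # last outcome received, if any

    def vote(self) -> Optional[str]:
        # None stands in for "coordinator timed out waiting for this vote".
        if self.crashed:
            return None
        return OK if self.can_commit else ABORT

    def deliver(self, outcome: str) -> None:
        # A crashed process does not learn the outcome.
        if not self.crashed:
            self.outcome = outcome


def run_commit(participants: List[Participant]) -> str:
    """Coordinator: ask everyone, collect answers, compute and send outcome."""
    votes = [p.vote() for p in participants]
    # Commit only if every vote is "ok"; an "abort" or a missing vote aborts.
    outcome = COMMIT if all(v == OK for v in votes) else ABORT
    for p in participants:
        p.deliver(outcome)
    return outcome


group = [Participant("p1"), Participant("p2"), Participant("p3")]
print(run_commit(group))  # prints: commit

group[1].crashed = True   # p2 halts before voting
print(run_commit(group))  # prints: abort
```

Because a missing vote is indistinguishable from a vote that arrived too late, the second run aborts even though p2 might merely have been slow, which is the difficulty raised on the “Non-triviality” slide. The sketch also assumes the coordinator itself never fails.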

