Distributed Operating Systems Part III Andy Wang COP 5611 Advanced Operating Systems More Important DOS Tools and Mechanisms Transactions and two phase commit Hierarchical name spaces Optimistic methods Transactions A special case of atomic actions Originally from databases Sequence of operations that transforms a current consistent state to a new consistent state Without ever exposing an inconsistent state Transaction Example Move 10 from my savings account to my checking account Basically subtract 10 from savings account add 10 to checking account But never lose my 10 And never give me an extra 10 Running Transactions Multiple transactions must not interfere So you can run them one at a time Or run them simultaneously But avoiding all interference Serializability avoids interference Serializability A property of a proposed schedule of transactions A serializable schedule produces the same results as some serial execution of those transactions Even though the actions may be have been performed in a different order Consistency Example for a Distributed System Site B Site A Site C update variable X update variable Z update variable Y What Problems Could Arise Other processes could write the variables Other processes could read the variables Failures could interrupt the process How can we avoid these problems Requirements for Atomicity An action is atomic if Controlling process is unaware of other process activities and vise versa No communications with other processes No outside detectable state changes Indivisible and instantaneous to the rest of the system Commit and Abort A commit is an unconditional guarantee that a transaction will be completed An abort is an unconditional guarantee that a transaction will be completely backed out In both cases regardless of multiple failures All transactions eventually either commit or abort Aborting Transactions Requires returning system to its pretransaction state Essentially a rollback operation Typically done either by logging changes Or delaying updates until commit point is reached Reasons for Aborting Deadlocks Timeouts Protection violations Consistency violations Site or network failures Abort Mechanisms Write ahead logs Shadow pages These techniques generally useful for recovery from failures Write Ahead Logs An operation based abort mechanism For each operation performed record enough information to completely undo it Record operations and undo them if necessary Either old state or other information Undo log Write Ahead Log Protocol Write the undo log to stable storage Make the update If transaction commits undo log can be deleted garbage collected If transaction aborts use undo log to roll back operation And then delete garbage collect the log Pros Cons of Write Ahead Logs Only changed data need to be logged More complex garbage collection Require save on every data item written Shadow Pages State based approach Save a checkpoint of the state before performing transaction operations If transaction aborts restore old state Can be expensive if state is large Shadow paging limits the cost Shadow Page Algorithm Before writing a data item in a transaction 1 Make a complete copy of its page 2 Allow transaction to use the copy transparently 3 On commit switch shadow page for new page 4 On abort restore shadow page Pros Cons of Shadow Paging Simple garbage collection Can only save at page granularity Requires save on every page written Pages from multiple machines Distributed Commitment Mechanisms Mechanisms to guarantee atomicity of actions in a distributed system Important mechanisms for building distributed transactions Works with mechanisms to permit backing out of transactions The Byzantine Generals Problem An abstraction of the commit problem A group of independent generals must cooperate to attack a fortress If all attack at once they will succeed But if any do not attack they will fail Communications only by messengers Who can be delayed or captured The Problem Continued If all the generals wish to cooperate how can they reach agreement on when to attack under these conditions Cannot reach agreement in a fixed number of messages But if messengers eventually get through can be done in an indeterminate number of messages Generals and Transactions Transactions are like the attack The generals are like the sites participating in the attack The attack time is like the commit point How do the sites in a transaction agree it s OK to commit Two Phase Commit Algorithm to allow sites in a transaction to agree to commit Or equivalently agree to abort One process site is the coordinator The others are the cohorts Two Phase Commit Algorithm Phase 1 Coordinator sends COMMIT REQUEST to cohorts Coordinator waits for all replies Each cohort responds with a AGREED or ABORT message after writing local transaction information to stable storage Two Phase Commit Algorithm Continued Phase 2 If all cohorts agree coordinator logs COMMIT record and sends COMMIT to all cohorts else ABORT Coordinator waits for all acks Cohorts act and acknowledge Coordinator writes COMPLETE to its log Two Phase Commit Diagram Phase 1 Site B Site A Foo2 Foo1 Site C update Foo Foo3 Two Phase Commit Diagram Phase 1 Site B Site A Foo1 COMMIT REQUEST COMMIT REQUEST Site C update Foo Foo3 Foo2 Two Phase Commit Diagram Phase 1 Site B Site A Foo1 COMMIT REQUEST AGREED COMMIT REQUEST Site C AGREED update Foo Foo3 local log info Foo2 local log info Two Phase Commit Diagram Phase 2 Site B Site A Foo2 Foo1 local log info COMMIT Site C update Foo Foo3 local log info Two Phase Commit Diagram Phase 2 Site B Site A Foo1 COMMIT local log info COMMIT COMMIT Site C update Foo Foo2 Foo3 local log info Two Phase Commit Diagram Phase 2 Site B Site A Foo1 COMMIT COMMIT COMMIT COMMIT Site C update Foo Foo2 Foo3 COMMIT Two Phase Commit Diagram Phase 2 Site B Site A COMMIT Foo1 COMMIT ACK COMMIT Site C ACK update Foo Foo3 COMMIT Foo2 COMMIT Two Phase Commit Diagram Phase 2 Site B Site A COMMIT Foo1 COMPLETE ACK COMMIT Site C ACK update Foo Foo3 COMMIT Foo2 COMMIT Characteristics of Two Phase Commit Nobody commits until everyone agrees timeouts handle lost delayed messages Examination of logs of participants in cases of failures allow recovery Members block while waiting for completion Two Phase Commit Failure Recovery Coordinator fails before writing COMMIT record to log On recovery broadcast ABORT Coordinator fails between COMMIT and COMPLETE On recovery broadcast COMMIT Two Phase Commit Failure
View Full Document