Reliable Distributed SystemsTransactionsSlide 3Transactions in the real worldRecall transactional modelA side remarkTransaction and Data ManagersTypical transactional programWhat about the locks?Locking ruleExamples of lock coverageTransactional Execution LogObservationsSerializabilityNeed for serializable executionNon serializable executionSerializable executionAtomicity considerationsHow can data manager sort out the operations?Components of transactional systemTransactions at a “single” databaseTwo-phase locking: how it worksWhy do we call it “two phase”?Two-phase LockingNotesWhy does 2PL imply serializability?Acyclic conflict graph implies serializabilityTwo-phase locking is “pessimistic”Contrast: Timestamped approachExample of when we abortPros and cons of approachesIntentions list conceptRole of write-ahead logStructure of a transactional systemRecovery?Transactions in distributed systemsSlide 37Slide 38Two-phase commit in transactionsCommit protocol illustratedSlide 41Unilateral abortNotes on 2PCReliable Distributed SystemsTransactionsTransactionsThe most important reliability technology for client-server systemsLast time we saw the basic ideaNow start an in-depth examination of the topicHow transactional systems really workImplementation considerationsLimitations and performance challengesScalability of transactional systemsSome topics covered in future lecturesTransactionsThere are several perspectives on how to achieve reliabilityOne approach focuses on reliability of communication channels and leaves application-oriented issues to the client or server – “stateless”Major alternative is to focus on the data managed by a system. Stateful version yields transactional systemA third option exploits non-transactional replication. We’ll look at it laterTransactions in the real worldIn this room, transactions are treated at the same level as other techniquesBut in the real world, transactions represent 90% of the existing market for reliable distributed systems!! But keep in mind that 90% is not 100%The web is gradually starting to shift the balance (not by reducing the size of the transaction market but by growing so fast that it is catching up)But even on the web, we use transactions when we buy productsRecall transactional modelApplications are coded in a stylized way:begin transactionPerform a series of read, update operationsTerminate by commit or abort. TerminologyThe application is the transaction managerThe transaction is the sequence of operations issued by the transaction manager while it runsThe data manager is presented with operations from concurrently active transactionsIt schedules them in an interleaved but serializable orderA side remarkEach transaction is built up incrementallyApplication runsAnd as it runs, it issues operationsThe data manager sees them one by oneBut often we talk as if we knew the whole thing at one timeWe’re careful to do this in ways that make senseIn any case, we usually don’t need to say anything until a “commit” is issuedTransaction and Data ManagersTransactionsreadupdatereadupdatetransactions are stateful: transaction “knows” about database contents and updatesData (and Lock) ManagersTypical transactional programbegin transaction; x = read(“x-values”, ....); y = read(“y-values”, ....); z = x+y; write(“z-values”, z, ....);commit transaction;What about the locks?Unlike other kinds of distributed systems, transactional systems lock the data they accessThey obtain these locks as they run:Before accessing “x” get a lock on “x”Usually we assume that the application knows enough to get the right kind of lock. It is not good to get a read lock if you’ll later need to update the objectIn clever applications, one lock will often cover many objectsLocking ruleSuppose that transaction T will access object x.We need to know that first, T gets a lock that “covers” xWhat does coverage entail?We need to know that if any other transaction T’ tries to access x it will attempt to get the same lockExamples of lock coverageWe could have one lock per object… or one lock for the whole database… or one lock for a category of objects In a tree, we could have one lock for the whole tree associated with the rootIn a table we could have one lock for row, or one for each column, or one for the whole tableAll transactions must use the same rules!And if you will update the object, the lock must be a “write” lock, not a “read” lockTransactional Execution LogAs the transaction runs, it creates a history of its actions. Suppose we were to write down the sequence of operations it performs.Data manager does this, one by oneThis yields a “schedule” Operations and order they executedCan infer order in which transactions ranScheduling is called “concurrency control”ObservationsProgram runs “by itself”, doesn’t talk to othersAll the work is done in one program, in straight-line fashion. If an application requires running several programs, like a C compilation, it would run as several separate transactions!The persistent data is maintained in files or database relations external to the applicationSerializabilityMeans that effect of the interleaved execution is indistinguishable from some possible serial execution of the committed transactionsFor example: T1 and T2 are interleaved but it “looks like” T2 ran before T1Idea is that transactions can be coded to be correct if run in isolation, and yet will run correctly when executed concurrently (and hence gain a speedup)Need for serializable executionT1: begin read(x) read(y) write(x) commitT2: begin read(x) write(x) write(y) commitDB: read(x) read(x) write(x) write(y) read(y) write(x)Data manager interleaves operations to improve concurrencyNon serializable executionT1: begin read(x) read(y) write(x) commitT2: begin read(x) write(x) write(y) commitDB: read(x) read(x) write(x) write(y) read(y) write(x)Problem: transactions may “interfere”. Here, T2 changes x, hence T1 should have either run first (read and write) or after (reading the changed value). Unsafe! Not serializableSerializable executionT1: begin read(x) read(y) write(x) commitT2: begin read(x)
View Full Document