CORNELL CS 614 - NONBLOCKING COMMIT PROTOCOL - D2999044

Home> Schools> Cornell University> Computer Science (CS) > CS 614> NONBLOCKING COMMIT PROTOCOL

DOC PREVIEW

CORNELL CS 614 - NONBLOCKING COMMIT PROTOCOL

School name Cornell University

Course Cs 614- Advanced Systems

Pages 10

This preview shows page 1-2-3 out of 10 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 10 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 10 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 10 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 10 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

NONBLOCKING COMMIT PROTOCOLS* Dale Skeen Computer Science Division EFCS DeQartment University of California Berkeley, California "From a certain point onward there is no longer any turning back. That is the point that must be reached." - Kafka ABSTRACT -- Protocols that allow operational sites to continue transac- tion processing even though site failures have occurred are called nonblocking. Many applications require nonblocking Qrotocols. This paper investigates the properties of non- blocking protocols. Necessary and sufficient conditions for a protocol to be nonblocking are presented and from these conditions a method for designing them is derived. Both a central site nonblocking protocol and a decentralized non- blocking protocol are presented. 1 -- Introduction Recently, considerable research interest has been focused on distributed data 'base systems [LORI77,- ROTH77, SCHA78, SVOB791. Several systems have been oro- posed and are in various stages of imple- mentation, including SDD-1 [HAMM79], SYSTEM-R [LIND79], and Ingres [STON791. It. is widely recognized that distributed crash recovery is vital to the usefulness of these systems. However, resilient Qro- tocols are hard to design and they are expensive. Crash recovery algorithms are based on the notion that certain basic operations on the data are logically indi- visible. These operations are called transactions. * This research was sponsored by the U.S. Air Force Office of Scientific Research Grant 78-3596, the U.S. Army Research Office Grant DAAG29-76-G-0245, and the Naval Electronics Systems Command Contract N00039-78-G-0013. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 01981 ACM 0-89791-040-O /80/0400/0133 $00.75 Transaction Manaoement By definition, a transaction on a distributed data base system is an atomic operation: either it executes to comple-. tion or it appears never to have executed at all. However, a transaction is rarely a physically atomic operation, rather, dur- ing execution it must be decomposed into a seguence of physical operations. This discrepancy between logical atomicity (as seen by the application) and physical atomicity poses a significant problem in the implementation of distributed systems. This problem is amplified when transaction atomicity must be preserved across multi- ple failures. Nonetheless, most applica- tions require that a notion of transaction atomicity (above the level of physical atomicity) be supported and made resilient to failures. Preserving transaction atomicity in the single site case is a well understood problem [LIND79, GRAY791. The Qrocessing of a single transaction is viewed as fol- lows. At some time during its execution, a commit point is reached-where the site decidesto commit or to abort the transac- tion. A commit is an unconditional guarantee to execute the transaction to completion, even in the event of multiple failures. Similarly, an abort is an unconditional guarantee to "back out" the transaction so that none of its results persist. If a failure occurs before the commit Qoint is reached, then immediately upon recovering the site will abort the transaction. Commit and abort are 133irreversible. See [LIND79] ,for a discus- There are several sion on implementing this abstraction of possible execution states of the transaction; two are of transaction management. interest. The problem of guaranteeing transac- tion atomicity is compounded when more than one site is involved. Given that each site has a local recovery strategy that provides atomicity at the local level, the problem becomes one of insuring that the sites, either unanimously abort or unani- mously commit. A mixed decision results in an inconsistent data base. Protocols for preserving transaction atomicity are called commit protocols. Several commit protocols have been pro- posed [ALSB76, HAMM79, LAMP76, LIND79, STON79] The simplest commit protocol that allows unilateral abort is the two phase commit protocol illustrated in figure 1 [GRAY79, LAMP761. This protocol uses a designated site (site 1 in the figure) to coordinate the execution of the transac- tion at the other. sites. In the first phase of the protocol the coordinator dis- tributes the transaction to all sites, and then each site individually votes on whether to commit (yes) or abort (no) it. In the second phase, the coordinator col- lects all the votes and informs each site of the outcome. In the absence of failures, this protocol preserves atomi- city. First, either of the failed sites may have aborted the transaction. Secondly, all sites may have decided to commit the protocol. In the latter situation, if the coordinator failed between sending commit messages and if the second site failed after receiving a commit message, then the transaction has been committed at the second site. Since site three has no way of determining the status of the transac- tion at the second site, it can not safely proceed. Instead, execution of the tran- saction must be blocked at site three until one of the failed sites has recovered. The two phase commit protocol is an example of a blocking protocol: opera- tional sites sometimes. wait on the recovery of a failed sites. Locks must be held on the database while the transaction is blocked. A protocol that never requires opera- tional sites to block until a failed site has recovered is called a nonblocking pro- toco1. Termination and Recovery Protocols Nonblocking Commit Protocols Consider

View Full Document