From Total Order to Database Replication

Home> Academic Documents> From Total Order to Database Replication

DOC PREVIEW

This preview shows page 1-2-3-24-25-26 out of 26 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 26 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 26 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 26 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 26 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 26 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 26 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 26 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

From Total Order to Database ReplicationYair Amir and Ciprian TutuDepartment of Computer ScienceJohns Hopkins UniversityBaltimore, MD 21218, USA{yairamir, ciprian}@cnds.jhu.eduTechnical ReportCNDS-2001-6http://www.cnds.jhu.eduNovember 5, 2001AbstractThis paper presents in detail an efficient and provably correct algorithm for database replicationover partitionable networks. Our algorithm avoids the need for end-to-end acknowledgments foreach action while supporting network partitions and merges and allowing dynamic instantiationof new replicas. One round of end-to-end acknowledgments is required only upon a membershipchange event such as a network partition. New actions may be introduced to the system at anypoint, not only while in a primary component. We show how performance can be further improvedfor applications that allow relaxation of consistency requirements. We provide experimental resultsthat demonstrate the superiority of this approach.1 IntroductionDatabase replication is quickly becoming a critical tool for providing high availability, survivabilityand high performance for database applications. However, to provide useful replication one has tosolve the non-trivial problem of maintaining data consistency between all the replicas.The state machine approach [27] to database replication ensures that replicated databases thatstart consistent will remain consistent as long as they apply the same deterministic actions (trans-actions) in the same order. Thus, the database replication problem is reduced to the problem ofconstructing a global persistent consistent order of actions. This is often mistakenly considered easyto achieve using the Total Order service (e.g. ABCAST, Agreed order, etc) provided by group com-munication systems.Early models of group communication, such as Virtual Synchrony, did not support network parti-tions and merges. The only failures tolerated by these models were process crashes, without recovery.Under these circumstances, total order is sufficient to create global persistent consistent order.Unfortunately, almost no real-world system today adheres to the requirement of never havingnetwork partitions. Even in local area networks, network partitions occur regularly due to eitherhardware (e.g. temporarily disconnected switches) or software (heavily loaded servers). Of course, inwide area networks, partitions can be common [5].1When network partitions are possible, total order service does not directly translate to a globalpersistent consistent order. Existing solutions that provide active replication either avoid dealing withnetwork partitions [29, 24, 23] or require additional end-to-end acknowledgements for every actionafter it is delivered by the group communication and before it is admitted to the global consistentpersistent order (and can be applied to the database) [16, 12, 28].In this paper we present a complete and provably correct algorithm that provides global persistentconsistent order in a partitionable environment without the need for end-to-end acknowledgments ona per action basis. In our approach end-to-end acknowledgements are only used once for every networkconnectivity change event (such as network partition or merge) and not per action. The basic concept,though never published, was first introduced as part of a PhD dissertation [2]. This paper presentsour newly developed insight into the problem and goes beyond [2] by supporting online additions ofcompletely new replicas and complete removals of existing replicas while the system executes.Our algorithm does not require changes to existing databases to support replication. Instead itbuilds a generic replication engine which runs outside the database and can be seamlessly integratedwith existing databases and applications. The replication engine supports various semantic models,relaxing or enforcing the consistency constraints as needed by the application. We have implementedthe replication engine on top of the Spread toolkit [4, 3] and provide experimental performanceresults, comparing the throughput and latency of the global consistent persistent order using ouralgorithm, the COReL algorithm introduced in [16], and the standard two-phase commit algorithm.These results demonstrate the power of eliminating the end-to-end acknowledgments on a per-actionbasis.The rest of the paper is organized as follows. The following subsection discusses related work.Section 2 describes the working model. Section 3 introduces a conceptual solution. Section 4 addressesthe problems exhibited by the conceptual solution in a partitionable system and introduces theExtended Virtual Synchrony model as a tool to provide global persistent order. Section 5 describesthe detailed replication algorithm and extends it to support online removals and additions to the set ofparticipating replicas. Section 6 shows how the global persistent order guarantees of the algorithm canbe used to support various relaxed consistency requirements useful for database replication. Section7 evaluates the performance of our prototype, while Section 8 concludes the paper. Appendix Apresents the complete pseudo-code of the static replication algorithm.1.1 Related WorkTwo-phase commit protocols [12] remain the main technique used to provide a consistent view in adistributed replicated database system over an unreliable network. These protocols impose a sub-stantial communication cost on each transaction and may require the full connectivity of all replicasto recover from some fault scenarios. Three-phase-commit protocols [28, 17] overcome some of theavailability problems of two-phase-commit protocols, paying the price of an additional communicationround.Some protocols optimize for specific cases: limiting the transactional model to commutative trans-actions [26]; giving special weight to a specific processor or transaction [30]. Explicit use of timestampsenables other protocols [6] to avoid the need to claim locks or to enforce a global total order on ac-tions, while other solutions settle for relaxed consistency criteria [11]. Various people investigatedmethods to implement efficient lazy replication algorithms by using epidemic propagation [8, 14] orby exploiting application semantics [22].Atomic Broadcast [13] in the context of Virtual Synchrony [7] emerged as a promising tool to2solve the replication problem. Several algorithms were introduced [29, 24, 25, 23] to implementreplication solutions based on total ordering. All these


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3-24-25-26 out of 26 pages.

Please select your school