MIT 6 897 - Optimal Resilience with Byzantine Shared Memory

Unformatted text preview:

IntroductionThe System ModelRegistersExemplifying the ResultsPrevious Solutions and Remaining ChallengesIntuitive Description of Our Algorithmst-Tolerant FW-Terminating Regular Register EmulationRegister EmulationCorrectnessEfficiencyt-Tolerant Wait-free Safe Register EmulationRegister EmulationCorrectnessRound ComplexityWait-free Consensus with FW-Terminating Registers and Lower Bounds on Register EmulationsLower Bound on write EmulationsLower Bound on read EmulationsAllowing Readers to Modify ObjectsConclusionsByzantine Disk Paxos:Optimal Resilience with Byzantine Shared Memory∗Ittai Abraham†, Gregory Chockler‡, Idit Keidar§, Dahlia Malkhi†December 27, 2004AbstractWe present Byzantine Disk Paxos, an asynchronous shared-memory consensus algorithm thatuses a collection of n > 3t disks, t of which may fail by becoming non-responsive or arbitrarilycorrupted. We give two constructions of this algorithm; that is, we construct two different t-tolerant building blocks, each of which can be used, along with a leader oracle, to solve consensus.One building block is a t-tolerant wait-free shared safe register. The second building block isa t-tolerant regular register that satisfies a weaker termination (liveness) condition than waitfreedom: its write operations are wait-free, whereas its read operations are guaranteed to returnonly in executions with a finite number of writes. We call this termination condition finite writes(FW), and show that t-tolerant wait-free consensus is solvable with FW-terminating registersand a leader oracle. We construct each of these t-tolerant registers from n > 3t base registers, tof which can be non-responsive or Byzantine. All the previous t-tolerant wait-free constructionsin this model used at least 4t + 1 fault-prone registers, and we are not familiar with any priorFW-terminating constructions in this model.We further show tight lower bounds on the number of invocation rounds required for optimalresilience reliable register constructions, or more generally, constructions that use less than4t + 1 fault-prone registers. Our lower bounds show that such constructions are inherently morecostly than constructions that use 4t + 1 registers, and that our constructions have optimalround complexity. Furthermore, our wait-free construction is early-stopping, and it achieves theoptimal round complexity with any number of actual failures.Keywords: shared-memory emulations, t-tolerant object implementations, Byzantine failures,wait freedom, consensus, lower bounds.†School of Computer Science and Engineering, The Hebrew University of Jerusalem.Email: {ittaia,dalia}@cs.huji.ac.il.‡Lab for Computer Science and Artificial Intelligence. Massachusetts Institute of Technology.Email: [email protected].§Department of Electrical Engineering, The Technion – Israel Institute of Technology.Email: [email protected].∗A preliminary version of this paper, by the same authors and with the same title, appears in Proceedings of the23rd ACM Symposium on Principles of Distributed Computing (PODC ’04), July 2004, pages 226–235.1 IntroductionWe consider an asynchronous system with multiple processes accessing fault-prone shared mem-ory objects [AMT93, AGM+95, JCT98]. We study implementations of reliable objects from baseobjects that may fail by being non-responsive [AMT93, JCT98] or by returning arbitrary val-ues [AGM+95, JCT98] (i.e., by being Byzantine); this failure model is called non-responsive ar-bitrary (NR-Arbitrary) faults [JCT98]. We focus on t-tolerant implementations [JCT98], that is,implementations that are correct as long as up to t base objects fail. In addition to memory failures,we assume that any number of the processes accessing the shared objects may crash.This model is important in capturing a fair amount of recent work on Byzantine fault-tolerant“data centric” replication, which arises in three different application domains: (i) message-passingclient-server systems in which servers store information on behalf of clients and the only communi-cation is between clients and servers, allowing servers to be modeled as storage components, e.g.,Fleet [MR00], SBQ-L [MAD02], Agile Store [LAV03], Coca [ZSvR02], and [Baz00]; (ii) Byzantinefault-tolerant peer-to-peer storage systems, e.g., Rosebud [RL04] and [LQLZ04]; and (iii) storagearea networks, e.g., PASIS [GWGR04]. Since in these settings processes access the shared memoryover a network, such systems typically try to minimize the number of memory access rounds.Our goal is to enhance this fruitful line of work into a survivable distributed storage systemthat tolerates arbitrary corruption and unresponsiveness (i.e., NR-Arbitrary faults) in up to a thirdof its disks (or servers) as well as process crashes. (Tolerating NR-Arbitrary faults in a third ormore of the disks is impossible [MAD02]). Although a number of projects have set out to tacklethis problem (e.g., E-vault [GGJR00], Fleet [MR00], Agile Store [LAV03], SBQ-L [MAD02], Coca[ZSvR02], PASIS [GWGR04], as well as [ABO03] and [Baz00]), to date, this goal has not beenachieved in our setting.Consensus is a fundamental building block that may be used to realize such distributed storagesystems. For solving consensus with shared disks, we turn to shared memory failure-detectorbased consensus algorithms [LH94, GL03], and in particular, the shared-memory version of DiskPaxos [GL03], which employs shared wait-free single-writer multi-reader (SWMR) regular registers1.Thus, the problem of solving consensus (assuming a leader oracle) can be reduced to implementinga wait-free SWMR regular register. When disks are subject to unavailability faults only, such aregister is easily implemented from a collection of 2t + 1 fail-prone registers, each stored on onedisk, by reading and writing from/to a majority of disks [GL03].Coping with NR-Arbitrary faults is more challenging. Since the introduction of this fail-ure model, researchers have constructed t-tolerant wait-free shared registers using 4t + 1 [MR98,GWGR04], 5t + 1 [JCT98], or even 6t + 1 [CMR01] fault-prone base objects. Several works haveachieved better resilience by weakening the model in different ways – by adding synchrony [Baz00];by storing signed self-verifying data [MR98, MAD02]; or by providing solutions that may blockindefinitely if processes fail [MAD02, ABO03]. However, t < n/4 is the best resilience achievedthus far by t-tolerant wait-free constructions for the model considered


View Full Document

MIT 6 897 - Optimal Resilience with Byzantine Shared Memory

Download Optimal Resilience with Byzantine Shared Memory
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Optimal Resilience with Byzantine Shared Memory and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Optimal Resilience with Byzantine Shared Memory 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?