DOC PREVIEW
UW-Madison CS 739 - Practical Byzantine Fault Tolerance and Proactive Recovery

This preview shows page 1-2-3-4-30-31-32-33-34-61-62-63-64 out of 64 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 64 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Practical Byzantine Fault Toleranceand Proactive RecoveryMIGUEL CASTROMicrosoft ResearchandBARBARA LISKOVMIT Laboratory for Computer ScienceOur growing reliance on online services accessible on the Internet demands highly available sys-tems that provide correct service without interruptions. Software bugs, operator mistakes, andmalicious attacks are a major cause of service interruptions and they can cause arbitrary behav-ior, that is, Byzantine faults. This article describes a new replication algorithm, BFT, that can beused to build highly available systems that tolerate Byzantine faults. BFT can be used in practiceto implement real services: it performs well, it is safe in asynchronous environments such as theInternet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recoversreplicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faultsover the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a smallwindow of vulnerability. BFT has been implemented as a generic program library with a simpleinterface. We used the library to implement the first Byzantine-fault-tolerant NFS file system,BFS. The BFT library and BFS perform well because the library incorporates several importantoptimizations, the most important of which is the use of symmetric cryptography to authenticatemessages. The performance results show that BFS performs 2% faster to 24% slower than produc-tion implementations of the NFS protocol that are not replicated. This supports our claim that theBFT library can be used to build practical systems that tolerate Byzantine faults.Categories and Subject Descriptors: C.2.0 [Computer-Communication Networks]: General—Security and protection; C.2.4 [Computer-Communication Networks]: Distributed Systems—Client/server; D.4.3 [Operating Systems]: File Systems Management; D.4.5 [Operating Sys-tems]: Reliability—Fault tolerance; D.4.6 [Operating Systems]: Security and Protection—Access controls; authentication; cryptographic controls; D.4.8 [Operating Systems]: Perfor-mance—MeasurementsGeneral Terms: Security, Reliability, Algorithms, Performance, MeasurementAdditional Key Words and Phrases: Byzantine fault tolerance, state machine replication, proactiverecovery, asynchronous systems, state transferThis research was partially supported by DARPA under contract F30602-98-1-0237 monitored bythe Air Force Research Laboratory. Part of this work was done while M. Castro was with the MITLaboratory for Computer Science and during this time he was partially supported by Praxis XXIand Gulbenkian fellowships.Authors’ addresses: M. Castro, Microsoft Research, 7 J. J. Thomson Avenue, Cambridge CB3 0FB,UK; email: [email protected]; B. Liskov, MIT Laboratory for Computer Science, 545 Technol-ogy Square, Cambridge, MA 02139.Permission to make digital/hard copy of part or all of this work for personal or classroom use isgranted without fee provided that the copies are not made or distributed for profit or commercialadvantage, the copyright notice, the title of the publication, and its date appear, and notice is giventhat copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers,or to redistribute to lists, requires prior specific permission and/or a fee.C°2002 ACM 0734-2071/02/1100-0398 $5.00ACM Transactions on Computer Systems, Vol. 20, No. 4, November 2002, Pages 398–461.Practical Byzantine Fault Tolerance and Proactive Recovery•3991. INTRODUCTIONWe are increasingly dependent on services provided by computer systems andour vulnerability to computer failures is growing as a result. We would likethese systems to be highly available: they should work correctly and they shouldprovide service without interruptions.There is a large body of research on replication techniques to implementhighly available systems. The problem is that most research on replicationhas focused on techniques that tolerate benign faults (e.g., Alsberg and Day[1976], Gifford [1979], Oki and Liskov [1988], Lamport [1989], and Liskov et al.[1991]): these techniques assume components fail by stopping or by omittingsome steps. They may not provide correct service if a single faulty componentviolates this assumption. Unfortunately, this assumption is not valid becausemalicious attacks, operator mistakes, and software errors are common causesof failure and they can cause faulty nodes to exhibit arbitrary behavior, that is,Byzantine faults. The growing reliance of industry and government on computersystems provides the motif for malicious attacks and the increased connectivityto the Internet exposes these systems to more attacks. Operator mistakes arealso cited as one of the main causes of failure [Murphy and Levidow 2000]. Inaddition, the number of software errors is increasing due to the growth in sizeand complexity of software.Techniques that tolerate Byzantine faults [Pease et al. 1980; Lamport et al.1982] provide a potential solution to this problem because they make no as-sumptions about the behavior of faulty processes. There is a significant body ofwork on agreement and replication techniques that tolerate Byzantine faults.However, most earlier work (e.g., Canetti and Rabin [1992], Reiter [1996],Malkhi and Reiter [1996b], Garay and Moses [1998], and Khilstrom et al.[1998]) either concerns techniques that are too inefficient to be used in prac-tice, or relies on assumptions that can be invalidated easily by an attacker.For example, it is dangerous to rely on synchrony [Lamport 1984] for safety inthe Internet, that is, to rely on bounds on message delays and process speeds.An attacker may compromise the correctness of a service by delaying nonfaultynodes or the communication between them until the bounds are exceeded. Sucha denial-of-service attack is generally easier than gaining control over a non-faulty node.This article describes BFT, a new algorithm for state machine replica-tion [Lamport 1978; Schneider 1990] that offers both liveness and safety pro-vided at most b(n −1)/3c out of a total of n replicas are faulty. This means thatclients eventually receive replies to their requests and those replies are correctaccording to linearizability [Herlihy and Wing 1987; Castro and Liskov 1999a].BFT is the first Byzantine-fault-tolerant, state machine replication algo-rithm that is safe in asynchronous systems such as the Internet: it does notrely on any synchrony assumption to


View Full Document

UW-Madison CS 739 - Practical Byzantine Fault Tolerance and Proactive Recovery

Download Practical Byzantine Fault Tolerance and Proactive Recovery
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Practical Byzantine Fault Tolerance and Proactive Recovery and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Practical Byzantine Fault Tolerance and Proactive Recovery 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?