DOC PREVIEW
UT CS 395T - Using Abstraction to Improve Fault Tolerance

This preview shows page 1-2-3-4-5 out of 14 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

BASE: Using Abstraction to Improve Fault ToleranceRodrigo Rodriguesy, Miguel Castrox, and Barbara LiskovyyMIT Laboratory for Computer Science200 Technology Sq., Cambridge MA 02139, USAxMicrosoft Research Ltd.1 Guildhall St., Cambridge CB2 3NH, [email protected], [email protected], [email protected] errors are a ma jor cause of outages and they areincreasingly exploited in malicious attacks. Byzantine faulttolerance allows replicated systems to mask some softwareerrors but it is exp ensive to deploy. This pap er describ es areplication technique, BASE, which uses abstraction to re-duce the cost of Byzantine fault tolerance and to improve itsability to mask software errors. BASE reduces cost b ecauseit enables reuse of o-the-shelf service implementations. Itimproves availability b ecause each replica can be repairedperio dically using an abstract view of the state stored bycorrect replicas, and b ecause each replica can run distinctor non-deterministic service implementations, which reducesthe probability of common mo de failures. We built an NFSservice where each replica can run a dierent o-the-shelfle system implementation, and an ob ject-oriented databasewhere the replicas ran the same, non-deterministic imple-mentation. These examples suggest that our technique canbe used in practice | in b oth cases, the implementation re-quired only a mo dest amount of new code, and our p erfor-mance results indicate that the replicated services performcomparably to the implementations that they reuse.1. INTRODUCTIONThere is a growing demand for highly-available systemsthat provide correct service without interruptions. Thesesystems must tolerate software errors because these are ama jor cause of outages [13]. Furthermore, there is an in-creasing numb er of malicious attacks that exploit softwareerrors to gain control or deny access to systems that provideimportant services.This pap er prop oses a replication technique, BASE, thatcombines Byzantine fault tolerance [31] with work on dataThis researchwas partially supported byDARPA under con-tract F30602-98-1-0237 monitored bytheAirForce ResearchLaboratory. Ro drigo Rodrigues was partially supported bya Praxis XXI fellowship.abstraction [20]. Byzantine fault tolerance allows a repli-cated service to tolerate arbitrary b ehavior from faulty repli-cas, e.g., behavior caused bya software bug or an attack.Abstraction hides implementation details to enable the reuseof o-the-shelf implementations of imp ortant services (e.g.,le systems, databases, or HTTP daemons) and to improvethe abilitytomasksoftware errors.We extended the BFT library [7, 8] to implement BASE.(BASE is an acronym for BFT with Abstract Sp ecicationEncapsulation.) The original BFT library provides Byzan-tine fault tolerance with go o d performance and strong cor-rectness guarantees if no more than 1=3 of the replicas failwithin a small window of vulnerability. However, it requiresall replicas to run the same service implementation and toupdate their state in a deterministic way. Therefore, itcannot tolerate deterministic software errors that cause allreplicas to fail concurrently and it complicates reuse of ex-isting service implementations b ecause it requires extensivemodications to ensure identical values for the state of eachreplica.The BASE library and metho dology describ ed in this pa-per correct these problems | they enable replicas to run dif-ferent or non-deterministic implementations. The method-ology is based on the concepts ofabstract specicationandabstraction functionfrom work on data abstraction [20]. Westart by dening a commonabstract specicationfor the ser-vice, which species anabstract stateand describ es how eachoperation manipulates the state. Then we implementacon-formancewrapperfor each distinct implementation to makeit b ehave according to the common sp ecication. The laststep is to implementanabstraction function(and one of itsinverses) to map from the concrete state of each implemen-tation to the common abstract state (and vice versa).The metho dology oers several imp ortant advantages.Reuse of existing co de.BASE implements a form of statemachine replication [17, 35], which allows replication of ser-vices that p erform arbitrary computations, but requires de-terminism: all replicas must produce the same sequence ofresults when they pro cess the same sequence of op erations.Most o-the-shelf implementations of services fail to satisfythis condition. For example, many implementations pro-duce timestamps by reading lo cal clocks, whichcancausethe states of replicas to diverge. The conformance wrapp erand the abstract state conversions enable the reuse of exist-ing implementations without mo dications. Furthermore,these implementations can b e non-deterministic, whichre-duces the probability of common mode failures.Software Rejuvenation through proactive recovery.It has b een observed [16] that there is a correlation b etweenthe length of time software runs and the probability thatit fails. BASE combines proactive recovery [8] with ab-straction to counter this problem. Replicas are recoveredperio dically even if there is no reason to susp ect they arefaulty. Recoveries are staggered so that the service remainsavailable during rejuvenation to enable frequent recoveries.When a replica is recovered, it is reb o oted and restartedfrom a clean state. Then it is broughtuptodateusingacorrect copy of the abstract state that is obtained from thegroup of replicas. Abstraction mayimproveavailabilitybyhiding corrupt concrete states, and it enables proactivere-covery when replicas do not run the same co de or run codethat is non-deterministic.Opportunistic N-version programming.Replication isnot useful when there is a strong p ositive correlation be-tween the failure probabilities of the dierent replicas, e.g.,deterministic software bugs cause all replicas to fail at thesame time when they run the same co de. N-version pro-gramming [9] exploits design diversity to reduce the proba-bility of correlated failures, but it has several problems [13]:it increases development and maintenance costs by a factorof N or more, adds unacceptable time delays to the im-plementation, and do es not provide a mechanism to repairfaulty replicas.BASE enables an opp ortunistic form of N-version pro-gramming byallowing us to takeadvantage of distinct, o-the-shelf implementations of common services. This


View Full Document

UT CS 395T - Using Abstraction to Improve Fault Tolerance

Documents in this Course
TERRA

TERRA

23 pages

OpenCL

OpenCL

15 pages

Byzantine

Byzantine

32 pages

Load more
Download Using Abstraction to Improve Fault Tolerance
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Using Abstraction to Improve Fault Tolerance and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Using Abstraction to Improve Fault Tolerance 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?