BASE: Using Abstraction to Improve Fault ToleranceRodrigo Rodriguesy, Miguel Castrox, and Barbara LiskovyyMIT Laboratory for Computer Science200 Technology Sq., Cambridge MA 02139, USAxMicrosoft Research Ltd.1 Guildhall St., Cambridge CB2 3NH, [email protected], [email protected], [email protected] errors are a ma jor cause of outages and they areincreasingly exploited in malicious attacks. Byzantine faulttolerance allows replicated systems to mask some softwareerrors but it is exp ensive to deploy. This pap er describ es areplication technique, BASE, which uses abstraction to re-duce the cost of Byzantine fault tolerance and to improve itsability to mask software errors. BASE reduces cost b ecauseit enables reuse of o-the-shelf service implementations. Itimproves availability b ecause each replica can be repairedperio dically using an abstract view of the state stored bycorrect replicas, and b ecause each replica can run distinctor non-deterministic service implementations, which reducesthe probability of common mo de failures. We built an NFSservice where each replica can run a dierent o-the-shelfle system implementation, and an ob ject-oriented databasewhere the replicas ran the same, non-deterministic imple-mentation. These examples suggest that our technique canbe used in practice | in b oth cases, the implementation re-quired only a mo dest amount of new code, and our p erfor-mance results indicate that the replicated services performcomparably to the implementations that they reuse.1. INTRODUCTIONThere is a growing demand for highly-available systemsthat provide correct service without interruptions. Thesesystems must tolerate software errors because these are ama jor cause of outages [13]. Furthermore, there is an in-creasing numb er of malicious attacks that exploit softwareerrors to gain control or deny access to systems that provideimportant services.This pap er prop oses a replication technique, BASE, thatcombines Byzantine fault tolerance [31] with work on dataThis researchwas partially supported byDARPA under con-tract F30602-98-1-0237 monitored bytheAirForce ResearchLaboratory. Ro drigo Rodrigues was partially supported bya Praxis XXI fellowship.abstraction [20]. Byzantine fault tolerance allows a repli-cated service to tolerate arbitrary b ehavior from faulty repli-cas, e.g., behavior caused bya software bug or an attack.Abstraction hides implementation details to enable the reuseof o-the-shelf implementations of imp ortant services (e.g.,le systems, databases, or HTTP daemons) and to improvethe abilitytomasksoftware errors.We extended the BFT library [7, 8] to implement BASE.(BASE is an acronym for BFT with Abstract Sp ecicationEncapsulation.) The original BFT library provides Byzan-tine fault tolerance with go o d performance and strong cor-rectness guarantees if no more than 1=3 of the replicas failwithin a small window of vulnerability. However, it requiresall replicas to run the same service implementation and toupdate their state in a deterministic way. Therefore, itcannot tolerate deterministic software errors that cause allreplicas to fail concurrently and it complicates reuse of ex-isting service implementations b ecause it requires extensivemodications to ensure identical values for the state of eachreplica.The BASE library and metho dology describ ed in this pa-per correct these problems | they enable replicas to run dif-ferent or non-deterministic implementations. The method-ology is based on the concepts ofabstract specicationandabstraction functionfrom work on data abstraction [20]. Westart by dening a commonabstract specicationfor the ser-vice, which species anabstract stateand describ es how eachoperation manipulates the state. Then we implementacon-formancewrapperfor each distinct implementation to makeit b ehave according to the common sp ecication. The laststep is to implementanabstraction function(and one of itsinverses) to map from the concrete state of each implemen-tation to the common abstract state (and vice versa).The metho dology oers several imp ortant advantages.Reuse of existing co de.BASE implements a form of statemachine replication [17, 35], which allows replication of ser-vices that p erform arbitrary computations, but requires de-terminism: all replicas must produce the same sequence ofresults when they pro cess the same sequence of op erations.Most o-the-shelf implementations of services fail to satisfythis condition. For example, many implementations pro-duce timestamps by reading lo cal clocks, whichcancausethe states of replicas to diverge. The conformance wrapp erand the abstract state conversions enable the reuse of exist-ing implementations without mo dications. Furthermore,these implementations can b e non-deterministic, whichre-duces the probability of common mode failures.Software Rejuvenation through proactive recovery.It has b een observed [16] that there is a correlation b etweenthe length of time software runs and the probability thatit fails. BASE combines proactive recovery [8] with ab-straction to counter this problem. Replicas are recoveredperio dically even if there is no reason to susp ect they arefaulty. Recoveries are staggered so that the service remainsavailable during rejuvenation to enable frequent recoveries.When a replica is recovered, it is reb o oted and restartedfrom a clean state. Then it is broughtuptodateusingacorrect copy of the abstract state that is obtained from thegroup of replicas. Abstraction mayimproveavailabilitybyhiding corrupt concrete states, and it enables proactivere-covery when replicas do not run the same co de or run codethat is non-deterministic.Opportunistic N-version programming.Replication isnot useful when there is a strong p ositive correlation be-tween the failure probabilities of the dierent replicas, e.g.,deterministic software bugs cause all replicas to fail at thesame time when they run the same co de. N-version pro-gramming [9] exploits design diversity to reduce the proba-bility of correlated failures, but it has several problems [13]:it increases development and maintenance costs by a factorof N or more, adds unacceptable time delays to the im-plementation, and do es not provide a mechanism to repairfaulty replicas.BASE enables an opp ortunistic form of N-version pro-gramming byallowing us to takeadvantage of distinct, o-the-shelf implementations of common services. This
View Full Document