MIT 6 852 - Crash failures can drive Protocols to Arbitrary States - D2898576

Home> Schools> Massachusetts Institute of Technology> Electrical Engineering and Computer Science (6) > 6 852> Crash failures can drive Protocols to Arbitrary States

MIT 6 852 - Crash failures can drive Protocols to Arbitrary States

School name Massachusetts Institute of Technology

Course 6 852- Distributed Algorithms

Pages 10

Download Save

Unformatted text preview:

Crash failures can drive Protocols to Arbitrary StatesMahesh Jayaram*George VargheseDepartment of Computer ScienceWashington UniversitySt. Louis, MO 63130AbstractA crashing network protocol is an asynchronous pro-tocol that has no non-volatile memory at nodes thatcan survive a node crash and restart. Thus after acrash and restart, a node in such a protocol returns toa prespecified start state. We consider crashing proto-cols that work with links that can drop packets. Ourmain theorem states that such crashing protocols canbe driven by a sequence of crashes to any global state,where each node is in a state reached in some (possiblydifferent) run, and each link has an arbitrary mixtureof packets sent in (possibly different) runs.Our theorem can be used to give an alternate proofof an earlier result, due to Fekete et al, which statesthat there is no correct crashing Data Link Protocol.Our theorem can also be used to derivenew results.We prove that there is no correct crashing token pass-ing protocol. We also prove that there is no correctcrashing protocol for many other resource allocationprotocols such as k-exclusion, and the drinking anddining philosophers problems. Our theorem shows thatexisting crsshing network protocols (that are widelydeployed) are either incorrect or are self-stabilizing.Supported by NSF Grant NCR-9405444Pamieeion te meke digiWherd copies ofdl er pert of lhh meteriel lbr~We*mueeiegrentcd witboutfeeprOvided tbettbecopkearcaotrmde ordtibutedfer rofborcommem”ri8hotice*thetitl eoftbepubLIieaead itedez&*sL-J-@Oli diet WI@@ u by IwmieeioaoftbeACM, kc. To G~y dbmviee,terepublidt, topouen eerverewmtiiwti, requ~specitlcpamieeioa endhr fee.PODC’%, Philadelphia PA, USA01996 ACM O-89791 -SOWJ9W05..$3.5O1 IntroductionWe consider asynchronous network protocols thatwork with faulty components: links that can lose pack-ets, and nodes that can crssh and restart. Many com-mon network protocols (e.g., HDLC, 1P, the 0S1 andDECNET Routing protocols [Tan81]) come under thiscategory. 1 Crash failures, where a node crashes in themiddle of a protocol, are a common cause of protocolfailures. Frequently, protocols that seem to work foryears are driven into deadlocked states by a series ofcrashes.Many of these protocol specifications do not requirethe nodes to have non-volatile memory. Thus if a nodecrashes and restarts, it can lose all memory of its pre-vious state. After a restart, a node comes up in aninitial state in which all protocol variables are set tc)prespecified initial values. This may seem strange tc)the reader because non-volatile memory (e.g., disk) ischeap and provides protection against crash failures.However, many early protocol implementations wereon stand-alone devices (e.g., routers) that did not havea disk. Adding a disk was precluded by the expense,and sometimes by the physical configuration of the de-vice. Thus many network protocols like 1P and HDLCdo notrequire that nodes have non-volatile memory(NVM). Recently, cheaper electronic forms of NVM(e.g., NVRAM) have become available. Nevertheless,results about protocols that do not require NVM areint cresting, because many existing protocols are in thiscategory.1some ~outin~ protocolsdepend on time bounds and thus ar~!not strictly asynchronous. However, they have subcomponentsthat do not depend on time bounds for correctness, and hencecan be considered asynchronous.247Baratz and Segall [BS88] showed that the widelydeployed Data Link protocol HDLC could work incor-rectly if nodes did not keep NVM. Later, Fekete etal[FLMS93] showed that no Data Link protocol couldwork correct 1y under these assumptions.Attiya etal[ADW95] have proved related impossibility resultsfor Transport protocols. In this paper, we investigatethe power of crash failures for protocols other than justData Link or Transport protocols. Our main theoremcan be used to give an alternate proof of the Data Linkresult in [FLMS93]. It can also be used to show newresults: for inst ante, the impossibility y of token passingor resource allocation with crash failures and no NVM.Our results indicate that the combination of asyn-chrony, unlimited link storage, crash failures and noNVM is particularly deadly: a sequence of crashescan drive a protocol into (essentially) arbitrary globalstates. On the other hand, it is well-known that proto-cols can be made resilient to node crashes with the useof NVM. Baratz and Segall[BS88] describe a reliableData Link protocol that uses a single bit of NVM ateach node. Other papers ([Fin79, AAG87] show howto make any network protocol resilient to crash failuresusing a reliable Data Link protocol. Given the cheap-ness and availability y of small amounts of NVRAM, ourresults indicate that NVRAM is a cheap form of insur-ance for protocols.Protocols can also deal with arbitrary states by be-ing self-stabilizing, so they can recover from an ar-bitrary state. Afek and Brown [AB93] describe self-stabilizing Data Link and Token Passing protocols thatwork over an unreliable link. However, self-stabilizingprotocols are allowed to exhibit bad behavior beforeconverging to a good state. These results do not con-tradict our main theorem because our correctness cri-teria disallows incorrect behavior at any time.Our paper characterizes thefault span of crash fail-ures in a particular network model. The fault span isthe set of global states that a set of faults can drivea system into. The fault-span defines the power ofcrash failures — the larger the fault-span, the moredangerous the effect of crash failures. Our results indi-cate that the fault-span of crash failures (in a networkmod..l that applies to many practical settings) is verylarge. Knowledge of the fault-span can help a pro-tocol designer: the designer must deal with all statesin the fault-span. In particular, if the fault-span in-cludes all combinations of node and link states, theprotocol must essentially be self-stabilizing. We sug-gest that investigating fault-spans for other faults andother models is a useful direction.The rest of this paper is organized as follows. Wedescribe our results intuitively in the next section. Ourformal treatment begins with a model and some usefulnotation in Section 3 and culminates in the statementof the main theorem in Section 4. We describe applica-tions of our theorem in Section 5, describe extensionsto other models in Section 6 and’ state our conclusionsin Section 7.2Intuition behind main theoremIn what follows, we do not distinguish

View Full Document


School:
Email:
New Password:
Confirm Password:

MIT 6 852 - Crash failures can drive Protocols to Arbitrary States

Sign up for free to view:

Please select your school