Unformatted text preview:

The Weakest Failure Detector for Wait-Free Dining underEventual Weak Exclusion∗Srikanth SastryComputer Science and EngrTexas A&M UniversityCollege Station, TX, [email protected] M. PikeComputer Science and EngrTexas A&M UniversityCollege Station, TX, [email protected] L. WelchComputer Science and EngrTexas A&M UniversityCollege Station, TX, [email protected] philosophers is a classic scheduling problem for localmutual exclusion on arbitrary conflict graphs. We establishnecessary conditions to solve wait-free dining under eventualweak exclusion in message-passing systems with crash faults.Wait-free dining ensures that every correct hungry processeventually eats. Eventual weak exclusion permits finitelymany scheduling mistakes, but eventually no live neighborseat simultaneously; this exclusion criterion models scenarioswhere scheduling mistakes are recoverable or only affect per-formance. Previous work showed that the eventually perfectfailure detector (3P) is sufficient to solve wait-free diningunder eventual weak exclusion; we prove that 3P is alsonecessary, and thus 3P is the weakest oracle to solve thisproblem. Our reduction also establishes that any such din-ing solution can be made eventually fair. Finally, the reduc-tion itself may be of more general interest; when applied towait-free perpetual weak exclusion, our reduction producesan alternative proof that the more powerful trusting oracle(T ) is necessary (but not sufficient) to solve the problem ofFault-Tolerant Mutual Exclusion (FTME).Categories and Subject DescriptorsC.2.4 [Computer-Communication Networks]: Distrib-uted Systems—distributed applications; network operatingsystems; D.4.5 [Operating Systems]: Reliability—faulttolerance; F.1.1 [Computation by Abstract Devices]:Models of Computation—relations between models; F.2.2[Analysis of Algorithms and Problem Complexity]:Nonnumerical Algorithms and Problems—sequencing andschedulingGeneral TermsAlgorithms, Reliability, Theory∗This work was supported in part by NSF grant 0500265 andby the Texas Higher Education Coordinating Board undergrants ARP 000512-0007-2006 and ARP 000512-0130-2007.Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.SPAA’09, August 11–13, 2009, Calgary, Alberta, Canada.Copyright 2009 ACM 978-1-60558-606-9/09/08 ...$10.00.KeywordsDining Philosophers, Failure Detectors, Mutual Exclusion,Partial Synchrony, Wait-Freedom1. INTRODUCTIONWe examine the Dining Philosophers problem (or dining,for short), a classic problem in distributed scheduling. Firstproposed by Dijkstra for a ring topology in [5], dining waslater generalized by Lynch in [10] for local mutual exclusionon arbitrary conflict graphs. We explore a recent dimensionof this problem with the following primary result: wait-freedining under eventual weak exclusion [12] is equivalent tothe class of eventually perfect failure detectors (3P) fromthe Chandra-Toueg hierarchy [3].Subsequent sections will explain the foregoing terms ingreater detail. For now, we introduce these concepts onlyinformally. The variant of dining we consider guaranteesthat — in spite of potential process crashes — every correctprocess competing for exclusive access to its critical sectioneventually accesses its critical section (wait freedom), andthat eventually no two live neighbors are in their criticalsections simultaneously (eventual weak exclusion — 3WX ).The time for convergence to the exclusive suffix may varyfor each run and is not assumed to be known.A failure detector can be viewed as a distributed oraclethat can be queried for (potentially incorrect) informationabout process crashes [3]. Failure detectors can encapsulatetemporal uncertainties about the underlying system modelsin which they are implemented in terms of the (un)reliabilityof the suspect lists provided. The eventually perfect failuredetector – 3P – is one such failure detector. Informally,3P always suspects crashed processes, but only eventuallystops suspecting correct processes. Thus, 3P is allowed tomake mistakes by wrongfully suspecting correct processesfinitely many times during any run. As before, the time forconvergence to the perfect suffix may vary for each run and isnot assumed to be known. Despite such unreliability, 3P issufficiently powerful to solve many crash-tolerant problemsincluding consensus [3], stable leader election [1], and crash-locality-1 dining for perpetual exclusion [11].It is also known that 3P is sufficient to solve wait-freedining under 3WX [12, 13]. A natural follow-up question iswhether 3P is also necessary; that is, does 3P encapsulatetemporal assumptions equivalent to those encapsulated bya wait-free scheduling service for eventual weak exclusion insystems subject to crash faults? Our primary result answersthis question affirmatively: 3P is also necessary, and henceis equivalent to wait-free dining under 3WX .111We establish the necessity of 3P using the standard prooftechnique of reducing a candidate failure-detection oracleto the target problem itself [2]; that is, we construct anasynchronous transformation that extracts the oracle 3Pfrom an underlying black-box instance of wait-free dining foreventual weak exclusion (denoted hereafter as WF-3WX ).As such, the primary technical challenge is demonstratinghow the temporal uncertainty implied by an unreliable (buteventually perfect) oracle can be reduced to an unreliable(but eventually exclusive) scheduler. In conjunction withthe sufficiency results from [12, 13], our necessity reductionestablishes that 3P is, in fact, the weakest failure detectorto solve WF-3WX .A related paper on boosting obstruction-freedom [8] claimsthat 3P is the weakest failure detector to solve wait-freecontention management (a special case of wait-free 3WX ).Although the equivalence relation between 3P and wait-freecontention management happens to be true, the reductionused to establish the equivalence is actually flawed, as is therelated proof of correctness. Our paper reveals a subtle, butimportant, vulnerability in [8], and provides an alternatereduction to remedy the error.Our


View Full Document

MIT 6 852 - Study References

Download Study References
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Study References and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Study References 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?