Page: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Fault Models Much work has been done on fault models. The discussion is based on the paper:–Thambidurai, P., and You-Keun Park, "Interactive Consistency with Multiple Failure Modes”, Reliable Distributed Systems, Volume, Issue, 10-12 Oct 1988 Page(s):93 - 100.–There is an interesting follow-up paper "Verification of Hybrid Byzantine Agreement Under Link Faults" by P. Lincoln and J. Rushby that addresses a problem in the algorithm of Thambidurai and Park1Page: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Fault Models Benign versus Malicious–Benign»error is self-evident»component does not undergo incorrect state transition during failure»examples:omission faultcrash faulttiming faultdata out-of-bound2Page: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Fault Models –Malicious»not self-evident to all non faulty receivers»can behave in two ways»symmetricreceived identically by all processors»asymmetricno restrictions of fault => anything goes–Fault frequency»worse case every fault could behave asymmetric»best case all faults are benign»what is the best assumption for your system?3Page: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Fault Models Fault TaxonomyRelationship & Probability of Occurrence–note: this is not a venn diagram!FaultBenign MaliciousSymmetric AsymmetricAsymmetricSymmetricBenign4Page: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Fault Models Lamport Model–assumes that every fault is asymmetricMeyer + Pradhan 87–differentiates between malicious and benign faultsor5Page: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Fault Models Thambidurai + Park 88–difference between malicious faults»symmetric faults»asymmetric faults»result:»a = asym., s = sym., b = benign, r = rounds»in general amax < smax < bmax»or λa << λs << λb»saves rounds and hardware6Page: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Fault Models Advantages of multi-fault model–1) more accurate model of the system»less “overly conservative”–2) resulting reliabilities are better»custom tailor recovery mechanisms»Example:consider Byzantine solution using OM() algorithmassume N = 4, 5, 6still, only one fault is covered using the OM algorithmmoreover, the system reliability degrades –N = 6 results in worse reliability than N = 4–one is better off to turn the additional processors off!»see paper Tha88, page 98, table 17Page: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Fault Models Source: Tha888Page: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Fault Models Source: Tha889Page: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Fault Models –3) smarter degradation»we can specify the number of rounds»example using N = 11let subscript max denote the maximum number of faults covered, assuming this is the only type of fault occurring.if r = 1 then amax = 1 or smax = 4if r = 2 then amax = 2 or smax = 4 why? smax = 4 => N > 2 4 + 2 = 10 smax = 5 => N > 2 5 + 2 = 12–requirements for success»good estimate of fail rates λa , λs , λbtypically λa << λs << λb»good estimate of recovery rates ρa , ρs , ρbtypically ρa < ρs < ρb10Page: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Agreement algorithmsAzadmanesh & Kieckhafer–partitions further into transmissive and omissive cases of malicious faults11All FaultsMalicious BenignBenignOmissive SymmetricBenignAsymmetricTransmissive AsymmetricStrictly Omissive AsymmetricSymmetricTransmissive SymmetricPage: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Agreement algorithmsIncomplete Interconnections–Lam82, Dol82–agreement only if the number of processors is less than 1/2 of the connectivity of the system’s network.Eventual vs. Immediate Byz. Agreement (EBA,IBA) –recall interactive consistency conditions IC1, IC2–an agreement is immediate if in addition to IC1 and IC2 all correct processors also agree (during the round) on the round number at which they reach agreement.–otherwise the agreement is called eventual»each processor has decided on its value, but cannot synchronize its decision with that of the others until some later phase.»Thus, agreement may not always need full t+1 rounds12Page: © 2011 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 18Agreement algorithmsLamport OMLamport SMDavis+WakerlyMeyer+PradhanThambidurai+Park Dol82a
View Full Document