© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Fault Models Much research has focussed on fault models. This discussion is based on the paper:–Thambidurai, P., and You-Keun Park, "Interactive Consistency with Multiple Failure Modes”, Reliable Distributed Systems, Volume, Issue, 10-12 Oct 1988 Page(s):93 - 100.–There is an interesting follow-up paper "Verification of Hybrid Byzantine Agreement Under Link Faults" by P. Lincoln and J. Rushby that addresses a problem in the algorithm of Thambidurai and Park1© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Fault Models Benign versus Malicious–Benign»error is self-evident»component does not undergo incorrect state transition during failure»examples:omission faultcrash faulttiming faultdata out-of-bound2© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Fault Models –Malicious»not self-evident to all non faulty receivers»can behave in two ways»symmetricreceived identically by all processors»asymmetricno restrictions of fault => anything goes–Fault frequency»worse case every fault could behave asymmetric»best case all faults are benign»what is the best assumption for your system?3© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Fault Models Fault TaxonomyRelationship & Probability of OccurrenceFaultBenign MaliciousSymmetric AsymmetricAsymmetricSymmetricBenign4© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Fault Models Lamport Model–assumes that every fault is asymmetricMeyer + Pradhan 87–differentiates between malicious and benign faultsor5© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Fault Models Thambidurai + Park 88–difference between malicious faults»symmetric faults»asymmetric faults»result:»a = asym., s = sym., b = benign, r = rounds»in general amax < smax < bmax»or λa << λs << λb»saves rounds and hardware6© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Fault Models Advantages of multi-fault model–1) more accurate model of the system»less “overly conservative”–2) resulting reliabilities are better»custom tailor recovery mechanisms»Example:consider Byzantine solution using OM() algorithmassume N = 4, 5, 6still, only one fault is covered using the OM algorithmmoreover, the system reliability degrades –N = 6 results in worse reliability than N = 4–one is better off to turn the additional processors off!»see paper Tha88, page 98, table 17© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Fault Models Source: Tha888© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Fault Models Source: Tha889© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Fault Models –3) smarter degradation»we can specify the number of rounds»example using N = 11let subscript max denote the maximum number of faults covered, assuming this is the only type of fault occurring.if r = 1 then amax = 1 or smax = 4if r = 2 then amax = 2 or smax = 4 why? smax = 4 => N > 2 4 + 2 = 10 smax = 5 => N > 2 5 + 2 = 12–requirements for success»good estimate of fail rates λa , λs , λbtypically λa << λs << λb»good estimate of recovery rates ρa , ρs , ρbtypically ρa < ρs < ρb10© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Agreement algorithmsIncomplete Interconnections–Lam82, Dol82–agreement only if the number of processors is less than 1/2 of the connectivity of the system’s network.Eventual vs. Immediate Byz. Agreement (EBA,IBA) –recall interactive consistency conditions IC1, IC2–an agreement is immediate if in addition to IC1 and IC2 all correct processors also agree (during the round) on the round number at which they reach agreement.–otherwise the agreement is called eventual»each processor has decided on its value, but cannot synchronize its decision with that of the others until some later phase.»Thus, agreement may not always need full t+1 rounds11© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Agreement algorithmsLamport OMLamport SMDavis+WakerlyMeyer+PradhanThambidurai+Park Dol82a (EBA)12Fault Model Overview13All FaultsMaliciousBenignBenignOmissive SymmetricBenignAsymmetricTransmissive AsymmetricStrictly Omissive AsymmetricSymmetricTransmissive SymmetricAzadmanesh & KieckhaferTransmissive and Omissive versions of symmetric and asymmetric faults© 2011 A.W. Krings CS448/548 Survivable Systems and Networks, Sequence 10Agreement algorithmsSo, now that we know the very basics of agreement and fault models, what will work and what will not work for our domain, i.e.,
View Full Document