DOC PREVIEW
MSU CSE 870 - A Theory of Fault-Tolerance

This preview shows page 1-2 out of 7 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

11A Theory of Fault-Tolerance2Unifying Fault-Tolerance Approaches• Several disciplines with focus on different faults and specific architectures– Crash recovery– Atomic transactions – Fault-tolerance of digital systems– Fault-tolerance in message-passing systems• Verification of fault-tolerance– Application-specific– Verify recovery and safely terminate (mask the faults)– Less attention given to non-maskable faults[Arora 1992] A foundation of fault-tolerant computing, PhD thesis, University of Texas- Austin, 1992.3A Foundation of Fault-Tolerant Computing• Provide a uniform definition of fault-tolerance• Provide verification methods independent of technology, architecture, or application[Arora 1992] A foundation of fault-tolerant computing, PhD thesis, University of Texas- Austin, 1992.24Program and Fault• Program model: synchronization skeleton of finite-state programs– Finite number of variables with finite domains– Finite number of processes• State: a valuation of program variables• Finite state space Sp•Programp, Fault f ⊆ Sp× Sp• Use Dijkstra’s Guarded Commands (actions) as a shorthand to represent program and fault transitions– Guard  Statement;ProgramFaultSppfff5Examples of Intermittent Faults• Intermittent faults– Sudden acceleration in cruise control systems• E.g., Cruise control that only works in wet weather– Malfunction in a component of an electronic circuit when the voltage goes beyond a threshold• x and y are two points of contacts in a circuit that have independent voltages. However, when the voltage level of x goes beyond 3.5 v, y gets the same voltage as x. We model this class of faults by the following guarded commandx > 3.5 →→→→ y:=x;6Examples of Transient Faults• Transient faults– A hardware interrupt routine gets called without any interrupt being raised by hardware devices– Solar radiation corrupts the communication and the navigation systems– The variables of the controlling software of space shuttles may be corrupted by transient solar radiationstrue →→→→ x:=⊥⊥⊥⊥;The above guarded command means that at any state of the system, the variable x may be corrupted due to transient faults37Transient vs. Intermittent Faults• Transient faults are– difficult to detect– environmental (e.g., cosmic ray, pollution, vibration)• Intermittent faults are– caused by non-environmental (internal) conditions (e.g., loose connection, deteriorating or aging components )– more frequent than transient faults– sometimes reproducible• E.g., pressing the ‘Ctrl’ key causes the system to reset 8State Predicate• State predicate XX ⊆ Sp• Closure: X is closed in p• Projection p|X{(s0, s1) | (s0, s1) ∈ p ∧ s0 ∈ X ∧ s1 ∈ X }XSpp9Program Computations• Program computations– Infinite sequences of program transitionss0s1s2s3. . .s0s1s2s3. . .s4s5s6•Program computations in the presence of faults (denoted p[]f )–Infinite sequences of program and fault transitions• Computation prefix– Finite sequences of program transitions. . .s0s1s2s3sn410Specification, Invariant, and Fault-Span• Safety specification: something bad never happens– Formal representation⊆ Sp× Sp (set of bad transitions)• E.g., transitions that change the value of a counter from non-zero values to zero• Liveness specification: something good will eventually happen–In the absence of faults, fault-tolerant program p’ satisfies the liveness specification of the fault-intolerant program p• Invariant S, fault-span T ⊆ SpSSpfTp[]f p11Token Ring Example• Processes: P0, P1, P2, P3• Variables: x0 , x1 ,x2 ,x3 (domain: {0, 1, ⊥⊥⊥⊥})• Dijkstra’s Guarded Commands (actions)– Guard  Statement;• Fault-intolerant program– Process P0TR0:(x0=1) ∧∧∧∧ (x3=1)  x0:= 0;TR’0:(x0=0) ∧∧∧∧ (x3=0)  x0:= 1;P0P1P2P312Token Ring Example – Continued• Processes P1, P2, P3TRi:(xi=0)∧∧∧∧(x(i-1)=1) xi:= 1;TR’i:(xi=1)∧∧∧∧(x(i-1)=0) xi:= 0;• Fault transitions: process-restarttrue →→→→ xj:= ⊥⊥⊥⊥;513Token Ring Example – Continued• Invariant: (state is represented as a tuple: <x0, x1, x2, x3>)<0, 0, 0, 0>, <0, 1, 1, 1>,<1, 0, 0, 0>, <0, 0, 1, 1>,<1, 1, 0, 0>, <0, 0, 0, 1><1, 1, 1, 0>,<1, 1, 1, 1>,• Safety Specification– Corrupted value does not affect a non-corrupted process– There is only one token in the ring • Liveness of the fault-intolerant program– Token should be circulated infinitely often14Defining Fault-Tolerance: Closure•LetS be a state predicate of a program p,S is closed in p ifffor every action G -> st;executing st in a state of (S ∧ G) results in a state in SSSppNo such transitionsSuch transitions are allowed15Defining Fault-Tolerance: Convergence•Let S and T be state predicates of program pT converges to S in p iff1. S is closed in p2. T is closed in p3. Starting in T, each computation of p reaches a state in SSSpTp616Levels of Fault-Tolerance• Failsafe (program p’ is failsafe f-tolerant for spec from S)– Guarantee safety in the presence of faults • Nonmasking (program p’ is nonmasking f-tolerant for specfrom S)– Guarantee recovery in the presence of faults• Masking (program p’ is masking f-tolerant for specfrom S)– Guarantee safety and recovery in the presence of faultsSTp[]f pfSpSafety-violating transitionsp17Component-Based Design of Fault-ToleranceA fault-tolerant program =A fault-intolerant program +Fault-tolerance componentsTwo types of fault-tolerance components necessary and sufficient for the design of faults tolerance;detectors and correctors[Kulkarni 1999] Component-Based Design of Fault-tolerance, PhD thesis, The Ohio State University, 1999.18Synthesis of Fault-Tolerance• It is difficult to anticipate all classes of faults at the design time• New classes of faults requires the addition of corresponding level of fault-tolerance• Can we do it automatically?Synthesis AlgorithmFault-intolerant program pfFault-tolerant program p’[Ebnenasir 2004] Automatic Synthesis of Fault-tolerance, PhD thesis, Michigan State University, 2004.719Conclusion• Fault-tolerance is an important factor in the survivability of software systems• A well-defined need for – the design of correct fault-tolerant programs– the design of programs that tolerate multiple classes of faults


View Full Document

MSU CSE 870 - A Theory of Fault-Tolerance

Documents in this Course
HW2

HW2

3 pages

splc1

splc1

21 pages

Lessons

Lessons

3 pages

revision

revision

13 pages

ft1

ft1

12 pages

john.dsn

john.dsn

21 pages

Survey

Survey

2 pages

revision

revision

38 pages

Load more
Download A Theory of Fault-Tolerance
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view A Theory of Fault-Tolerance and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view A Theory of Fault-Tolerance 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?