UI CS 449 - System Diagnosis
Course Cs 449-
Pages 15

Unformatted text preview:

Page: 1 © 2007 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 24 System Diagnosis  Objective – Designing systems that are capable of self-diagnoses of multiple faults  Motivation – Multiprocessor systems employ increasing numbers of processors. Some of these processors will fail. – Applications include safety critical systems. – Inaccessible systems, e.g. remote, under water or ground, space. – “It is always good to know who your enemies are”.Page: 2 © 2007 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 24 System Diagnosis  Assumptions – System is partitioned into units » units need not be identical » units are powerful enough to test and judge other units pass/fail. – Tests are adequate to detect all faults » perfect coverage. (This is very restrictive since it also implies faults to be permanent). – There exists a reliable method for collecting and evaluating all test results » e.g. reliable broadcast – These assumptions are often termed PMC Model, after early work by Preparata, Metze and Chien (1967) “On the Connection Assignment Problem of Diagnosable Systems”, IEEE Trans. on Electronic Computers, Vol. EC16, No. 6, Dec. 1967.Page: 3 © 2007 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 24 System Diagnosis  System Graph  Definitions – Test graph G = (U, E) – U : the set of units – E : the set of testing links (edges) – aij: the outcome of test (Ui, Uj) if Ui is non-faulty then if Uj is non-faulty => aij = 0 if Uj is faulty => aij = 1 else aij is unreliable 1 2 3 4 5 test links unitsPage: 4 © 2007 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 24 System Diagnosis  Example: single fault, U1 faulty – Then a51 = 1 and a12 = X (0 or 1) – Syndrome S = set of all outcomes » order all aij -> a12 -> a23 -> a34 -> a45 -> a51 ->^ -> X -> 0 -> 0 -> 0 -> 1 ->^ – 2 cases » Single 1 in a51 => U1 is faulty  note if U5 was faulty, then a45 = 1 » Pair of adjacent 1’s  the “upstream” 1 is correct  a45 = 1 1 2 3 4 5Page: 5 © 2007 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 24 System Diagnosis  Definition: t-fault-diagnosable – Every set of up-to t faulty units can be correctly diagnosed (eventually). » Previous example is 1-fault-diagnosable » not 2-fault diagnosable  e.g. assume U1 and U2 faulty and a12 = 0  same syndrome as 1-fault-diagnosable example  Definition: one-step t-fault-diagnosable – For every set of up-to t faulty units there exists a unique syndrome which correctly identifies all faulty units.Page: 6 © 2007 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 24 System Diagnosis  Definition: Sequential t-fault-diagnosable – For every set of up-to t faulty units there exists a unique syndrome which correctly identifies at least one faulty unit. – (Can be applied recursively)Page: 7 © 2007 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 24 System Diagnosis  Example: dual fault – Assume U1 and U2 are both faulty » (a12 , a23 , a34 , a45 , a51 ) » ( X , X , 0 , 0 , 1 ) – Could mimic single fault at U1 » i.e. ( 0 , 0 , 0 , 0 , 1 ) – But, pattern 001 always points to a faulty unit » thus remove U1 and reconfigure » (a23 , a34 , a45 , a52 ) » ( X , 0 , 0 , 1 ) » still have 001 pattern => U2 diagnosed to be faulty 1 2 3 4 5 1 2 3 4 5Page: 8 © 2007 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 24 System Diagnosis  Example: dual fault – Assume U1 and U3 are both faulty » (a12 , a23 , a34 , a45 , a51 ) » ( X , 1 , X , 0 , 1 ) – Now possible pattern » ( 1 , 1 , 1 , 0 , 1 ) » still points to one faulty unit => U1 » after reconfiguration  (a23 , a34 , a45 , a52 )  ( 1 , X , 0 , 0 ) » points to one faulty unit => U3 1 2 3 4 5 1 2 3 4 5Page: 9 © 2007 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 24 System Diagnosis  Necessary and Sufficient Conditions » t = # of faults » n = # of units » N = # of testing links – One-step t-fault-diagnosable system » each unit is tested by more than t-1 other units (or: at least t) » this implies » optimal: replace with = – Sequentially t-fault-diagnosable system » necessary » and sufficient n t≥ +2 1N n t≥ ⋅≥n t≥ +2 1N n t≥ + −2 2Page: 10 © 2007 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 24 System Diagnosis  Single Loop System (Ring) – Let – Loop is sequentially t-fault-diagnosable if » proof given in paper Pre67 – e.g. » t = 1 => m = 0, λ = 1, n = 1 + 1 + 1 = 3 » t = 2 => m = 1, λ = 0, n = 1 + 22 + 0 = 5 » t = 3 => m = 1, λ = 1, n = 1 + 22 + 2 = 7 t m= +2λinteger 0 or 1 (even or odd) n m m≥ + + + +1 1 12( ) ( )λPage: 11 © 2007 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 24 System Diagnosis  Inefficiency of PMC – PMC requires for t-diagnosability, that each node must be tested by at least t other nodes. – Problem: many diagnosis! – Alternative: adaptive models » the term adaptive stems from allowing the choice of which test(s) to perform depend on the results of previous tests.Page: 12 © 2007 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 24 System Diagnosis  Adaptive Distributed System Diagnosis » “Implementation of On-Line Distributed System-Level Diagnosis Theory” by Bianchini and Buskens, Trans. on Computers, May 1992 – Uses array TESTED_UPx at each node nx – Meaning of TESTED_UPx[i] » TESTED_UPx[i] = j implies that node nx has received information from fault-free node ni, that ni found nj to be fault-free. – Idea: » each node finds first node that is fault-free  ni checks nj j > i mod N, where N is the number of nodes. » get other TESTED_UPi values from TESTED_UPj » implies that node nx has received information from fault-free node nj, that ni found nj to be fault-free.Page: 13 © 2007 A.W. Krings


View Full Document

UI CS 449 - System Diagnosis

Course: Cs 449-
Pages: 15
Download System Diagnosis
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view System Diagnosis and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view System Diagnosis 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?