
© 2007 A.W. Krings, CS449/549 Fault-Tolerant Systems, Sequence 3

Redundancy
- Hardware redundancy: add extra hardware for detecting or tolerating faults
- Software redundancy: add extra software for detecting and possibly tolerating faults
- Information redundancy: extra information, i.e. codes
- Time redundancy: extra time for performing tasks, used for fault tolerance

Fault Tolerance
- Error detection
- Damage confinement
- Error recovery
- Fault treatment

Error Detection
- Ideal check
  - determined solely from the specification
  - complete and correct
  - independent of the system
    - otherwise the check fails if the system crashes
- Acceptable check
  - reasonable cost
  - reasonableness check, e.g. monitor the rate of change
- Diagnostics
  - performed "by the system on system components"
  - e.g. power-up diagnostics

Damage Confinement
- an error might propagate and spread
- identify the boundaries of the state beyond which no information exchange has occurred
- doing this dynamically is hard
- doing it statically: e.g. firewalls

Error Recovery
- Backward recovery
  - the state is restored to an earlier state
    - requires checkpoints
  - most frequently used
  - incurs recovery overhead
- Forward recovery
  - try to make the current state error-free
  - needs an accurate assessment of the damage
  - highly application-dependent

Fault Treatment
- if the fault is transient: restart the system and go to an error-free state
- system repair
  - on-line, without manual intervention (automatic)
  - dynamic system reconfiguration
  - spares (hot or cold)
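The backward-recovery idea from the Error Recovery slide, combined with the rate-of-change reasonableness check from the Error Detection slide, can be sketched in Python. This is a minimal illustration under assumed conditions, not a method taken from the slides: the class `CheckpointedProcess`, its single `reading` state variable, and the `max_rate` bound are hypothetical, and a checkpoint is taken before every update.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class CheckpointedProcess:
    """Hypothetical process showing backward recovery with checkpoints."""

    state: Dict[str, Any]
    max_rate: float  # assumed bound on |change| per step (reasonableness check)
    _checkpoint: Optional[Dict[str, Any]] = field(default=None, init=False, repr=False)

    def take_checkpoint(self) -> None:
        # Deep copy so later updates cannot alter the saved state.
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self) -> None:
        # Backward recovery: restore the most recent checkpoint.
        if self._checkpoint is not None:
            self.state = copy.deepcopy(self._checkpoint)

    def acceptable(self, old: float, new: float) -> bool:
        # Acceptance check: reject implausibly large rates of change.
        return abs(new - old) <= self.max_rate

    def step(self, new_reading: float) -> bool:
        """Apply one update; roll back and return False if the check fails."""
        self.take_checkpoint()
        old = self.state["reading"]
        self.state["reading"] = new_reading
        if not self.acceptable(old, new_reading):
            self.rollback()
            return False
        return True


p = CheckpointedProcess({"reading": 20.0}, max_rate=5.0)
print(p.step(22.0), p.state)  # accepted
print(p.step(90.0), p.state)  # rejected, state rolled back
```

Checkpointing before every step keeps the sketch short; a real system would checkpoint less often to limit the recovery overhead.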
Fault Coverage
- measure of a system's ability to perform:
  - fault detection
  - fault location
  - fault containment
  - (and/or fault recovery)
- C = P(fault recovery | fault existence)
- Note:
  - recovery implies that the system as a whole is operational
  - it does not imply that a "repair" occurred
  - e.g. a duplex system with a benign fault can recover and continue operating on the one non-faulty processor

Hardware Redundancy
- Passive (static)
  - uses fault masking to hide the occurrence of a fault
  - no action from the system is required
  - e.g. voting
- Active (dynamic)
  - uses comparison for detection and/or diagnosis
  - removes faulty hardware from the system => reconfiguration
- Hybrid
  - combines both approaches
  - masks faults until diagnosis is complete
  - expensive, but achieves higher reliability

Passive Hardware Redundancy
- N-Modular Redundancy (NMR)
  - N independent modules replicate the same function
    - parallelism
  - results are voted on
  - requirement: N >= 3
- TMR (Triple Modular Redundancy)
[Figure: TMR, three modules feeding a voter V]
- The voter:
  - is a single point of failure
  - can be very simple
  - but who guards the guard?

Who Guards the Guards?
- Replicate the voters
[Figure: three replicated voters V]
- Restoring organ: produces 3 correct outputs even if one input is faulty
- eliminates the single point of failure

Who Guards the Guards?
- Multistage TMR with replicated voters
[Figure: multistage TMR with six voters V]

Voting
- if module faults are independent, NMR can mask up to ⌊(N-1)/2⌋ faults
- e.g. 1-bit majority voter (3 AND gates ORed)
[Figure: inputs I1, I2, I3 feed three AND gates (&) whose outputs are ORed (+) to produce Z]
  - Z = I1·I2 + I1·I3 + I2·I3
  - Z = 1 if at least 2 of 3 inputs are 1
  - Z = 0 if at least 2 of 3 inputs are 0
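A software analogue of NMR voting can be sketched in Python. It is an illustration, not the hardware design: `majority3` applies the 1-bit majority function Z = I1·I2 + I1·I3 + I2·I3 (three ANDs, ORed) bitwise to integer words, while `nmr_vote` and `max_masked_faults` are hypothetical helpers for word-level voting and the ⌊(N-1)/2⌋ masking bound.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority3(i1: int, i2: int, i3: int) -> int:
    """2-of-3 majority, bitwise: Z = I1*I2 + I1*I3 + I2*I3 (3 ANDs, ORed)."""
    return (i1 & i2) | (i1 & i3) | (i2 & i3)


def nmr_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value a strict majority of the N modules agree on, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def max_masked_faults(n: int) -> int:
    """Faulty modules an n-module NMR can outvote: floor((n - 1) / 2)."""
    return (n - 1) // 2


print(majority3(0b1100, 0b1010, 0b1001))  # each bit position voted separately
print(nmr_vote([5, 5, 7]), nmr_vote([1, 2, 3]))
print([max_masked_faults(n) for n in (3, 5, 7)])
```

Because `majority3` votes each bit position independently, it still recovers the correct word when different modules are wrong in different bit positions, as in the first call; a word-level `nmr_vote` over those same three words would find no majority.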
Flux Summing
- inherent property of closed-loop control systems
- if one module becomes faulty, the remaining modules compensate automatically

Active Hardware Redundancy
- Duplicate and compare
  - can only detect, NOT diagnose
    - i.e. fault detection, no fault tolerance
  - may order a shutdown
  - the comparator is a single point of failure
    - simple implementation: 2-input XOR for a single-bit compare
[Figure: duplicate and compare; modules M1 and M2, comparator C, signals In, Out, Agree]

Active Hardware Redundancy
[Figure: Johnson 1989]

Active Hardware Redundancy
- Standby sparing
  - only one module drives the outputs
  - the other modules are
    - idle => hot spares
    - shut down => cold spares
  - on error detection, switch to a new module
  - hot spares
    - no power-up delays
    - but they consume power
  - cold spares
    - the opposite of hot spares

[Figure: Johnson 1989]

Active Hardware Redundancy
- Pair and spare
  - duplication combined with compare and spare
  - 2 modules are always on-line
  - 2-of-N switch
  - pairs are often combined

[Figure: Johnson 1989]

Hybrid Hardware Redundancy
- NMR with spares
  - N active + S spare modules (off-line)
  - voting and comparison
  - replace an erroneous module from the spare pool
  - keeps N constant
  - uses an N-of-(N+S) switch
- example: tolerating 2 faults that occur at 2 different times
  - hybrid solution => 4 modules (TMR plus 1 spare)
  - passive solution => N = 5 (5MR)

[Figure: Johnson 1989]
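The NMR-with-spares example (two faults at two different times) can be checked with a small Python simulation. The class `HybridNMR`, its fault model (a faulty module flips the low bit of the correct result), and the rule of replacing every module that disagrees with the vote are assumptions made for illustration; the N-of-(N+S) switch is modeled only as a list of active module ids.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


class HybridNMR:
    """Hypothetical model of NMR with spares: N active modules plus S spares."""

    def __init__(self, n_active: int, n_spares: int) -> None:
        self.active: List[int] = list(range(n_active))
        self.spares: List[int] = list(range(n_active, n_active + n_spares))
        self.faulty: Set[int] = set()  # ids of modules that have failed

    def output(self, module: int, correct: int) -> int:
        # Assumed fault model: a faulty module flips the low bit of the result.
        return correct ^ 1 if module in self.faulty else correct

    def step(self, correct: int) -> Optional[int]:
        """Vote over the active modules, then swap out disagreeing ones."""
        outs: Dict[int, int] = {m: self.output(m, correct) for m in self.active}
        value, count = Counter(outs.values()).most_common(1)[0]
        if count <= len(self.active) // 2:
            return None  # no strict majority: the fault is not masked
        # Comparison: replace each disagreeing module from the spare pool,
        # keeping N constant for as long as spares remain.
        for i, m in enumerate(self.active):
            if outs[m] != value and self.spares:
                self.active[i] = self.spares.pop(0)
        return value


hybrid = HybridNMR(n_active=3, n_spares=1)
hybrid.faulty.add(0)
print(hybrid.step(1), hybrid.active)  # module 0 replaced by spare 3
hybrid.faulty.add(1)
print(hybrid.step(1), hybrid.active)  # second fault still outvoted

passive = HybridNMR(n_active=3, n_spares=0)
passive.faulty.update({0, 1})
print(passive.step(1))  # two faulty modules outvote the good one
```

With TMR plus one spare, the first faulty module is swapped out, so the second fault is still outvoted. Plain TMR with the same two faults returns the wrong value, which is why the passive design needs 5 modules.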
Hybrid Hardware Redundancy
- Self-purging NMR (Johnson 1989, Fig. 3.17)
  - all modules are active
  - modules are excluded on error detection
    - vote and compare
  - N decreases as faults occur

[Figure: Johnson 1989]

Hybrid Hardware Redundancy
- Triple-duplex (Johnson 1989, Fig. 3.26, page 80)
  - redundant self-checking
  - each node is really 2 modules + a comparator
  - a node disables itself in the event of an error
  - "simulates" benign behavior
  - triple-triplex is used in the Boeing 777 primary flight computer
    - each triplex node employs 3 dissimilar processors
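Self-purging NMR can be sketched the same way in Python. `SelfPurgingNMR` is a hypothetical class: it votes only over the modules that are still enabled and then permanently excludes every module that disagreed, so N shrinks with each detected fault. A simple strict-majority vote over the enabled modules is assumed, and module outputs are passed in directly rather than computed, for brevity.

```python
from collections import Counter
from typing import Dict, Optional, Set


class SelfPurgingNMR:
    """Hypothetical self-purging NMR: all modules start active; any module
    that disagrees with the voted output is excluded for good."""

    def __init__(self, n: int) -> None:
        self.enabled: Set[int] = set(range(n))

    def step(self, outputs: Dict[int, int]) -> Optional[int]:
        """outputs maps module id -> value; purged modules are ignored."""
        votes = [outputs[m] for m in sorted(self.enabled)]
        if not votes:
            return None
        value, count = Counter(votes).most_common(1)[0]
        if count <= len(votes) // 2:
            return None  # no strict majority among the remaining modules
        # Vote and compare: keep only modules that agreed, so N shrinks.
        self.enabled = {m for m in self.enabled if outputs[m] == value}
        return value


spnmr = SelfPurgingNMR(5)
print(spnmr.step({0: 7, 1: 7, 2: 3, 3: 7, 4: 7}), sorted(spnmr.enabled))
print(spnmr.step({0: 7, 1: 2, 2: 3, 3: 7, 4: 7}), sorted(spnmr.enabled))
```

In the second call, module 2's output no longer counts because it was purged after the first disagreement; module 1 is purged next, leaving three modules.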

