© 2011 A.W. Krings, CS449/549 Fault-Tolerant Systems, Sequence 10

Markov Models of Typical Systems

Absorbing State
– Simple Triplex System with fail-stop behavior
– Assumptions:
    » no repair
    » perfect fault coverage
      recall: fault coverage is a measure of the system's ability to detect faults and recover
    » homogeneous components
    » independent failures

System with Repair
– Simple Triplex System with fail-stop behavior and repair
– Same assumptions as previous example

Fault Coverage
– Simple Triplex System
– Fault Coverage
    » C = probability that fault is properly handled
    » Ci has major impact on reliability

Multiple Faults
– Simple Triplex System with double fault
– Simple Triplex System with double fault (cont.)

Different λ's
– e.g. hot + cold spares
    » typical MTTF(cold) = 10 × MTTF(hot)
– notation: h.c
    » h = number of hots
    » c = number of colds
– assume perfect coverage
– assume switching mechanism

Passive TMR with 2 Failure Modes
– fail passive => processor just disconnects
– notation: n.f
    » n = number of non-faulty processors running
    » f = number of faulty processors running
– assume different fail rates
    » failure mode 1: benign fail rate λstop
    » failure mode 2: single non-benign fail rate λerr
– Passive TMR with 2 Failure Modes (cont.)
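
The Fault Coverage slide earlier in this sequence states that C has a major impact on reliability. The state diagram did not survive in this text, so the following is only a rough Python sketch under stated assumptions: a hypothetical chain with states 3, 2, 1 (working units) and F (failed), where a covered fault removes one unit and an uncovered fault takes the system straight to F. The function names, the chain structure and the rate 1e-4 are illustrative choices, not taken from the slides. Transient probabilities are computed by uniformization, which is a swapped-in standard technique and not necessarily what SHARPE does internally.

```python
import math

def transient(rates, p0, t, tol=1e-12):
    """Transient state probabilities of a CTMC via uniformization.

    rates: {(src, dst): rate}, p0: {state: initial probability}.
    Intended for modest values of (max exit rate * t).
    """
    states = sorted({s for edge in rates for s in edge} | set(p0))
    idx = {s: i for i, s in enumerate(states)}
    out = [0.0] * len(states)              # total exit rate per state
    for (src, _dst), r in rates.items():
        out[idx[src]] += r
    big = max(out) or 1.0                  # uniformization rate
    v = [p0.get(s, 0.0) for s in states]   # distribution after k jumps
    weight = math.exp(-big * t)            # Poisson weight for k = 0
    acc = weight
    result = [weight * x for x in v]
    k = 0
    while 1.0 - acc > tol and k < 100000:
        k += 1
        # One step of the uniformized discrete-time chain P = I + Q / big.
        nxt = [v[i] * (1.0 - out[i] / big) for i in range(len(v))]
        for (src, dst), r in rates.items():
            nxt[idx[dst]] += v[idx[src]] * r / big
        v = nxt
        weight *= big * t / k
        acc += weight
        result = [res + weight * x for res, x in zip(result, v)]
    return dict(zip(states, result))

def triplex_with_coverage(lam, c):
    """Hypothetical triplex chain (assumed structure, not from the slides)."""
    return {
        ("3", "2"): 3 * lam * c,        # covered fault: continue with two units
        ("3", "F"): 3 * lam * (1 - c),  # uncovered fault: system failure
        ("2", "1"): 2 * lam * c,
        ("2", "F"): 2 * lam * (1 - c),
        ("1", "F"): lam,                # last unit fails
    }

def unreliability(lam, c, t):
    """Probability of being in the absorbing state F at time t."""
    return transient(triplex_with_coverage(lam, c), {"3": 1.0}, t)["F"]

if __name__ == "__main__":
    for c in (1.0, 0.99, 0.9):
        print(f"C = {c}: unreliability at t = 1000 is {unreliability(1e-4, c, 1000.0):.3e}")
```

With C = 1 this chain reduces to three independent units in parallel, so the result can be compared against the closed form (1 - exp(-λt))^3. Lowering C adds a direct path to F, which is why even a small loss of coverage raises the unreliability noticeably in this sketch.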

Sharpe & Markov Chains

Here we used SHARPE to determine the unreliabilities. The main slide of interest is the last one that contains the probabilities of being in the specific states. Why is this interesting? Well...

SHARPE extracts model and analysis type:
– Cyclic vs. Acyclic Model
– Steady-State vs. Transient Analysis

    markov model_name {param_list}
      from to transition_rate
      <name name expression>
    end
      initial state probabilities
      <name expression>
    end

Markov Model
– P(0) = 0 is assumed for any state not listed in initialization
– The sum of all P(0) must be 1
    » Beware of round-off errors
– Initial state probabilities section may be left empty if:
    » Acyclic model with only 1 source state
      Assumes P(0) = 1 for that state
    » Irreducible model with Steady State analysis
      in this case initial conditions are irrelevant
– Advice: Always specify initial probabilities

Useful functions

tvalue ( t; model_name, state; arg_list )
– Gives Transient Probabilities at time t.
– If no state is given:
    » computes transient prob. of being in an absorbing state at time t
    » there can be more than one absorbing state => prob. of being in any absorbing state.
– If state is given:
    » computes transient prob. of being in that state at time t

prob ( model_name, state {;arg_list} )
– Gives Steady State Probabilities (no time param)
– Note: state parameter is not optional
– With absorbing states this computes the steady state probability of ever visiting a specific state
– If no absorbing states exist (irreducible chain), the steady state probability of being in a specified state is computed
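
The difference between the transient probability returned by tvalue and the steady-state probability returned by prob can be illustrated on the smallest irreducible chain. The following Python sketch assumes a hypothetical two-state up/down model with failure rate lam and repair rate mu that starts in the up state. The model, the function names and the rates are illustrative and do not come from the slides. It uses the standard closed-form solution for this chain and makes no SHARPE calls.

```python
import math

def p_up(lam, mu, t):
    """Transient probability of being 'up' at time t, starting in 'up'.

    Plays the role of a transient query at time t for the 'up' state.
    """
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)

def p_up_steady(lam, mu):
    """Steady-state probability of being 'up' (independent of the start state)."""
    return mu / (lam + mu)

if __name__ == "__main__":
    lam, mu = 1e-4, 1e-2   # hypothetical failure and repair rates
    for t in (0.0, 10.0, 100.0, 1000.0):
        print(f"t = {t:7.1f}  P(up) = {p_up(lam, mu, t):.6f}")
    print(f"steady state P(up) = {p_up_steady(lam, mu):.6f}")
```

The transient value starts at the initial probability and converges to the steady-state value, which is consistent with the remark above that initial conditions are irrelevant for steady-state analysis of an irreducible model.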

Interpretation of F(t)
If a non-absorbing state is specified:
– F(t) is the transient or steady state CDF for that state
If an absorbing state is specified:
– F(t) is the CDF of absorption into that state
– absorbing state normally indicates a specific failure mode
If no state is specified:
– F(t) is the CDF covering all absorbing states
– i.e. it is the sum of all CDFs of individual absorbing states
– e.g. indicating system failure

TMR example

    * SYSTEM: TMR_2MODE -- PASSIVE TMR WITH: FAIL-STOP AND FAIL-ACTIVE MODES
    * MODELS: MARKOV (ACYCLIC)
    * STATE NOTATION: "N.F" WHERE:
    *    N == NUMBER OF NON-FAULTY PROCESSORS RUNNING.
    *    F == NUMBER OF FAULTY PROCESSORS STILL RUNNING.
    *
    *---------------------------- MODEL DEFINITIONS
    MARKOV tmr_2mode
    *
    3.0 2.0 3*LAMstop
    3.0 2.1 3*LAMerr
    *
    2.0 1.0 2*LAMstop
    2.0 1.1 2*LAMerr
    *
    2.1 2.0 1*LAMstop
    2.1 1.1 2*LAMstop
    2.1 1.2 2*LAMerr
    *
    1.0 0.0 1*LAMstop
    1.0 0.1 1*LAMerr
    END
    *---------------------------- INITIAL CONDITIONS (START IN 3.0)
    3.0 1.00
    END
    *---------------------------- PARAMETER BINDING
    BIND
    LAMBDA  1*10^-4
    LAMstop 0.9*LAMBDA
    LAMerr  0.1*LAMBDA
    END
    *---------------------------- ANALYSES AND EVALUATIONS
    cdf (tmr_2mode)
    cdf (tmr_2mode,0.0)
    var fail01 value(100.0;tmr_2mode,0.1)
    var fail11 value(100.0;tmr_2mode,1.1)
    var fail12 value(100.0;tmr_2mode,1.2)
    var failrun fail01 + fail11 + fail12
    var failstop value(100.0;tmr_2mode,0.0)
    var failall failrun + failstop
    expr fail01
    expr fail11
    expr fail12
    expr failrun
    expr failstop
    expr failall
    END
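
The tmr_2mode model is small enough to cross-check by hand. The following Python sketch copies the transitions and parameter bindings from the listing above and computes, for each absorbing state, the probability of ever entering it, plus the mean time until absorption. It relies on the chain being acyclic with 3.0 as the only source state. The function name analyze and the dictionary layout are illustrative assumptions, and the method (branching probabilities of the embedded jump chain) is a standard technique, not a description of SHARPE's internals.

```python
from functools import lru_cache

LAMBDA = 1e-4
LAM_STOP = 0.9 * LAMBDA   # benign (fail-stop) rate
LAM_ERR = 0.1 * LAMBDA    # non-benign rate

# Transitions copied from the MARKOV tmr_2mode block: (from, to) -> rate
RATES = {
    ("3.0", "2.0"): 3 * LAM_STOP, ("3.0", "2.1"): 3 * LAM_ERR,
    ("2.0", "1.0"): 2 * LAM_STOP, ("2.0", "1.1"): 2 * LAM_ERR,
    ("2.1", "2.0"): 1 * LAM_STOP, ("2.1", "1.1"): 2 * LAM_STOP,
    ("2.1", "1.2"): 2 * LAM_ERR,
    ("1.0", "0.0"): 1 * LAM_STOP, ("1.0", "0.1"): 1 * LAM_ERR,
}

def analyze(rates, start):
    """Return (absorption probabilities, mean time to absorption).

    Assumes an acyclic chain in which 'start' has no incoming transitions.
    """
    out = {}                                   # total exit rate per transient state
    for (src, _dst), r in rates.items():
        out[src] = out.get(src, 0.0) + r

    @lru_cache(maxsize=None)
    def visit(state):
        """Probability of ever visiting 'state'."""
        if state == start:
            return 1.0
        return sum(visit(src) * r / out[src]
                   for (src, dst), r in rates.items() if dst == state)

    states = {s for edge in rates for s in edge}
    absorbing = {s: visit(s) for s in sorted(states) if s not in out}
    # Each transient state contributes (visit probability) * (mean holding time).
    mean_time = sum(visit(s) / out[s] for s in out)
    return absorbing, mean_time

if __name__ == "__main__":
    probs, mean_time = analyze(RATES, "3.0")
    for state, p in probs.items():
        print(f"P(absorbed in {state}) = {p:.5f}")
    print(f"mean time to absorption = {mean_time:.1f}")
```

For state 0.0 the sketch should give a probability near 0.7541 and a mean time near 1.6713e+04, which agrees with the "probability of entering node" and "mean" values in the SHARPE output for this model.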

TMR example (cont.)

CDF for system tmr_2mode:

       1.0000e+00 t( 0) exp( 0.0000e+00 t)
     + -2.5579e+00 t( 0) exp(-1.0000e-04 t)
     + 2.4000e+00 t( 0) exp(-2.0000e-04 t)
     + -2.8421e+00 t( 0) exp(-2.9000e-04 t)
     + 2.0000e+00 t( 0) exp(-3.0000e-04 t)

    mean: 1.6713e+04
    variance: 1.3541e+08

    -------------------------------------------

    information about system tmr_2mode node 0.0

    probability of entering node: 7.5414e-01

    conditional CDF for time of reaching this absorbing state

       1.0000e+00 t( 0) exp( 0.0000e+00 t)
     + -3.0526e+00 t( 0) exp(-1.0000e-04 t)
     + 3.2222e+00 t( 0)
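
The unconditional CDF that SHARPE prints for tmr_2mode is a constant plus a sum of decaying exponentials, so the reported mean and variance can be re-derived from the printed coefficients. The following Python sketch does this, assuming each printed term "a t( 0) exp(-b t)" means a * exp(-b * t). The names TERMS, cdf_at_zero and moments are illustrative. It uses E[T] = ∫(1 - F(t)) dt and E[T²] = 2 ∫ t (1 - F(t)) dt, both taken from 0 to infinity.

```python
# (coefficient a, decay rate b) pairs copied from the printed system CDF
TERMS = [
    (1.0000e+00, 0.0),
    (-2.5579e+00, 1.0000e-04),
    (2.4000e+00, 2.0000e-04),
    (-2.8421e+00, 2.9000e-04),
    (2.0000e+00, 3.0000e-04),
]

def cdf_at_zero(terms):
    """F(0) is the plain sum of the coefficients; it should be 0."""
    return sum(a for a, _b in terms)

def moments(terms):
    """Mean and variance of the time to absorption from the CDF terms.

    With F(t) = 1 + sum(a_i * exp(-b_i * t)) over the decaying terms:
      E[T]   = -sum(a_i / b_i)
      E[T^2] = -2 * sum(a_i / b_i**2)
    """
    decaying = [(a, b) for a, b in terms if b > 0.0]
    mean = -sum(a / b for a, b in decaying)
    second = -2.0 * sum(a / b ** 2 for a, b in decaying)
    return mean, second - mean ** 2

if __name__ == "__main__":
    mean, variance = moments(TERMS)
    print(f"F(0)     = {cdf_at_zero(TERMS):.1e}")
    print(f"mean     = {mean:.4e}")
    print(f"variance = {variance:.4e}")
```

The results should match the printed mean and variance to about the precision of the five-digit coefficients. The same rounding means that evaluating this expression at small t, where F(t) is tiny and the terms nearly cancel, may lose most of its significant digits.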