Page: 1 © 2003 A.W. Krings CS449/549 Fault-Tolerant Systems Sequence 31

MAFT

Background
– SIFT
  » fault tolerance is achieved under the control of executives
  » penalty: 70-80% overhead
– FTMP
  » Fault Tolerant Multiprocessor
  » provided hardware assistance for functions such as synchronization, voting, and control
  » still 60% overhead for executive functions
– Neither SIFT nor FTMP was designed to be a true distributed system => proof-of-concept systems

[Figure: “nostalgic” picture of SIFT]

– FTP
  » Fault Tolerant Processor, targeted towards enhancing system efficiency
  » FTP is a uniprocessor employing redundant processing channels
  » throughput of the redundant system is equal to uniprocessor throughput
  » FTP was used in AIPS
– AIPS
  » Advanced Information Processing System
  » the AIPS concept has been demonstrated using a dynamic simulation of the Blackhawk helicopter
  » the demonstration system consists of a quadruple redundant parallel processor, known as the AIPS/Army Fault Tolerant Architecture (AFTA), and a Silicon Graphics, Inc., workstation; the helicopter simulation executes on the workstation
  » AIPS was developed by Draper Labs under a NASA contract
– FTPP
  » Fault Tolerant Parallel Processor
  » emphasized performance issues
  » did not provide the same level of Byzantine-resilient operation as FTMP

MAFT [Kie88]
– Multicomputer Architecture for Fault-Tolerance
– Design objectives
  » reliability of 10^-9 over 10 hours
  » minimum performance requirements
      200 Hz      Max Task Iteration Rate
      5.5 MIPS    Max Computational Capacity
      1.0 MBPS    Max I/O Transfer Rate
      5.0 ms      Min Transport Lag (Input to Output)
– Achieving reusability through functional partitioning
  » Application Specific Functions
  » Standard Executive Functions

References
[Dar88] Darwiche, A.A., and F.M. Doerenberg, "Application of the Bendix/King Multicomputer Architecture for Fault-Tolerance in a Digital Fly-By-Wire Control System," Midcon, Aug 1988.
[Glu86] Gluch, D.P., and M.J. Paul, "Fault-Tolerance in Distributed Digital Fly-by-Wire Flight Control Systems," AIAA/IEEE Seventh Digital Avionics Systems Conference, Oct 1986.
[Kie87] Kieckhafer, R.M., "Task Reconfiguration in a Distributed Real-Time System," Eighth IEEE Real-Time Systems Symposium, Dec 1987.
[Kie88] Kieckhafer, R.M., et al., "The MAFT Architecture for Distributed Fault-Tolerance," IEEE Trans. Computers, V. C-37, No. 4, pp. 398-405, Apr 1988.
[Kie89] Kieckhafer, R.M., "Fault-Tolerant Real-Time Task Scheduling in the MAFT Distributed System," Proc. 22nd Hawaii International Conference on System Sciences, Jan 1989.
[McE88] McElvany, M.C., "Guaranteeing Deadlines in MAFT," Proc. IEEE Real-Time Systems Symp., pp. 130-139, Dec 1988.
[Tha88] Thambidurai, P.M., and Y.K. Park, "Interactive Consistency with Multiple Failure Modes," Proc. Seventh Reliable Dist Systems Symp., Oct 1988.
[Tha88a] Thambidurai, P.M., Critical Issues in the Design of Distributed, Fault-Tolerant, Hard Real-Time Systems, Ph.D. Dissertation, Dept. of Electrical Engineering, Duke University, 1988.
[Tha89] Thambidurai, P.M., et al., "Clock Synchronization in MAFT," Nineteenth Fault-Tolerant Computing Symposium, pp. 142-151, Jun 1989.
[Wal88] Walter, C.J., "MAFT: An Architecture for Reliable Fly-by-Wire Flight Control," Proc. AIAA/IEEE Eighth Digital Avionics Systems Conference, pp. 415-421, Oct 1988.
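
As a side calculation on the MAFT design objectives, the reliability figure of 10^-9 over 10 hours can be converted into an average hourly failure rate. The sketch below is illustrative only and rests on two assumptions not stated in the slides: that 10^-9 is the probability of system failure over the 10-hour period, and that failures follow a constant-rate (exponential) model. The names `P_FAIL`, `PERIOD_HOURS`, and `constant_failure_rate` are hypothetical.

```python
import math

P_FAIL = 1e-9         # assumed: probability of system failure over the period
PERIOD_HOURS = 10.0   # period from the design objective


def constant_failure_rate(p_fail, hours):
    """Failure rate per hour that gives p_fail over `hours`.

    Assumes an exponential model: P_fail(t) = 1 - exp(-rate * t),
    so rate = -ln(1 - P_fail) / t.
    """
    # log1p keeps precision when p_fail is tiny.
    return -math.log1p(-p_fail) / hours


rate = constant_failure_rate(P_FAIL, PERIOD_HOURS)
print(f"{rate:.3e} failures per hour")
```

Under these assumptions the objective corresponds to roughly 10^-10 failures per hour, since for very small probabilities the rate is close to P_fail divided by the period.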
– System Architecture [figure]
– Application Processor (AP)
  » free to execute the application program, e.g. reading sensors, performing control law computations, sending commands to actuators
  » flexibility to select a processor suitable for the application
– Operations Controller (OC)
  » special purpose device common to all MAFT systems
  » performs overhead functions, including
      internode communication
      synchronization
      data voting
      error detection
      task scheduling
      system reconfiguration
  » as a result, the OS on the application processor is extremely simple
– Operations Controller block diagram [figure]
  » communication
      with other OCs
      with local AP
– Communication
  » transmitter
      formats message: message type, framing, ECC
      broadcasts message
  » receiver
      one per incoming link
      accepts properly framed bytes
      buffers bytes for the message checker
  » message checker
      polls receivers at a cycle rate of 6.4 µs
      performs physical and logical checks
      forwards good messages to other subsystems
      dumps bad messages
– Synchronization
  » two major synchronization functions
      steady-state operation
        – maintain synchronization in the operating set
        – similar to SIFT
        – loose frame-based synchronization achieved using system state (SS) messages
        – accuracy depends on length of sync. interval, clock drift, and message delivery delay; given aircraft geometry, max skew is 18 µs
      startup
        – cold start mode for initialization of the system or a mid-mission event
        – warm start mode for synchronizing a node to an existing operating set
– Steady-State Synchronization
  » each iteration has 2 phases:
      phase 1 (fixed length)
        – count a fixed number of local clock ticks
        – broadcast “presync” SS (system state) message
      phase 2 (variable length)
        – receive message during “presync
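
The two-phase iteration can be pictured with a small simulation. This is a minimal sketch under stated assumptions, not the MAFT algorithm: the text breaks off before phase 2 is fully described, so the correction rule used here (adjust the phase 2 length by the offset between a node's own "presync" time and the median of all observed "presync" times) is a stand-in chosen for illustration. The names `PHASE1_TICKS`, `NOMINAL_PHASE2`, and `phase2_length`, and all numeric values, are hypothetical.

```python
from statistics import median

PHASE1_TICKS = 1000    # assumed fixed length of phase 1, in clock ticks
NOMINAL_PHASE2 = 200   # assumed nominal length of phase 2, in clock ticks


def phase2_length(own_presync_time, observed_presync_times):
    """Return the phase 2 length (in ticks) for one iteration.

    own_presync_time: time at which this node broadcast its presync.
    observed_presync_times: presync times seen from all nodes in the
        operating set, including this node's own.
    For simplicity all times are given on one common timeline.
    """
    # Stand-in reference point: with three or more nodes, one very early
    # or very late sender cannot pull the median arbitrarily far.
    reference = median(observed_presync_times)
    # A node that was early lengthens phase 2; a late node shortens it.
    correction = reference - own_presync_time
    return max(0, NOMINAL_PHASE2 + round(correction))


# Three nodes finish phase 1 (and send presync) at slightly different times.
arrivals = [PHASE1_TICKS, PHASE1_TICKS + 4, PHASE1_TICKS + 10]
lengths = [phase2_length(t, arrivals) for t in arrivals]
ends = [t + n for t, n in zip(arrivals, lengths)]
print(lengths, ends)
```

In this toy run the early node stretches phase 2 and the late node shortens it, so all three nodes end the iteration at the same instant. The fixed phase 1 length plus a variable phase 2 is what the slide describes; the median rule is only one plausible way to choose the variable part.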