Detailed Design and Evaluation of Redundant Multithreading Alternatives*

Shubhendu S. Mukherjee
VSSAD, Massachusetts Microprocessor Design Center
Intel Corporation
334 South Street, SHR1-T25, Shrewsbury, MA 01545
Shubu.Mukherjee@intel.com

Michael Kontz
Colorado VLSI Lab, Systems & VLSI Technology Operations
Hewlett-Packard Company
3404 East Harmony Road, ms 55, Fort Collins, CO 80525
michael_kontz@hp.com

Steven K. Reinhardt
EECS Department, University of Michigan, Ann Arbor
1301 Beal Avenue, Ann Arbor, MI 48109-2122
stever@eecs.umich.edu

* This work was performed at Compaq Computer Corporation, where Shubhendu S. Mukherjee was a full-time employee, Michael Kontz was an intern, and Steven K. Reinhardt was a contractor.

1063-6897/02 $17.00 © 2002 IEEE

ABSTRACT

Exponential growth in the number of on-chip transistors, coupled with reductions in voltage levels, makes each generation of microprocessors increasingly vulnerable to transient faults. In a multithreaded environment, we can detect these faults by running two copies of the same program as separate threads, feeding them identical inputs, and comparing their outputs, a technique we call Redundant Multithreading (RMT).

This paper studies RMT techniques in the context of both single- and dual-processor simultaneous multithreaded (SMT) single-chip devices. Using a detailed, commercial-grade SMT processor design, we uncover subtle RMT implementation complexities and find that RMT can be a more significant burden for single-processor devices than prior studies indicate. However, a novel application of RMT techniques in a dual-processor device, which we term chip-level redundant threading (CRT), shows higher performance than lockstepping the two cores, especially on multithreaded workloads.

1. INTRODUCTION

Modern microprocessors are vulnerable to transient hardware faults caused by alpha particle and cosmic ray strikes. Strikes by cosmic ray particles, such as neutrons, are particularly critical because of the absence of any practical way to protect microprocessor chips from such strikes. As transistors shrink in size with succeeding technology generations, they become individually less vulnerable to cosmic ray strikes. However, decreasing voltage levels and exponentially increasing transistor counts cause overall chip susceptibility to increase rapidly. To compound the problem, achieving a particular failure rate for a large multiprocessor server requires an even lower failure rate for the individual microprocessors that comprise it. Due to these trends, we expect fault detection and recovery techniques, currently used only for mission-critical systems, to become common in all but the least expensive microprocessor devices.

One fault-detection approach for microprocessor cores, which we term redundant multithreading (RMT), runs two identical copies of the same program as independent threads and compares their outputs. On a mismatch, the checker flags an error and initiates a hardware or software recovery sequence. RMT has been proposed as a technique for implementing fault detection efficiently on top of a simultaneous multithreaded (SMT) processor (e.g., [18], [17], [15]). This paper makes contributions in two areas of RMT. First, we describe our application of RMT techniques to a processor that resembles a commercial-grade SMT processor design. The resulting design and its evaluation are significantly more detailed than previous RMT studies. Second, we examine the role of RMT techniques in forthcoming dual-processor single-chip devices.

Our implementation of the single-processor RMT device is based on the previously published simultaneous and redundantly threaded (SRT) processor design [15] (Figure 1a). However, unlike previous evaluations, we start with an extremely detailed performance model of an aggressive, commercial-grade SMT microprocessor resembling the Compaq Alpha Araña (a.k.a. 21464 or EV8) [12]. We call this our base processor. We found several subtle issues involved in adding SRT features to such a base SMT design. For example, adapting the SRT branch outcome queue, which uses branch outcomes from one thread (the "leading" thread) to eliminate branch mispredictions for its redundant copy (the "trailing" thread), to our base processor's line-prediction-driven fetch architecture proved to be a particularly difficult task. We also describe and analyze a simple extension to the proposed SRT design, called preferential space redundancy, which significantly improves coverage of permanent faults.

We then compare the performance of our SRT implementation with the baseline processor using the same detailed performance model. Our results indicate that the performance degradation of RMT (running redundant copies of a thread) compared to our baseline processor (running a single copy of the same thread) is 32% on average, greater than the 21% indicated by our previous work [15]. We also find that store queue size has a major impact on SRT performance. Our SRT implementation lengthens the average lifetime of a leading-thread store by roughly 39 cycles, requiring a significantly greater number of store queue entries to avoid stalls. We propose the use of per-thread store queues to increase the number of store queue entries without severely impacting cycle time. This optimization reduces average performance degradation from 32% to 30%, with significant benefits on several individual benchmarks.

We also expand our performance study beyond that of previous work by examining the impact of SRT on multithreaded workloads. We run two logical application threads as two redundant thread pairs, consuming four hardware thread contexts on a single processor. We find our SRT processor's performance degradation for such a configuration is about 40%. However, the use of per-thread store queues can reduce the degradation to about 32%.

Our second area of contribution involves the role of RMT techniques in dual-processor single-chip devices. Initial examples of these two-way chip multiprocessors (CMPs) are shipping (e.g., the IBM Power4 [7] and the HP Mako [8]). We expect this configuration to proliferate as transistor counts continue to grow exponentially and wire delays, rather than die area, constrain the size of a single processor core.

A two-way CMP enables on-chip fault detection using lockstepping, where the same computation is performed on both processors on a cycle-by-cycle basis, that is, in "lockstep" (Figure 1b). Lockstepping has several advantages over SRT-style redundancy. Lockstepping is a well-understood technique, as it has long been

[Figure 1, panels (a) and (b), each labeled "Sphere of Replication"; image not recovered]