CMU CS 15740 - Detailed Design and Evaluation of Redundant Multithreading Alternativ


Detailed Design and Evaluation of Redundant Multithreading Alternatives*

Shubhendu S. Mukherjee
VSSAD, Massachusetts Microprocessor Design Center
Intel Corporation
334 South Street, SHR1-T25, Shrewsbury, MA 01545
[email protected]

Michael Kontz
Colorado VLSI Lab, Systems & VLSI Technology Operations
Hewlett-Packard Company
3404 East Harmony Road, ms 55, Fort Collins, CO 80525
michael_kontz@hp.com

Steven K. Reinhardt
EECS Department
University of Michigan, Ann Arbor
1301 Beal Avenue, Ann Arbor, MI 48109-2122
stever@eecs.umich.edu

* This work was performed at Compaq Computer Corporation, where Shubhendu S. Mukherjee was a full-time employee, Michael Kontz was an intern, and Steven K. Reinhardt was a contractor.

ABSTRACT

Exponential growth in the number of on-chip transistors, coupled with reductions in voltage levels, makes each generation of microprocessors increasingly vulnerable to transient faults. In a multithreaded environment, we can detect these faults by running two copies of the same program as separate threads, feeding them identical inputs, and comparing their outputs, a technique we call Redundant Multithreading (RMT). This paper studies RMT techniques in the context of both single- and dual-processor simultaneous multithreaded (SMT) single-chip devices. Using a detailed, commercial-grade, SMT processor design we uncover subtle RMT implementation complexities, and find that RMT can be a more significant burden for single-processor devices than prior studies indicate. However, a novel application of RMT techniques in a dual-processor device, which we term chip-level redundant threading (CRT), shows higher performance than lockstepping the two cores, especially on multithreaded workloads.

1. INTRODUCTION

Modern microprocessors are vulnerable to transient hardware faults caused by alpha particle and cosmic ray strikes. Strikes by cosmic ray particles, such as neutrons, are particularly critical because of the absence of any practical way to protect microprocessor chips from such strikes. As transistors shrink in size with succeeding technology generations, they become individually less vulnerable to cosmic ray strikes. However, decreasing voltage levels and exponentially increasing transistor counts cause overall chip susceptibility to increase rapidly. To compound the problem, achieving a particular failure rate for a large multiprocessor server requires an even lower failure rate for the individual microprocessors that comprise it.

Due to these trends, we expect fault detection and recovery techniques, currently used only for mission-critical systems, to become common in all but the least expensive microprocessor devices. One fault-detection approach for microprocessor cores, which we term redundant multithreading (RMT), runs two identical copies of the same program as independent threads and compares their outputs. On a mismatch, the checker flags an error and initiates a hardware or software recovery sequence. RMT has been proposed as a technique for implementing fault detection efficiently on top of a simultaneous multithreaded (SMT) processor (e.g., [18], [17], [15]).

This paper makes contributions in two areas of RMT. First, we describe our application of RMT techniques to a processor that resembles a commercial-grade SMT processor design. The resulting design and its evaluation are significantly more detailed than previous RMT studies. Second, we examine the role of RMT techniques in forthcoming dual-processor single-chip devices.

Our implementation of the single-processor RMT device is based on the previously published simultaneous and redundantly threaded (SRT) processor design [15] (Figure 1a). However, unlike previous evaluations, we start with an extremely detailed performance model of an aggressive, commercial-grade SMT microprocessor resembling the Compaq Alpha Araña (a.k.a. 21464 or EV8) [12]. We call this our base processor.
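The basic RMT idea described above (two copies of one program, identical inputs, output comparison) can be illustrated with a small software analogy. The sketch below is not the hardware mechanism studied in this paper; the names `run_thread`, `rmt_check`, and `fault_at` are hypothetical, and the single bit flip is only an assumed stand-in for a transient fault.

```python
from typing import Callable, List, Optional, Sequence


def run_thread(program: Callable[[int], int],
               inputs: Sequence[int],
               fault_at: Optional[int] = None) -> List[int]:
    """Run one copy of the program over the inputs and collect its outputs.

    If fault_at is given, the output at that index has its lowest bit flipped
    (an assumed, simplified model of a transient fault).
    """
    outputs = []
    for i, x in enumerate(inputs):
        y = program(x)
        if i == fault_at:
            y ^= 1  # corrupt a single bit of this one output
        outputs.append(y)
    return outputs


def rmt_check(program: Callable[[int], int],
              inputs: Sequence[int],
              fault_at: Optional[int] = None) -> Optional[int]:
    """Run two redundant copies on identical inputs and compare their outputs.

    Returns the index of the first mismatching output, or None if the two
    copies agree everywhere. The fault, if any, is injected into the first copy.
    """
    first = run_thread(program, inputs, fault_at)
    second = run_thread(program, inputs)
    for i, (a, b) in enumerate(zip(first, second)):
        if a != b:
            return i  # a checker would flag an error and start recovery here
    return None


if __name__ == "__main__":
    square = lambda x: x * x
    print(rmt_check(square, range(5)))              # fault-free run
    print(rmt_check(square, range(5), fault_at=3))  # run with one injected fault
```

In this analogy the comparison loop plays the role of the checker: a fault in either copy surfaces as a mismatch, although the comparison alone cannot tell which copy was corrupted.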
We found several subtle issues involved in adding SRT features to such a base SMT design. For example, adapting the SRT branch outcome queue, which uses branch outcomes from one thread (the "leading" thread) to eliminate branch mispredictions for its redundant copy (the "trailing" thread), to our base processor's line-prediction-driven fetch architecture proved to be a particularly difficult task. We also describe and analyze a simple extension to the proposed SRT design, called preferential space redundancy, which significantly improves coverage of permanent faults.

We then compare the performance of our SRT implementation with the baseline processor using the same detailed performance model. Our results indicate that the performance degradation of RMT (running redundant copies of a thread) compared to our baseline processor (running a single copy of the same thread) is 32% on average, greater than the 21% indicated by our previous work [15].

We also find that store queue size has a major impact on SRT performance. Our SRT implementation lengthens the average lifetime of a leading-thread store by roughly 39 cycles, requiring a significantly greater number of store queue entries to avoid stalls. We propose the use of per-thread store queues to increase the number of store queue entries without severely impacting cycle time. This optimization reduces average performance degradation from 32% to 30%, with significant benefits on several individual benchmarks.

We also expand our performance study beyond that of previous work by examining the impact of SRT on multithreaded workloads. We run two logical application threads as two redundant thread pairs, consuming four hardware thread contexts on a single processor. We find our SRT processor's performance degradation for such a configuration is about 40%. However, the use of per-thread store queues can reduce the degradation to about 32%.
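To give a feel for why a store lifetime that is roughly 39 cycles longer calls for more store queue entries, the following back-of-the-envelope sketch applies Little's law (average occupancy = arrival rate × average residency). Only the 39-cycle figure comes from the text; the store rates are hypothetical values chosen for illustration, and this steady-state estimate is not the evaluation method used in the paper.

```python
import math


def extra_store_queue_entries(stores_per_cycle: float,
                              extra_lifetime_cycles: float) -> float:
    """Little's law: added average occupancy = arrival rate * added residency."""
    return stores_per_cycle * extra_lifetime_cycles


EXTRA_LIFETIME_CYCLES = 39  # approximate added lifetime quoted in the text

if __name__ == "__main__":
    # Hypothetical leading-thread store rates (stores entering the queue per cycle).
    for rate in (0.1, 0.2, 0.3):
        extra = extra_store_queue_entries(rate, EXTRA_LIFETIME_CYCLES)
        print(f"{rate:.1f} stores/cycle: about {extra:.1f} extra entries on average "
              f"(at least {math.ceil(extra)} whole entries)")
```

Under these assumed rates, even a modest store rate ties up several additional entries on average, which is consistent with the observation that store queue size has a major impact on SRT performance.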
Our second area of contribution involves the role of RMT techniques in dual-processor single-chip devices. Initial examples of these two-way chip multiprocessors (CMPs) are shipping (e.g., the IBM Power4 [7] and the HP Mako [8]). We expect this configuration to proliferate as transistor counts continue to

