UNCC ECGR 6185 - Effective Optimistic-Checker Tandem Core Design Through Architectural Pruning

This preview shows pages 1-4 of 11.

Effective Optimistic-Checker Tandem Core Design Through Architectural Pruning

Francisco J. Mesa-Martínez, Jose Renau
Dept. of Computer Engineering, University of California Santa Cruz
http://masc.cse.ucsc.edu

Abstract

Design complexity is rapidly becoming a limiting factor in the design of modern, high-performance microprocessors. This paper introduces an optimization technique to improve the efficiency of complex processors. Using a new metric (µUtilization), the designer can identify infrequently-used functionality which contributes little to performance and then systematically “prune” it from the design. For cases in which architectural pruning may affect design correctness, previously proposed techniques can be applied to guarantee forward progress.

To explore the benefits of architectural pruning, we study a candidate Optimistic-Checker Tandem architecture, which combines a complex Alpha EV6-like out-of-order Optimistic core, with some of the underutilized functionality pruned from its design, with a non-pruned EV5-like in-order Checker core. Our results show that by removing 3% of infrequently used functionality from the optimistic core an increase in frequency of 25% can be realized. Taking into account the replay overhead triggered by the removed functionality, the Tandem system is still able to achieve a 12% overall speedup.

1 Introduction

Design cost is a limiting factor in the design of modern high performance architectures. The inherent complexity found in modern processors makes the optimization for area, power, and frequency a challenging task. It may be argued that excessive design complexity has terrible consequences as innovation may be hampered.
It is therefore critical that designers are given new methods to meet aggressive design targets in the face of growing complexity.

This work proposes the systematic use of Architectural Pruning, the selective removal of infrequently used functionality, as a design optimization methodology. This new methodology and its associated metric, named µUtilization, allow designers to rank the Hardware Description Language (HDL) statements in a processor design based on their activity. Efficiency metrics that build on µUtilization correlate activity with contribution to performance or some other criteria. A set of heuristics determines removal since not every segregated element is equally valuable. Infrequently-used statements that contribute little to performance are then removed, and the rest of the design is re-optimized around them. Removing functionality may lead to the introduction of faults into a design. To handle possible faults and guarantee correct execution and forward progress in the event of the processor transitioning to a state where removed functionality would otherwise be executed, any one of several previously proposed techniques can be used [2, 11, 19, 20, 21, 29]. In this work we use a simple in-order checker core.

Architectural pruning is motivated by the observation that significant hardware functionality is often required to handle extremely rare events. However, if forward progress and correctness can be ensured despite missing functionality, then the main design can be optimized.

Figure 1. RiSC-16 [13] µUtilization. (Plot of µUtilization, 0 to 1, against the normalized ranking of HDL statements, 0 to 100, with non-cumulative and cumulative curves; annotation: 20% of lines virtually never used.)

To give some insight into the optimization opportunities, Figure 1 shows the µUtilization for a simple RiSC-16 [13] processor after executing a benchmark to solve Laplace equations. RiSC-16 is a simple, single-issue out-of-order processor.
The non-cumulative graph in Figure 1 shows the normalized rankings of µUtilization for the HDL statements in the codebase. This is a listing of µUtilization data values in ascending order, normalized to 100 statements. Using this plot, we can easily see that fully 80% of HDL statements (x-axis) are used less than 20% of the time (y-axis). The cumulative graph in Figure 1 represents the integration of the µUtilization up to a particular statement. We can easily see that 20% of the statements are virtually never used. Therefore, roughly 20% of the codebase can be removed without a significant performance loss from the replays needed to guarantee execution correctness and forward progress. This shows that for even simple hardware designs, significant portions of the codebase are dedicated to extremely rare events. By not having to explicitly handle these rare events, designers are afforded dramatic new opportunities for optimization. Once this functionality is removed, previously complex structures are pruned down, freeing valuable real estate and reducing pressure on critical paths.

40th IEEE/ACM International Symposium on Microarchitecture, 1072-4451/07 $25.00 © 2007 IEEE, DOI 10.1109/MICRO.2007.23, p. 236

To demonstrate the effectiveness of architectural pruning as an optimization technique, we evaluate a Tandem processor organization combining a pruned out-of-order Optimistic core, to explore data and control behavior, with an in-order Checker core that guarantees forward progress.
Under this Optimistic Execution approach, the Checker core combines possible future memory prefetching and prediction updates with verified past branch behavior to hide some of its associated latencies. The results obtained show that it is indeed possible to “prune” underperforming structures from a complex candidate design. The pruned out-of-order core cycles 1.25 times faster. Despite the increased rate of replays needed to guarantee forward progress, the resulting system still exhibits a 12% performance increase with respect to the original complex processor [28] that serves as the basis for the pruned core in the Tandem configuration.

This paper makes several contributions. It proposes for the first time an architectural pruning methodology as a possible processor optimization technique. It quantitatively evaluates the effect of pruning on an HDL design for an out-of-order core, specifically the Illinois Verilog Model (IVM) [28]. It also explores an architectural organization in which a
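
The ranking behind Figure 1 can be illustrated with a small Python sketch. The excerpt does not give a formula for µUtilization, so the code assumes the simplest reading: the fraction of simulated cycles in which a statement was active. The function names, the toggle counts, and the 1% threshold are all hypothetical, and the normalization of the x-axis to 100 rank positions is omitted.

```python
from itertools import accumulate


def utilization(active_cycles, total_cycles):
    """Assumed definition: fraction of simulated cycles in which a statement was active."""
    return [count / total_cycles for count in active_cycles]


def ranked_curves(util):
    """Return (non_cumulative, cumulative) curves for statements in ascending order."""
    ranked = sorted(util)
    total = sum(ranked)
    running = list(accumulate(ranked))
    # Cumulative curve: share of all activity contributed up to each statement.
    cumulative = [r / total if total else 0.0 for r in running]
    return ranked, cumulative


def prunable_fraction(cumulative, threshold=0.01):
    """Fraction of statements whose combined share of activity stays below threshold."""
    below = sum(1 for share in cumulative if share < threshold)
    return below / len(cumulative)


# Hypothetical activity counts for ten HDL statements over 1000 simulated cycles.
counts = [0, 0, 1, 5, 20, 50, 100, 400, 900, 1000]
util = utilization(counts, total_cycles=1000)
ranked, cumulative = ranked_curves(util)
print(prunable_fraction(cumulative))  # 0.4 for this made-up data
```

In this made-up data set, the four least-active statements together account for under 1% of all activity, which is the kind of flat region at the left of a cumulative curve that would mark candidates for pruning. A real flow would also need the efficiency metrics and heuristics the text mentions, which this sketch does not model.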

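The reported figures (a core that cycles 1.25 times faster, yet a 12% overall speedup) can be reconciled with a first-order model, sketched below in Python. The model is an assumption, not the paper's own accounting: it takes execution time as cycle count divided by clock frequency and lumps replays and every other effect into a single cycle-count inflation factor. Both function names are hypothetical.

```python
def net_speedup(freq_gain, cycle_overhead):
    """Speedup when the clock is freq_gain times faster but the cycle count
    grows by the fraction cycle_overhead (assumed model: time = cycles / frequency)."""
    return freq_gain / (1.0 + cycle_overhead)


def implied_cycle_overhead(freq_gain, speedup):
    """Invert net_speedup: the extra-cycle fraction that reconciles the two figures."""
    return freq_gain / speedup - 1.0


# Figures quoted in the text: 25% higher frequency, 12% overall speedup.
overhead = implied_cycle_overhead(freq_gain=1.25, speedup=1.12)
print(f"implied extra cycles: {overhead:.1%}")  # implied extra cycles: 11.6%
```

Under this simplified model, roughly 11.6% more cycles would be enough to reduce a 25% frequency gain to a 12% net speedup. The Tandem system's actual behavior also depends on effects this model ignores, such as the prefetching and prediction updates attributed to the Checker core.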
