Exceeding the Dataflow Limit via Value Prediction

Mikko H. Lipasti and John Paul Shen
Department of Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
{mhl,shen}@ece.cmu.edu

Abstract

For decades, the serialization constraints imposed by true data dependences have been regarded as an absolute limit--the dataflow limit--on the parallel execution of serial programs. This paper proposes a new technique--value prediction--for exceeding that limit, which allows data-dependent instructions to issue and execute in parallel without violating program semantics. The technique is built on the concept of value locality, which describes the likelihood of the recurrence of a previously-seen value within a storage location inside a computer system. Value prediction consists of predicting entire 32- and 64-bit register values based on previously-seen values. We find that such register values being written by machine instructions are frequently predictable. Furthermore, we show that simple microarchitectural enhancements to a modern microprocessor implementation based on the PowerPC 620 that enable value prediction can effectively exploit value locality to collapse true dependences, reduce average result latency, and provide performance gains of 4.5%-23% (depending on machine model) by exceeding the dataflow limit.

Copyright 1996 IEEE. Published in the Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture, Dec. 2-4, 1996, Paris, France. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or distribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyright and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.

1. Motivation and Related Work

There are two fundamental restrictions that limit the amount of instruction-level parallelism (ILP) that can be extracted from sequential programs: control flow and data flow. Control flow limits ILP by imposing serialization constraints at forks and joins in a program's control flow graph [1]. Data flow limits ILP by imposing serialization constraints on pairs of instructions that are data dependent (i.e., one needs the result of the other to compute its own result, and hence must wait for the other to complete before beginning to execute). Examining the extent and effect of these limits has been a popular and important area of research, particularly in the case of control flow [2,3,4,5]. Continuing advances in the development of accurate branch predictors (e.g. [6]) have led to increasingly aggressive control-speculative microarchitectures (e.g. the Intel Pentium Pro [7]), which undertake aggressive measures to overcome control-flow restrictions by using branch prediction and speculative execution to bypass control dependences and expose additional instruction-level parallelism to the microarchitecture. Meanwhile, numerous mechanisms have been proposed and implemented to eliminate false data dependences and tolerate the latencies induced by true data dependences by allowing instructions to execute out of program order (see [8] for an overview).

Surprisingly, in light of the extensive energies focused on eliminating control-flow restrictions on parallel instruction issue, less attention has been paid to eliminating data-flow restrictions on parallel issue. Recent work has focused primarily on reducing the latency of specific types of instructions (usually loads from memory) by rearranging pipeline stages [9, 10], initiating memory accesses earlier [11], or speculating that dependences to earlier stores do not exist [12, 13, 14, 15].

The most relevant prior work in the area of eliminating data-flow dependences is the Tree Machine [16, 17], which uses a value cache to store and look up the results of recurring arithmetic expressions in order to eliminate redundant computation (the value cache, in effect, performs common subexpression elimination [1] in hardware). Richardson follows up on this concept in [18] by introducing the concepts of trivial computation, defined as the trivialization of potentially complex operations by the occurrence of simple operands, and redundant computation, where an operation repeatedly performs the same computation because it sees the same operands. He proposes a hardware mechanism (the result cache) that reduces the latency of such trivial or redundant complex arithmetic operations by storing and looking up their results in the result cache. In [19], we introduced value locality, a concept related to redundant computation, and demonstrated a technique--Load Value Prediction, or LVP--for predicting the results of load instructions at dispatch by exploiting the affinity between load instruction addresses and the values the loads produce. LVP differs from Harbison's value cache and Richardson's result cache in two important ways: first, the LVP table is indexed by instruction address, and hence value lookups can occur very early in the pipeline; second, it is speculative in nature and relies on a verification mechanism to guarantee correctness. In contrast, both Harbison and Richardson use table indices that are only available later in the pipeline (Harbison uses data addresses, while Richardson uses actual operand values), and both require their predictions to be correct, hence requiring mechanisms for keeping their tables coherent with all other computation.
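To make the contrast with the value cache and result cache concrete, the following is a minimal software sketch of an LVP-style last-value predictor: it is indexed by instruction address, so a prediction is available as early as dispatch, and it is speculative, so every prediction must later be checked against the value actually produced. The class name, the 1024-entry direct-mapped table, and the indexing function are illustrative assumptions for this sketch, not the hardware organization described in [19].

# Minimal sketch of a last-value predictor indexed by instruction address.
# Illustrative only: the table size, indexing, and interface are assumptions,
# not the LVP hardware described in [19].

class LastValuePredictor:
    def __init__(self, entries=1024):
        self.entries = entries   # assumed direct-mapped table size
        self.table = {}          # table index -> last value produced

    def _index(self, pc):
        # Drop the word offset, then map the instruction address into the table.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        # Look up a predicted result at dispatch; None means "no prediction".
        return self.table.get(self._index(pc))

    def update(self, pc, actual_value):
        # Record the value actually produced, for use by future predictions.
        self.table[self._index(pc)] = actual_value

    def verify(self, pc, predicted, actual_value):
        # Speculation requires verification: a mismatch means any dependent
        # work that consumed the predicted value must be redone.
        self.update(pc, actual_value)
        return predicted == actual_value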
In this paper, we extend the LVP approach for predicting the results of load instructions to apply to all instructions that write an integer or floating-point register; show that a significant proportion of such writes are trivially predictable; describe a value prediction hardware mechanism that allows dependent instructions to execute in parallel; and present results that demonstrate significant performance increases over our baseline machine models.
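As a rough illustration of how a predicted value lets a dependent instruction proceed in parallel with its producer, the sketch below reuses the LastValuePredictor sketched above. The function name, the producer/consumer callbacks, and the simple re-execute-on-mispredict recovery are assumptions made for clarity; this is a software model of the idea, not the PowerPC 620-based hardware mechanism evaluated later in the paper.

# Software model of collapsing a true dependence via value prediction
# (illustrative assumptions throughout; not the paper's hardware design).

def execute_with_value_prediction(predictor, producer_pc, produce, consume):
    # produce() yields the producer instruction's actual result;
    # consume(v) is the dependent instruction, computed from that result.
    predicted = predictor.predict(producer_pc)
    if predicted is not None:
        # Speculate: the consumer runs with the predicted input instead of
        # waiting for the producer to complete.
        speculative_output = consume(predicted)
        actual = produce()
        if predictor.verify(producer_pc, predicted, actual):
            return speculative_output     # correct prediction: keep the result
        return consume(actual)            # misprediction: re-execute the consumer
    # No prediction available: fall back to ordinary serialized execution.
    actual = produce()
    predictor.update(producer_pc, actual)
    return consume(actual)

# Example: a load at a hypothetical address 0x1000 that repeatedly returns 42,
# feeding an add of 8; after the first pass the consumer no longer waits.
predictor = LastValuePredictor()
for _ in range(3):
    print(execute_with_value_prediction(predictor, 0x1000, lambda: 42, lambda v: v + 8))

In hardware, the speculative consumption and the producer's execution would overlap in time; this sequential model only captures the logical collapse of the true dependence and the verify-and-recover step.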


2. Taxonomy of Speculative Execution

In order to place our work on value prediction into a meaningful historical context, we introduce a taxonomy of speculative execution. This taxonomy, summarized in Figure 1, categorizes ours as well as previously-introduced techniques based on which types of dependences are being bypassed (control vs. data), whether the speculation relates to storage location or value, and
