CMU CS 15745 - A Comparison of Full and Partial Predicated Execution Support for ILP Processors




A Comparison of Full and Partial Predicated Execution Support for ILP Processors

Scott A. Mahlke*, Richard E. Hank, James E. McCormick, David I. August, Wen-mei W. Hwu
Center for Reliable and High-Performance Computing
University of Illinois
Urbana-Champaign, IL 61801

Abstract

One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult. On one end of the design spectrum, architectural support for full predicated execution requires increasing the number of source operands for all instructions. Full predicate support provides for the most flexibility and the largest potential performance improvements. On the other end, partial predicated execution support, such as conditional moves, requires very little change to existing architectures. This paper presents a preliminary study to qualitatively and quantitatively address the benefit of full and partial predicated execution support. With our current compiler technology, we show that the compiler can use both partial and full predication to achieve speedup in large control-intensive programs. Some details of the code generation techniques are shown to provide insight into the benefit of going from partial to full predication. Preliminary experimental results are very encouraging: partial predication provides an average of 33% performance improvement over an 8-issue processor with no predicate support, while full predication provides an additional 30% improvement.

1 Introduction

Branch instructions are recognized as a major impediment to exploiting instruction-level parallelism (ILP). ILP is limited by branches in two principal ways. First, branches impose control dependences, which restrict the number of independent instructions available each cycle.
Branch prediction in conjunction with speculative execution is typically utilized by the compiler and/or hardware to remove control dependences and expose ILP in superscalar and VLIW processors [1] [2] [3]. However, misprediction of these branches can result in severe performance penalties. Recent studies have reported a performance reduction by a factor of two to more than ten when realistic instead of perfect branch prediction is utilized [4] [5] [6]. The second limitation is that processor resources to handle branches are often restricted. As a result, for control-intensive applications, an artificial upper bound on performance will be imposed by the branch resource constraints. For example, in an instruction stream consisting of 40% branches, a four-issue processor capable of processing only one branch per cycle is bounded to a maximum of 2.5 sustained instructions per cycle, since one branch per cycle at a 40% branch frequency allows at most 1/0.4 = 2.5 instructions per cycle.

* Scott Mahlke is now with Hewlett Packard Laboratories, Palo Alto, CA.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
ISCA '95, Santa Margherita Ligure, Italy
© 1995 ACM 0-89791-698-0/95/0006...$3.50

Predicated execution support provides an effective means to eliminate branches from an instruction stream. Predicated or guarded execution refers to the conditional execution of an instruction based on the value of a boolean source operand, referred to as the predicate [7] [8]. This architectural support allows the compiler to employ an if-conversion algorithm to convert conditional branches into predicate defining instructions, and instructions along alternative paths of each branch into predicated instructions [9] [10] [11].
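
As a rough illustration of the if-conversion idea described above, the following Python sketch models a two-way branch and a branch-free, predicated version of the same computation. It is a toy model under stated assumptions, not the compiler algorithm of the paper: the names `Instr`, `run`, `p_then`, and `p_else` are hypothetical, and the "machine" is simply a dictionary of registers plus a dictionary of predicate values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Regs = Dict[str, int]
Preds = Dict[str, bool]


@dataclass
class Instr:
    """One instruction, optionally guarded by a named predicate (hypothetical model)."""
    action: Callable[[Regs, Preds], None]
    guard: Optional[str] = None  # None means the instruction always executes


def run(program: List[Instr], regs: Regs) -> Regs:
    """Visit every instruction in order; the program contains no branches."""
    preds: Preds = {}
    for instr in program:
        # An instruction whose guard is false is skipped and changes no state.
        if instr.guard is None or preds[instr.guard]:
            instr.action(regs, preds)
    return regs


def branchy(a: int, b: int) -> int:
    """Original control flow: if (a < b) x = a + 1; else x = b - 1."""
    if a < b:
        return a + 1
    return b - 1


def define_preds(regs: Regs, preds: Preds) -> None:
    """Stand-in for a predicate defining instruction: sets both path predicates."""
    cond = regs["a"] < regs["b"]
    preds["p_then"] = cond
    preds["p_else"] = not cond


def then_op(regs: Regs, preds: Preds) -> None:
    regs["x"] = regs["a"] + 1


def else_op(regs: Regs, preds: Preds) -> None:
    regs["x"] = regs["b"] - 1


# The branch is replaced by one predicate define; each path becomes a guarded instruction.
IF_CONVERTED = [
    Instr(define_preds),
    Instr(then_op, guard="p_then"),
    Instr(else_op, guard="p_else"),
]


def if_converted(a: int, b: int) -> int:
    return run(IF_CONVERTED, {"a": a, "b": b, "x": 0})["x"]
```

Calling `if_converted(a, b)` visits all three instructions on every run, while `branchy(a, b)` follows only one path; both are intended to produce the same value of `x`.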
Predicated instructions are fetched regardless of their predicate value. Instructions whose predicate is true are executed normally. Conversely, instructions whose predicate is false are nullified, and thus are prevented from modifying the processor state.

Predicated execution provides the opportunity to significantly improve branch handling in ILP processors. The most obvious benefit is that decreasing the number of branches reduces the need to sustain multiple branches per cycle. Therefore, the artificial performance bounds imposed by limited branch resources can be alleviated. Eliminating frequently mispredicted branches also leads to a substantial reduction in branch prediction misses [12]. As a result, the performance penalties associated with mispredictions of the eliminated branches are removed. Finally, predicated execution provides an efficient interface for the compiler to expose multiple execution paths to the hardware. Without compiler support, the cost of maintaining multiple execution paths in hardware grows exponentially.

Predicated execution may be supported by a range of architectural extensions. The most complete approach is full predicate support. With this technique, all instructions are provided with an additional source operand to hold a predicate specifier. In this manner, every instruction may be predicated. Additionally, a set of predicate defining opcodes are added to efficiently manipulate predicate values. This approach was most notably utilized in the Cydra 5 minisupercomputer [8] [13]. Full predicate execution support provides the most flexibility and the largest potential performance improvements. The other approach is to provide partial predicate support. With partial predicate support, a small number of instructions are provided which conditionally execute, such as a conditional move. As a result, partial predicate support minimizes the required changes to existing instruction set architectures (ISAs) and data paths.
This approach is most attractive for designers extending current ISAs in an upward compatible manner.

In this paper, the tradeoffs involved in supporting full and partial predicated execution are investigated. Using the compilation techniques proposed in this paper, partial predicate support enables the compiler to perform full if-conversion to eliminate branches and expose ILP. Therefore, the compiler may remove as many branches with partial predicate support as with full predicate support. By removing a large portion of the branches, branch handling is significantly improved for ILP processors with partial predicate support. The
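
To make the distinction between full and partial predicate support concrete, the sketch below contrasts a predicated add with one plausible way of emulating it using only a conditional move. This is an illustrative assumption about how such an emulation can look, not a reproduction of the code generation scheme used in the paper; the function names `full_predicated_add`, `cmov`, and `partial_predicated_add` are hypothetical, and registers are modeled as plain integers.

```python
def full_predicated_add(dest: int, src1: int, src2: int, pred: bool) -> int:
    """Full predication: 'dest = src1 + src2 if pred'.

    Returns the value held by dest afterwards; when pred is false the
    instruction is nullified and dest keeps its old value.
    """
    return src1 + src2 if pred else dest


def cmov(dest: int, src: int, cond: bool) -> int:
    """Conditional move: copy src into dest only when cond is true."""
    return src if cond else dest


def partial_predicated_add(dest: int, src1: int, src2: int, pred: bool) -> int:
    """Partial predication: emulate the predicated add with an add plus a cmov.

    The add always executes, but writes a temporary rather than dest;
    the conditional move then commits the result only when pred is true.
    """
    temp = src1 + src2
    return cmov(dest, temp, pred)
```

In this model the emulation costs one extra instruction and one temporary register per predicated operation. Because the add now executes unconditionally, a real compiler would also have to ensure that the operation cannot raise an exception when the predicate is false.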

