G22.2243-001
High Performance Computer Architecture
Lecture 7
Compiling for VLIW/EPIC Processors
Memory System
March 1, 2006
3/4/2006

Outline
• Announcements
  – Final Exam: Wednesday, May 3, 5:00 - 6:50pm
  – Lab Assignment 2 due back today; deadline extended to next week
  – HW Assignment 3 out today. Due next week: March 8
• Last lecture:
  – Tomasulo’s algorithm
  – Multiple-issue processors (achieving IPC > 1)
    • Superscalar processors
    • Brief mention of VLIW processors
• VLIW processors
  – Software techniques
  – Hardware support
• Memory System
[ Hennessy/Patterson CA:AQA (3rd Edition): parts of Chapter 4, Chapter 5 ]

Architectural Features in VLIW Processors
• VLIW processors rely on the compiler to identify a packet of instructions that can be issued in the same cycle
  – Compiler takes responsibility for scheduling instructions so that their dependences are satisfied
• Optimizations such as loop unrolling and software pipelining expose more ILP, allowing the compiler to build issue packets
• Architectural support helps the compiler expose/exploit more ILP

Issue packet:
    r1 = L r4  |  r2 = Add r1,M  |  f1 = Mul f1,f2  |  r5 = Add r5,4

Basic Compiler Techniques (S1): Loop Unrolling (Recap)
• Consider the example from last week:

    for (i=1000; i>0; i--)
        x[i] = x[i] + s;

    L1: L.D    F0, 0(R1)
        ADD.D  F4, F0, F2
        S.D    F4, 0(R1)
        DADDUI R1, R1, #-8
        BNE    R1, R2, L1

    Issue Cycle   Instruction
     1            L1: L.D    F0, 0(R1)
     2                stall
     3                ADD.D  F4, F0, F2
     4                stall
     5                stall
     6                S.D    F4, 0(R1)
     7                DADDUI R1, R1, #-8
     8                stall
     9                BNE    R1, R2, L1
    10                stall

    (slide annotation: 3 cycles)

Basic Compiler Techniques: Loop Unrolling (cont’d)
• Loop unrolling optimization: replicate the loop body multiple times, adjusting the loop termination code

    L1: L.D    F0, 0(R1)
        ADD.D  F4, F0, F2
        S.D    F4, 0(R1)
        L.D    F6, -8(R1)
        ADD.D  F8, F6, F2
        S.D    F8, -8(R1)
        L.D    F10, -16(R1)
        ADD.D  F12, F10, F2
        S.D    F12, -16(R1)
        L.D    F14, -24(R1)
        ADD.D  F16, F14, F2
        S.D    F16, -24(R1)
        DADDUI R1, R1, #-32
        BNE    R1, R2, L1

    Issue Cycle   Instruction
     1            L1: L.D    F0, 0(R1)
     2                L.D    F6, -8(R1)
     3                L.D    F10, -16(R1)
     4                L.D    F14, -24(R1)
     5                ADD.D  F4, F0, F2
     6                ADD.D  F8, F6, F2
     7                ADD.D  F12, F10, F2
     8                ADD.D  F16, F14, F2
     9                S.D    F4, 0(R1)
    10                S.D    F8, -8(R1)
    11                DADDUI R1, R1, #-32
    12                S.D    F12, 16(R1)
    13                BNE    R1, R2, L1
    14                S.D    F16, 8(R1)

Basic Compiler Techniques: Loop Unrolling (cont’d)
• Unroll loop 5 times

    L1: L.D    F0, 0(R1)
        ADD.D  F4, F0, F2
        S.D    F4, 0(R1)
        L.D    F6, -8(R1)
        ADD.D  F8, F6, F2
        S.D    F8, -8(R1)
        L.D    F10, -16(R1)
        ADD.D  F12, F10, F2
        S.D    F12, -16(R1)
        L.D    F14, -24(R1)
        ADD.D  F16, F14, F2
        S.D    F16, -24(R1)
        L.D    F18, -32(R1)
        ADD.D  F20, F18, F2
        S.D    F20, -32(R1)
        DADDUI R1, R1, #-40
        BNE    R1, R2, L1

• Provide instructions for VLIW

    Issue Cycle   Integer Instruction
     1            L1: L.D    F0, 0(R1)
     2                L.D    F6, -8(R1)
     3                L.D    F10, -16(R1)
     4                L.D    F14, -24(R1)
     5                L.D    F18, -32(R1)
     6                S.D    F4, 0(R1)
     7                S.D    F8, -8(R1)
     8                S.D    F12, -16(R1)
     9                DADDUI R1, R1, #-40
    10                S.D    F16, 16(R1)
    11                BNE    R1, R2, L1
    12                S.D    F20, 8(R1)

    FP Instruction
        ADD.D  F4, F0, F2
        ADD.D  F8, F6, F2
        ADD.D  F12, F10, F2
        ADD.D  F16, F14, F2
        ADD.D  F20, F18, F2

Hardware Support for VLIW
• To expose more parallelism at compile time
  – Conditional or predicated instructions
    • Predication registers in IA64
  – Allow the compiler to group instructions across branches
• To allow the compiler to speculate, while ensuring program correctness
  – Result of a speculated instruction will not be used in the final computation if mispredicted
  – Speculative movement of instructions (before branches, reordering of loads/stores) must not cause exceptions
    • HW allows exceptions from speculative instructions to be ignored
      – Poison bits and Reorder Buffers
  – HW tracks memory dependences between loads and stores
    • LDS (speculative load) and LDV (load verify) instructions
      – Check for intervening store
    • Variant: LDV instruction can point to fix-up code

HW Support for Speculative Operations (H1)
• Speculative operations in the HPL-PD architecture from HP Labs are written identically to their non-speculative counterparts, but with an “E” appended to the operation name.
  – E.g., DIVE, ADDE, PBRRE
• Poison bits: If an exceptional condition occurs during a speculative operation, the exception is not raised
  – A bit is set in the result register to indicate that such a condition occurred
  – Speculative bits are simply propagated by speculative instructions
  – When a non-speculative operation encounters a register with the speculative bit set, an exception is raised

(H1) Compiler Use of Speculative Operations
• Here is an optimization that uses speculative instructions:

    Before:
        . . .
        v1 = DIV v1,v2
        v3 = ADD v1,5
        . . .

    After (the divide is hoisted as a speculative DIVE):
        . . .
        v1 = DIVE v1,v2
        . . .
        . . .
        v3 = ADD v1,5
        . . .

  – The effect of the DIV latency is also reduced
  – If a divide-by-zero occurs, an exception will be raised by ADD

HW Support for Predication (H2)
• Conditional or predicated instructions
  – Instruction is “conditionally” executed, else no-op
  – Originally: a separate set of (simple) instructions
  – Now: more general support
• In HPL-PD, most operations can be predicated
  – They can have an extra operand that is a one-bit predicate register.

        r2 = ADD r1,r3 if p2

  – If the predicate register contains 0, the operation is not performed
  – The values of predicate registers are typically set by “compare-to-predicate” operations

        p1 = CMPP<= r4,r5

Compiler Uses of Predication
• If-conversion
• To aid code motion by instruction scheduler
  – e.g. hyperblocks

Uses of Predication: If-conversion
• If-conversion replaces conditional branches with predicated operations
• For example, the code generated for:

    if (a < b)
        c = a;
    else
        c = b;
    if (d < e)
        f = d;
    else
        f = e;

  might be the two VLIW instructions:

    p1 = CMPP.< a,b  |  p2 = CMPP.>= a,b  |  p3 = CMPP.< d,e  |  p4 = CMPP.>= d,e
    c = a if p1      |  c = b if p2       |  f = d if p3      |  f = e if p4

Compare-to-predicate instructions
• In the previous slide, there were two pairs of almost identical instructions
  – each pair just computes the complement of the other
• HPL-PD provides two-output CMPP instructions

    p1,p2 = CMPP.W.<.UN.UC r1,r2

(H2) If-conversion, revisited
• Using two-output CMPP instructions, the code generated for:

    if (a < b)
        c = a;
    else
        c = b;
    if (d < e)
        f = d;
    else
        f = e;

  might instead be:

    p1,p2 = CMPP.W.<.UN.UC a,b  |  p3,p4 = CMPP.W.<.UN.UC d,e
    c = a if p1  |  c = b if p2  |  f = d if p3  |  f = e if p4

  Only two CMPP operations, occupying less of the VLIW instruction.

Uses of Predication: Hyperblock Formation
• In hyperblock formation, if-conversion is used to form larger blocks of operations than the usual basic blocks
  – tail duplication used to remove some incoming edges in middle of block
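
The if-conversion idea from the predication slides can be mirrored in plain C. The sketch below is only an illustration under stated assumptions: `cmpp_lt` and `mov_if` are hypothetical helpers (not HPL-PD operations or compiler intrinsics) that model a two-output compare-to-predicate and a predicated move, and `select_branchy` / `select_predicated` are made-up names for the branching and if-converted forms of `if (a < b) c = a; else c = b;`.

```c
#include <assert.h>
#include <stdbool.h>

/* Branching form, as written in the source program. */
int select_branchy(int a, int b)
{
    int c;
    if (a < b)
        c = a;
    else
        c = b;
    return c;
}

/* Hypothetical model of a two-output compare-to-predicate:
 * *p_true receives (x < y) and *p_false receives its complement,
 * in the spirit of  p1,p2 = CMPP.W.<.UN.UC x,y  */
void cmpp_lt(int x, int y, bool *p_true, bool *p_false)
{
    *p_true  = (x < y);
    *p_false = !(x < y);
}

/* Hypothetical model of a predicated move "dst = src if p".
 * When the predicate is 0 the operation does nothing. The C "if"
 * here only models the guard; on a predicated machine there would
 * be no branch. */
void mov_if(int *dst, int src, bool p)
{
    if (p)
        *dst = src;
}

/* If-converted form: one compare, then two guarded moves that
 * could sit side by side in a single issue packet because their
 * predicates are complementary. */
int select_predicated(int a, int b)
{
    bool p1, p2;
    int c = 0;

    cmpp_lt(a, b, &p1, &p2);   /* p1,p2 = CMPP a,b */
    mov_if(&c, a, p1);         /* c = a if p1      */
    mov_if(&c, b, p2);         /* c = b if p2      */
    return c;
}
```

Because exactly one of the two predicates is set, exactly one guarded move takes effect, so both functions return the same value for any inputs. The straight-line form has no control flow for the scheduler to work around.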


NYU CSCI-GA 2243 - Lecture Notes
