NYU CSCI-GA 2243 - Compiling for VLIW/EPIC Processors

G22.2243-001 High Performance Computer Architecture
Lecture 7: Compiling for VLIW/EPIC Processors; Memory System
October 17, 2007

Outline

- Announcements
  - Lab Assignment 1 due back today
  - Lab Assignment 2 due back in two weeks (October 31st)
  - HW Assignment 3 out today, due next week (March 8)
- Last lecture
  - Tomasulo's algorithm
  - Multiple-issue processors: achieving IPC > 1
    - Superscalar processors
    - Brief mention of VLIW processors
- VLIW processors
  - Software techniques
  - Hardware support
- Memory System

[Hennessy/Patterson CA:AQA (4th Edition): parts of Chapters 3 and 5]

Architectural Features in VLIW Processors

- VLIW processors rely on the compiler to identify a packet of instructions that can be issued in the same cycle.
  - The compiler takes responsibility for scheduling instructions so that their dependences are satisfied.

    r1 = L r4      r2 = Add r1, M      f1 = Mul f1, f2      r5 = Add r5, 4

- Optimizations such as loop unrolling and software pipelining expose more ILP, allowing the compiler to build issue packets.
- Architectural support helps the compiler expose and exploit more ILP.

Basic Compiler Techniques [S1]: Loop Unrolling

Recap: consider the example from last week.

    for (i = 1000; i > 0; i--)
        x[i] = x[i] + s;

    L1: L.D     F0, 0(R1)
        ADD.D   F4, F0, F2
        S.D     F4, 0(R1)
        DADDUI  R1, R1, -8
        BNE     R1, R2, L1

Without scheduling, one iteration issues as follows:

    Instruction                 Issue cycle
    L1: L.D     F0, 0(R1)        1
        stall                    2
        ADD.D   F4, F0, F2       3
        stall                    4    (3 cycles)
        stall                    5
        S.D     F4, 0(R1)        6
        DADDUI  R1, R1, -8       7
        stall                    8
        BNE     R1, R2, L1       9
        stall                   10

Basic Compiler Techniques: Loop Unrolling (cont'd)

Loop unrolling optimization: replicate the loop body multiple times, adjusting the loop termination code.

Loop body replicated four times:

    L1: L.D     F0, 0(R1)
        ADD.D   F4, F0, F2
        S.D     F4, 0(R1)
        L.D     F6, -8(R1)
        ADD.D   F8, F6, F2
        S.D     F8, -8(R1)
        L.D     F10, -16(R1)
        ADD.D   F12, F10, F2
        S.D     F12, -16(R1)
        L.D     F14, -24(R1)
        ADD.D   F16, F14, F2
        S.D     F16, -24(R1)
        DADDUI  R1, R1, -32
        BNE     R1, R2, L1

Scheduled:

    Instruction                 Issue cycle
    L1: L.D     F0, 0(R1)        1
        L.D     F6, -8(R1)       2
        L.D     F10, -16(R1)     3
        L.D     F14, -24(R1)     4
        ADD.D   F4, F0, F2       5
        ADD.D   F8, F6, F2       6
        ADD.D   F12, F10, F2     7
        ADD.D   F16, F14, F2     8
        S.D     F4, 0(R1)        9
        S.D     F8, -8(R1)      10
        DADDUI  R1, R1, -32     11
        S.D     F12, 16(R1)     12
        BNE     R1, R2, L1      13
        S.D     F16, 8(R1)      14

The last two stores issue after DADDUI has already subtracted 32 from R1, so their offsets are adjusted from -16 and -24 to 16 and 8.

Basic Compiler Techniques: Loop Unrolling (cont'd)

Unroll the loop 5 times to provide instructions for the VLIW:

    L1: L.D     F0, 0(R1)
        ADD.D   F4, F0, F2
        S.D     F4, 0(R1)
        L.D     F6, -8(R1)
        ADD.D   F8, F6, F2
        S.D     F8, -8(R1)
        L.D     F10, -16(R1)
        ADD.D   F12, F10, F2
        S.D     F12, -16(R1)
        L.D     F14, -24(R1)
        ADD.D   F16, F14, F2
        S.D     F16, -24(R1)
        L.D     F18, -32(R1)
        ADD.D   F20, F18, F2
        S.D     F20, -32(R1)
        DADDUI  R1, R1, -40
        BNE     R1, R2, L1

    Integer instruction          FP instruction           Cycle
    L1: L.D     F0, 0(R1)                                  1
        L.D     F6, -8(R1)                                 2
        L.D     F10, -16(R1)     ADD.D  F4, F0, F2         3
        L.D     F14, -24(R1)     ADD.D  F8, F6, F2         4
        L.D     F18, -32(R1)     ADD.D  F12, F10, F2       5
        S.D     F4, 0(R1)        ADD.D  F16, F14, F2       6
        S.D     F8, -8(R1)       ADD.D  F20, F18, F2       7
        S.D     F12, -16(R1)                               8
        DADDUI  R1, R1, -40                                9
        S.D     F16, 16(R1)                               10
        BNE     R1, R2, L1                                11
        S.D     F20, 8(R1)                                12

Hardware Support for VLIW [H1]

- To expose more parallelism at compile time
  - Conditional or predicated instructions
    - Predication registers in IA64
    - Allow the compiler to group instructions across branches
- To allow the compiler to speculate while ensuring program correctness
  - The result of a speculated instruction will not be used in the final computation if mispredicted
  - Speculative movement of instructions (before branches, reordering of loads/stores) must not cause exceptions
    - HW allows exceptions from speculative instructions to be ignored
    - Poison bits and Reorder Buffers
  - HW tracks memory dependences between loads and stores
    - LDS (speculative load) and LDV (load-verify) instructions
    - Check for an intervening store
    - Variant: the LDV instruction can point to fix-up code

HW Support for Speculative Operations [H1]

- Speculative operations in the HPL-PD architecture from HP Labs are written identically to their non-speculative counterparts, but with an "E" appended to the operation name
  - E.g., DIVE, ADDE, PBRRE
- Poison bits
  - If an exceptional condition occurs during a speculative operation, the exception is not raised
  - A bit is set in the result register to indicate that such a condition occurred
  - Speculative bits are simply propagated by speculative instructions
  - When a non-speculative operation encounters a register with the speculative bit set, an exception is raised

Compiler Use of Speculative Operations [H1]

Here is an optimization that uses speculative instructions. The original code

    v1 = DIV v1, v2
    v3 = ADD v1, 5

becomes

    v1 = DIVE v1, v2      (issued earlier, speculatively)
    ...
    v3 = ADD v1, 5

- If a divide-by-zero occurs, an exception will be raised by the ADD
- Also, the effect of the DIV latency is reduced

HW Support for Predication [H2]

- Conditional or predicated instructions
  - The instruction is conditionally executed, else it is a no-op
  - Originally, a separate set of simple instructions
  - Now, more general support
- In HPL-PD, most operations can be predicated: they can have an extra operand that is a one-bit predicate register

    r2 = ADD r1, r3 if p2

  - If the predicate register contains 0, the operation is not performed
  - The values of predicate registers are typically set by "compare-to-predicate" operations

    p1 = CMPP r4, r5

Compiler Uses of Predication

- If-conversion
- To aid code motion by the instruction scheduler, e.g., hyperblocks

Uses of Predication: If-conversion

If-conversion replaces conditional branches with predicated operations. In the examples that follow, op stands for the comparison operator, which is not legible in this copy. For example, the code generated for

    if (a op b) c = a; else c = b;
    if (d op e) f = d; else f = e;

might be the two VLIW instructions

    p1 = CMPP a, b    p2 = CMPP a, b    p3 = CMPP d, e    p4 = CMPP d, e
    c = a if p1       c = b if p2       f = d if p3       f = e if p4

Compare-to-predicate instructions [H2]

- In the previous slide, there were two pairs of almost identical instructions, each pair just computing the complement of the other
- HPL-PD provides two-output CMPP instructions

    p1, p2 = CMPP.W.UN.UC r1, r2

If-conversion revisited [H2]

Using two-output CMPP instructions, the code generated for

    if (a op b) c = a; else c = b;
    if (d op e) f = d; else f = e;

might instead need only two CMPP operations, occupying less of the VLIW instruction:

    p1, p2 = CMPP.W.UN.UC a, b
    c = a if p1
    c = b if p2
    p3, p4 = CMPP.W.UN.UC d …
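The if-conversion idea from the predication slides can be mimicked in a few lines of Python. This is only a minimal sketch under stated assumptions, not HPL-PD semantics in full: the function names `cmpp_w_un_uc`, `predicated_move`, `if_converted`, and `branching` are hypothetical, the comparison is assumed to be `<` (the operator is not legible in the slides), and the guarding predicate that a real CMPP can take is ignored. It shows a two-output compare producing a predicate and its complement, followed by four predicated moves that need no control-flow branch.

```python
from typing import Tuple


def cmpp_w_un_uc(lhs: int, rhs: int) -> Tuple[bool, bool]:
    """Two-output compare-to-predicate (sketch).

    Returns (p, not p). The `<` comparison is an assumption made for
    illustration; the slides do not preserve the actual operator.
    """
    result = lhs < rhs
    return result, not result


def predicated_move(dest: int, src: int, predicate: bool) -> int:
    """Model of `dest = src if p`.

    When the predicate is 0 the operation is not performed, so the old
    value of dest is kept.
    """
    return src if predicate else dest


def if_converted(a: int, b: int, d: int, e: int) -> Tuple[int, int]:
    """If-converted form: two compares, then four predicated moves."""
    c, f = 0, 0  # initial register contents; always overwritten below
    # First VLIW instruction: both compares can issue together.
    p1, p2 = cmpp_w_un_uc(a, b)
    p3, p4 = cmpp_w_un_uc(d, e)
    # Second VLIW instruction: exactly one move of each pair takes effect.
    c = predicated_move(c, a, p1)
    c = predicated_move(c, b, p2)
    f = predicated_move(f, d, p3)
    f = predicated_move(f, e, p4)
    return c, f


def branching(a: int, b: int, d: int, e: int) -> Tuple[int, int]:
    """Reference version with ordinary conditionals, for comparison."""
    if a < b:
        c = a
    else:
        c = b
    if d < e:
        f = d
    else:
        f = e
    return c, f


if __name__ == "__main__":
    print(if_converted(1, 2, 5, 3))
    print(branching(1, 2, 5, 3))
```

Because each pair of predicates is complementary, exactly one move per pair writes its destination, which is why the predicated version produces the same values as the branching version for every input.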

