Unformatted text preview:

Peephole Optimization & Other Post-Compilation Techniques COMP 512 Rice University Houston, Texas Spring 2009 Copyright 2009, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use. COMP 512, Spring 2009! 2!The Problem After compilation, the code still has some flaws •! Scheduling & allocation really are NP-Complete •! Optimizer may not implement every needed transformation Curing the problem •! More work on scheduling and allocation •! Implement more optimizations — or — •! Optimize after compilation >! Peephole optimization >! Link-time optimizationCOMP 512, Spring 2009! 3!Post-compilation Optimization Field has attracted much attention lately •! New & interesting opportunities exist at link time & at run time >! Interprocedural analysis & optimization >! Resolved references & data-structure sizes >! Code that is out of scope at other times, such as linkage code >! Optimize object-only distributions of libraries & applications •! Hard problems in this arena >! Reconstructing control flow COMP 512, Spring 2009! 4!Unravelling Control-flow (TI TMS320C6x) B .S1 LOOP ; branch to loop B .S1 LOOP ; branch to loop B .S1 LOOP ; branch to loop B .S1 LOOP ; branch to loop || ZERO .L1 A2 ; zero A side product || ZERO .L2 B2 ; zero B side product B .S1 LOOP ; branch to loop || ZERO .L1 A3 ; zero A side accumulator || ZERO .L2 B3 ; zero B side accumulator || ZERO .D1 A1 ; zero A side load value || ZERO .D2 B1 ; zero B side load value LOOP: LDW .D1 *A4++, A1 ; load a[i] \& a[i+1] || LDW .D2 *B4++, B1 ; load a[i] \& a[i+1] || MPY .M1X A1, B1, A2 ; load b[i] \& b[i+1] || MPYH .M2X A1, B1, B2 ; a[i] * b[i] || ADD .L1 A2, A3, A3 ; a[i+1] * b[i+1] || ADD .L2 B2, B3, B3 ; ca += a[i] * b[i] || [B0] SUB .S2 B0, 1, B0 ; decrement loop counter || [B0] B .S1 LOOP ; branch to loop ADD .L1X A3, B3, A3 \>; c = ca + cb Set up the loop + 1 more branch Single cycle loop ending with another branch Stuff 4 branches into the pipeCOMP 512, Spring 2009! 5!Unravelling Control-flow (TI TMS320C6x) B .S1 LOOP ; branch to loop B .S1 LOOP ; branch to loop B .S1 LOOP ; branch to loop B .S1 LOOP ; branch to loop || ZERO .L1 A2 ; zero A side product || ZERO .L2 B2 ; zero B side product B .S1 LOOP ; branch to loop || ZERO .L1 A3 ; zero A side accumulator || ZERO .L2 B3 ; zero B side accumulator || ZERO .D1 A1 ; zero A side load value || ZERO .D2 B1 ; zero B side load value LOOP: LDW .D1 *A4++, A1 ; load a[i] \& a[i+1] || LDW .D2 *B4++, B1 ; load a[i] \& a[i+1] || MPY .M1X A1, B1, A2 ; load b[i] \& b[i+1] || MPYH .M2X A1, B1, B2 ; a[i] * b[i] || ADD .L1 A2, A3, A3 ; a[i+1] * b[i+1] || ADD .L2 B2, B3, B3 ; ca += a[i] * b[i] || [B0] SUB .S2 B0, 1, B0 ; decrement loop counter || [B0] B .S1 LOOP ; branch to loop ADD .L1X A3, B3, A3 \>; c = ca + cb From this, the post-optimization translator is supposed to recover a simple source code loop … COMP 512, Spring 2009! 6!Post-compilation Optimization Field has attracted much attention lately •! New & interesting opportunities exist at link time & at run time >! Interprocedural analysis & optimization >! Resolved references & data-structure sizes >! Code that is out of scope at other times, such as linkage code >! Optimize object-only distributions of libraries & applications •! Hard problems in this arena >! Reconstructing control flow >! Reconstructing type & dimension information >! Analyzing pointers & addresses While whole-program techniques are expensive, at link-time the compiler must traverse all that code anyway.COMP 512, Spring 2009! 7!Potential Post-compilation Optimizations A mix of old and new techniques •! Peephole optimization •! Code positioning (& branch optimizations) •! Sinking (cross jumping) & hoisting •! Procedure abstraction (for space) •! Register reallocation (scavenging, Belady, coloring) •! Bit-transition reduction •! Dead & Clean •! Constant propagation •! Pointer disambiguation & register promotion COMP 512, Spring 2009! 8!Peephole Optimization The Basic Idea •! Discover local improvements by looking at a window on the code >! A tiny window is good enough — a peephole •! Slide the peephole over the code and examine the contents >! Pattern match with a limited set of patterns Examples storeAI r1 ! r0,8 loadAI r0,8 ! r15 storeAI r1 ! r0,8 i2i r1 ! r15 ! addI r2,0 ! r7 mult r4,r7 ! r10 mult r4,r2 ! r10 ! jumpI " l10 l10: jumpI " l11 jumpI " l11 l10: jumpI " l11 ! } Less likely as a local sequence, but other opts can create it …COMP 512, Spring 2009! 9!Peephole Optimization Early Peephole Optimizers (McKeeman) •! Used limited set of hand-coded patterns •! Matched with exhaustive search •! Small window, small pattern set ! quick execution •! They proved effective at cleaning up the rough edges >! Code generation is inherently local >! Boundaries between local regions are trouble spots Improvements in code generation, optimization, & architecture should have let these fade into obscurity >! Much better allocation & scheduling today than in 1965 >! But, we have much more complex architectures Window of 2 to 5 ops!COMP 512, Spring 2009! 10!Peephole Optimization Modern Peephole Optimizers (Davidson, Fraser) •! Larger, more complex ISAs ! larger pattern sets •! This has produced a more systematic approach Expander •! Operation-by-operation expansion into LLIR •! Needs no context •! Captures full effect of an operation Expander ASM"LLIR Simplifier LLIR"LLIR Matcher LLIR"ASM ASM LLIR ASM LLIR add r1,r2 ! r4 r4 # r1 + r2 cc # f(r1 + r2) !COMP 512, Spring 2009! 11!Peephole Optimization Modern Peephole Optimizers (Davidson, Fraser) •! Larger, more complex ISAs ! larger pattern sets •! This has produced a more systematic approach Simplifier •! Single pass over LLIR, moving the peephole •! Forward substitution, algebraic simplification, constant folding, & eliminating useless effects (must know what is dead ) •! Eliminate as many LLIR operations as possible Expander ASM"LLIR


View Full Document

Rice COMP 512 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?