Unformatted text preview:

Inline Substitution (the poster child for interprocedural optimization) COMP 512 Rice University Houston, Texas Fall 2008 Copyright 2008, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use. Lab 0 is available on the web site, due 1/26/2009. Last Lecture Examples of Global Analysis and Transformation •! Computing live variables >! Classic backwards global data-flow problem >! Used in SSA construction, in register allocation •! Using live information to eliminate useless stores >! Simple demonstration of the use of LIVE •! Single-procedure block placement algorithm (Pettis & Hansen) >! Arrange the blocks to maximize fall-through branches >! Improves code locality as a natural consequence COMP 512, Fall 2008! 2!This Lecture Optimizing across procedure boundaries •! Managing interprocedural compilation >! Access to source code >! Managing consequences of interprocedural optimization •! Inline substitution as an example >! Transformation >! Decision procedure This lecture began as a general introduction to working at the interprocedural scope, but it grew into a lecture focused on inline substitution. Sometimes that happens. COMP 512, Fall 2008! 3!Example Transformations COMP 512, Fall 2008! 4!Scope Name Analysis Effect Local LVN Incremental Redundancy, constants, & identities Balancing LIVE info. Enhance ILP Regional Superlocal VN CFG, EBBs Redundancy, constants, & identities Dominator VN CFG, DOM info. Global Dead store elim. LIVE info. Eliminate dead store Block placement CFG, Profiles Code locality & branch straightening Interprocedural Inline subs’n Need call graph Eliminate overhead Allow opt’n in context Proc. placement Call graph, profile data Better localityInterprocedural Optimization Global optimization catches many inefficiencies •! Some opportunities arise from procedure linkages >! Parameter binding, branches, register save/restore code >! Indirect calls (function parameters, methods in OOLs) •! Some single-procedure opportunities are disrupted by calls >! Stops propagation of constants (parameters & globals) >! Register save & restore •! Interprocedural analysis >! Analyze all the code that is available & use the results to improve the code •! Interprocedural transformations >! Transformations that involve code in two or more procedures COMP 512, Fall 2008! 5!COMP 512, Fall 2008! 6!Inline Substitution Replace a procedure call with the body of the called procedure •! Textual substitution to create effects of parameter binding •! Private copy of code can be tailored to call site’s context >! Constants, unambiguous pointers, aliases, … •! Eliminates disruption of procedure call >! Register save & restore >! Disruption of call & return •! Eliminates benefits of procedure call >! Call resets state of register allocator >! Procedure abstraction keeps name space small •! Usually assumed to be profitable, although studies disagree … The oldest interprocedural optimization [Ershov 1966] Relate also to OOLs, with their sky high ratio of overhead to work and the difficulty of converting virtual calls to concrete calls. Inlining one call can reveal the relevant class for another … Some authors report major degradation from code-cache blowout.COMP 512, Fall 2008! 7!Inline Substitution Example call foe fie call foe fee foe COMP 512, Fall 2008! 8!Inline Substitution — After inlining foe fie foe fee foe Potential for exponential growth …Inline Substitution Potential for exponential growth COMP 512, Fall 2008! 9!L1 L2 L0 Complete inlining would create 9 copies of each of the “leaf” routines, L0, L1, L2 Inline Substitution The transformation is easy •! Rewrite the call site with the callee’s body •! Rewrite formal parameter names with actual parameter names The decision procedure is quite hard •! Overall process is resource constrained >! Experience suggests register demand is important >! Code size (whole program & current procedure) play a role !! Excessive code growth leads to excessive compilation time •! At a given call site, profitability depends on the extent to which the callee can be tailored to the specific context >! Performance can improve or degrade •! Each decision affects profitability & resource use of others COMP 512, Fall 2008! 10!Inline Substitution Choosing which call sites to inline is hard •! Performance of transformed code is hard to predict >!Let’s take a slight detour to talk about profitability COMP 512, Fall 2008! 11!COMP 512, Fall 2008! 12!A quick look at real compilers The study •! Eight programs, five compilers, five processors •! Eliminated over 99% of dynamic calls in 5 of programs •! Measured speed of original versus transformed code •! We expected uniform speed up, at least from call overhead •! What really happened? Source Program Compiler Inliner Compiler Execute & time Experimental Setup Execute & time Five good compilers!COMP 512, Fall 2008! 13!A quick look at real compilers 0.500.600.700.800.901.001.101.201.30vortex shal64 efie304 wanal1 wave euler cedeta linpackdP r o g r a m% Impr ovement3081MIPSSequentConvexStardentDo you see a pattern in this data? COMP 512, Fall 2008! 14!A quick look at real compilers And this happened with good compilers! What happened? •! Input code violated assumptions made by compiler writers >! Longer procedures >! More names >! Different code shapes •! Exacerbated problems that are unimportant on “normal” code >! Imprecise analysis >! Algorithms that scale poorly >! Tradeoffs between global and local speed >! Limitations in the implementations The compiler writers were surprised (most of them) Randy Scarborough (Fortran H) assigned a summer intern to figure out what went wrong so he could fix it.COMP 512, Fall 2008! 15!A quick look at real compilers One standout story •! MIPS M120/5, 16 MB of memory •! Running standalone, wanal1 took > 95 hours to compile >! Original code, not the transformed code >! 1,252 lines of Fortran (not a large program) >! COMP 512 met twice during the compilation •! Running standalone with 48 MB of memory, it took < 9 minutes •! The compiler swapped for over 95 hours !?! •! For several years, wanal1 was a popular benchmark >! Compiler writers


View Full Document

Rice COMP 512 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?