Inline ExpansionInline Expansionldots Inline Expansionldots Inline Expansionldots Inline Expansionldots Inline Expansionldots Inline Expansionldots Inline Expansionldots AlgorithmAlgorithmldots Algorithmldots Algorithmldots Topological OrderTopological Orderldots Inlining Example (Original)Inlining Example (Expanded)Inlining Example (Optimized)Inlining MethodsInlining Methodsldots Inlining Methods --- ExampleType Hierarchy AnalysisType Hierarchy Analysisldots C PreprocessorC Preprocessorldots C Preprocessor ExamplesC Preprocessor Examplesldots Macros vs. Inlined ProceduresMacros vs. Inlined Proceduresldots Readings and References520—Spring 2005—32CSc 520Principles of ProgrammingLanguages32: Procedures — InliningChristian [email protected] of Computer ScienceUniversity of ArizonaCopyrightc 2005 Christian Collberg[1]520—Spring 2005—32Inline ExpansionThe most important and popular inter-proceduraloptimization is inline expansion, that is replacing the callof a procedure with the procedure’s body.Why would you want to perform inlining? There areseveral reasons:1. There are a number of things that happen when aprocedure call is made:(a) evaluate the arguments of the call,(b) push the arguments onto the stack or move themto argument transfer registers,(c) save registers that contain live values and thatmight be trashed by the called routine,(d) make the jump to the called routine,[2]520—Spring 2005—32Inline Expansion...1. continued...(e) make the jump to the called routine,(f) set up an activation record,(g) execute the body of the called routine,(h) return back to the callee, possibly returning aresult,(i) deallocate the activation record.2. Many of these actions don’t have to be performed ifwe inline the callee in the caller, and hence much ofthe overhead associated with procedure calls isoptimized away.3. More importantly, programs written in modernimperative and OO-languages tend to be so litteredwith procedure/method calls. ...[3]520—Spring 2005—32Inline Expansion...3. ... This is the result of programming with abstractdata types. Hence, there is often very littleopportunity for optimization. However, when inliningis performed on a sequence of procedure calls, thecode from the bodies of several procedures iscombined, opening up a larger scope foroptimization.There are problems, of course. Obviously, in mostcases the size of the procedure call code will be lessthan the size of the callee’s body’s code. So, the size ofthe program will increase as calls are expanded.[4]520—Spring 2005—32Inline Expansion...A larger executable takes longer to load from secondarystorage and may affect the paging behavior of themachine. More importantly, a routine (or an inner loopof a routine) that fits within the instruction-cache beforeexpansion, might be too large for the cache afterexpansion.Also,larger procedures need more registers (the registerpressureis higher) than small ones. If, after expansion,a procedure is so large (or contains such complicatedexpressions) that the number of registers provided bythe architecture is not enough, then spill code will haveto be inserted when we run out of registers.[5]520—Spring 2005—32Inline Expansion...Several questions remain. Which procedures should beinlined? Some languages (C++, Ada, Modula-3) allowthe user to specify (through a keyword or pragma) theprocedures that should be eligible for expansions.However, this implies that a given procedure shouldalways be expanded, regardless of the environment inwhich it is called. This may not be the best thing to do.For example, we might consider inlining a call toPinside a tightly nested inner loop, but choose not to doso in a module initialization code that is only executedonce.[6]520—Spring 2005—32Inline Expansion...Some compilers don’t rely on the user to decide onwhat should be inlined. Instead they use1. A static heuristic, such as “procedures which are(a) shorter than 10 lines and have fewer than 5parameters or(b) are leaf routines (i.e. don’t make any callsthemselves)are candidates for inlining”.2. A heuristic based on profiling. After running theprogram through a profiler we know how many timeseach procedure is likely to be called from each callsite. Only inline small, frequently called procedures.[7]520—Spring 2005—32Inline Expansion...How do we inline across module boundaries? We needaccess to the code of the called procedure. If theprocedure is declared in a separately compiled module,this code isnot available. What do we do? Goodquestion...What’s the difference between inlining and macroexpansion? Inlining is performed after semanticanalysis, macro expansion before.[8]520—Spring 2005—32Inline Expansion...At which level do we perform the inlining?intermediate code Most common.source code Some source-to-source translators performinlining.assembly code Doable (with some compilercooperation), but unusual.[9]520—Spring 2005—32Algorithm1. Build the call graph:(a) Create an empty directed graph G.(b) Add a node for each routine and for the mainprogram.(c) If procedureP calls procedure Q then insert adirected edge P → Q.P RSTMainQV[10]520—Spring 2005—32Algorithm...G is actually a multigraph since a procedure mightmake multiple calls to the same procedure.Beware of indirect calls through procedure parametersor variables, as well as method invocations![11]520—Spring 2005—32Algorithm...2. Pick routines to inline. Possible heuristics:(a) Discard recursive routines (Perform a topologicalsort of the call graph. Cycles indicate recursion.) orjust inline them one or two levels deep.(b) Select routines with indegree=1.(c) Select calls to small routines in inner loops.(d) Rely on user-definedINLINE pragmas.(e) Use profiling information.(f) Consider effects on caching, paging, registerpressure, total code size, ...(g) ...[12]520—Spring 2005—32Algorithm...3. FOR each call P (a1, · · · , an) in Q to inline procedureP (f1, · · · , fn), in reverse topological order of the callgraphDO(a) Replace the call P (a1, · · · , an) with P ’s body.(b) Replace references to call-by-reference formalfkwith a reference to the corresponding actualparameter ak.(c) For each call-by-value formal parameterfkcreate anew local ck. Insert code to copy the call-by-valueactual akinto ck. Replace references to thecall-by-value formal fkwith a reference to its copy ck.(d) For each ofP ’s local variables lkcreate a new localvkin Q.
View Full Document