New version page

Rice COMP 512 - Lecture Notes

This preview shows page 1-2-3 out of 8 pages.

View Full Document
View Full Document

End of preview. Want to read all 8 pages?

Upload your study docs or become a GradeBuddy member to access this document.

View Full Document
Unformatted text preview:

ENGINEERING A COMPILERDraft of Second EditionKeith D. CooperLinda TorczonRice UniversityHoust on, TexasLimited Copies DistributedReproduction requires explicit permissionCopyright 2008, Morgan-Kaufmann Publishers and the authorsAll rights reserved8.6 Global Optimization 473the compiler must insert when building the SSA form of a program.!The compiler can use live information t o discover useless store operations.At an operation that stores v to memory, if v is not live then the store is useles.This simple technique works w ell for unambiguou s scalar variables—that is,variables known by only one name.In different contexts, liveness is calculated for different notions a variable or a name.For example, i n a compiler that ope rates on anILOC-like IR, liveness might be cal-culated onILOC register names.8.6. 2 Global Code PlacementOn many modern computers, the logical layout of the final code has an effect onthe program’s running time . The compiler has a large degree of control ove r theplacement of code in the memory system—at least, over the relative placementof code. If the compiler has knowledge about the expected relative execution fre-quencies of the blocks and pr ocedures, it can select a code layout that improvesruntime performance.On machines w ith unified code and data memory, fetched instructions perturblocality throughout the memory hierarchy . Instruction fetch activity can affect thevirtual-memory working set. With a combined instruction and data cache, instr uc-tion fetch activity can interfere with data locality.Many processors have asymetric branch costs—that is, the cost of a fall-throughfall-through branch: A oneaddress branch has twooutcomes. Either thebranch is taken, orexecution falls through tothe next operation insequence. These paths arecalled, respectively, thetaken branch and thefall-through branch.branch is less than the cos t of an explicitly taken branch. Each branch has two suc-cessor basic b loc ks; the compiler can choose which b loc k lies on the fall-throughpath and which lie s on t he taken path. With diff erent costs f or the two paths, thatchoice can affect runtime performance.In this transformation, the compiler reorders the basic blocks inside a proce-dure. It follows two principles in building that order: (1) make the fall-throughcase the likely execution path for each branch, and (2) move code that executesinfrequently to the end of the procedure. Taken together, these principles can pro-duce longer sequences that execute without a disruptive (e.g., taken) branch. That,in turn, should produce more efficient cache use for the instruction stream: feweroperations fetched but not executed and fewer cache misses.The code placement algor ithm relies, inherently, on the observation that somebranches have extremely lopsi ded behavior, the same effect target ed by hardwarebranch prediction. Consider theCFG fragment shown in the margin. (B0, B2) exe-474 CHAPTER 8 Introduction to Code Optimizationcutes 100 t imes more often th an (B0, B1). With asymetric branch costs, the com-B0!"1#$100B1#$1B2!"100B3piler should arrange the code so that (B0, B2) uses the less expensive branch. If(B0, B1) and (B0, B2) had roughly equal execution f requencies, then block place-ment would have little impact for th i s code.Two different layouts for thi s code are shown to the left. The layout labelled“Slow uses the fall-through branch to implement (B0, B1); the more frequent branchuses the taken side of the branch. The layout labelled “Fast” reverses this decision.If t he fall-through branch is faster than the taken branch, this layout uses the fasterB0B1B3...B2100!"%100"!%B0B2B3...B11!"%1"!%Slow Fastcase 100 t imes more often.Code placement, like most optimizations at the global scope, has separate anal-ysis and transformation phases. The analysis phase must gather estimates of eachbranch’s relative execution frequency and then use that information to construct aset of frequently executed paths in theCFG. For clarity, we will discuss the analy-sis as two distinct steps. Given the analysis results, the compiler must then reorderthe blocks so that frequently executed branches are fall-through branches. At thesame time, it can move unlikely cases to a relatively distant location with the hopeof improving cach e and paging behavior.Obtaining Profile D ataFor global code placement, the compiler needs estimates of the execution frequencyof e ach edge in the procedure’sCFG. It can obtain that information from a profilingrun of the code: compile the entire program, run it u nder a profiling tool on rep-resentative data, and give the compiler access to the resulting profile data. It canobtain that i nformati o n from a model of program execut ion; such models rangefrom simple t o rather elaborate, with a range of accuracies.&10B0!"7#$3B1!"5#$2B2!"3B3#$5B4!"B5&510Example CFGFor this transformation, the compiler needs execution counts for the CFG edg es,which correspond to branches and jumps. TheCFG in the margin illustrates thereason that edge c ounts are superior to block co unts for code placement. From theexecution cou nts, shown as labels on the edges, we see that blocks B0and B5eachexecute ten times. The path (B0, B1, B3, B5) executes more than any other path inthisCFG fragment. The edge counts suggest, for example, that making the branch(B1, B3) the fall-through case is better than making it the taken case. Relying onblock counts, however, the compiler would deduce that blocks B4and B5are ofequal impo rtance; it mig ht well choose the less important edge, (B1, B4), as thefall-through case. The code placement algorithm u ses profile data t o rank theCFGedges by fr equency of exe cution. Thus, accur ate edge data has a direct effect on thequality of the results.8.6 Global Optimization 475GA THERING PROFILE DATAIf the compiler understands the relative execution frequencies of the var-ious parts of the program, it can use that information to improve the pro-gram’s performance. Profile data can play an important role in optimiza-tions such as global co de placeme nt (Section 8.6.2) or inline substitution(Section 8.7.1). Several approaches are used to gather profile data.!Instrumented executables In this s cheme, the compiler generatescode to count specific events , such as procedure entries and exits ortaken branches. At runt ime, the data is written to an external file andprocessed offline by another tool.!Timer interrupts Tools that use this approach


View Full Document
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?