DOC PREVIEW
CSU CS 553 - Profile-Guided Optimization

This preview shows page 1-2 out of 5 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1CS553 Lecture Profile-Guided Optimizations 2Profile-Guided Optimizations Recall– Instruction scheduling– List scheduling– Register renaming– Loop unrolling– Software pipelining– Alias analysis– how can we use alias analysis for instruction scheduling?– what causes conservative results? Today– More instruction scheduling– Profiling– Trace schedulingCS553 Lecture Profile-Guided Optimizations 3Motivation for Profiling Limitations of static analysis– Compilers can analyze possible paths but must behave conservatively– Frequency information cannot be obtained through static analysis How runtime information helps– Control flow information Optimize the more frequent path (perhaps at the expense of the less frequent path)− Memory conflictsif c90%10%st r1,0(r5)ld r2,0(r4)If r5 and r4 always have different values,we can move the load above the storeCS553 Lecture Profile-Guided Optimizations 4Profile-Guided Optimizations Basic idea– Instrument and run program on sample inputs to get likely runtimebehavior– Can use this information to improve instruction scheduling– Many other uses– Code placement– Inlining– Value speculation– Branch prediction– Class-based optimization (static method lookup)CS553 Lecture Profile-Guided Optimizations 5Profiling Issues Profile data– Collected over whole program run– May not be useful (unbiased branches)– May not reflect all runs– May be expensive and inconvenient to gather– Continuous profiling [Anderson 97]– May interfere with program2CS553 Lecture Profile-Guided Optimizations 6Control-Flow Profiles Commonly gather two types of information– Execution frequencies of basic blocks– Branch frequencies of conditional branches– Represent information in a weighted flow graph Instrumentation– Insert instrumentation code at basic block entrances and before eachbranch– Take average of values from multiple training runs– Fairly inexpensive123 410010070 30execution frequencies3070branch frequenciesCS553 Lecture Profile-Guided Optimizations 7− If we want to move s1 to A, we must moves1 to both A and BCode Motion Using Control Flow Profiles Code motion across basic blocks– Increased scheduling freedomCBAs1move code above a join− If we want to move s1 to B, we mustmove s1 to both B and CBACs1move code below a splitCS553 Lecture Profile-Guided Optimizations 8Code Motion Using Control Flow Profiles (cont) Code motion across basic blocks– Increased scheduling freedom− If we want to move s1 from B to A and if s1would destroy a value along the A→C path,do renamingBACs1move code above a split− What if s1 might cause an exception?CBA s1move code below a joinCBAs1tail duplication prevents B→C from seeing s1C′CS553 Lecture Profile-Guided Optimizations 9Memory-Dependence Profiles Gather information about memory conflicts– Frequencies of address matches between pairs of loads and stores– Attempts to answer the question: Are two references independent of oneanother?– Concentrate on ambiguous reference pairs (those that the compiler cannotfigure out) Instrumentation– Much more expensive (in both space and time) to gather than control flowinformation– First perform control flow profiling– Apply only to most frequently executed blocksstore r5load r4st1:ld2:(st1, ld2, 7)If this number is low, we canspeculatively assume that st1and ld2 do not conflict3CS553 Lecture Profile-Guided Optimizations 10Trace Scheduling [Fisher 81] and [Ellis 85] Basic idea– We want large blocks to create large scheduling windows, but basicblocks are small because branches are frequent– Create superblocks to increase scheduling window– Use profile information to create good superblocks– Optimize each superblock independently Superblocks– A sequence of basic blocks with a single entrance and multiple exits Goals– Want large superblocks– Want to avoid early exits– Want blocks that match actual execution pathsa superblock123 4CS553 Lecture Profile-Guided Optimizations 11Trace Scheduling (example)b[i] = “old”a[i] = ...if (a[i]>0) then b[i]=“new”;else stmt X stmt Yendifc[i] = ...trace:b[i] = “old”a[i] = ...b[i]=“new”;c[i] = ...if (a[i]<=0) then goto repaircontinue:...repair:restore old b[i]stmt Xstmt Yrecalculate c[i]?goto continueCS553 Lecture Profile-Guided Optimizations 12− A trace is a maximal sequence of mutual-most-likely blocks that does not contain a backedge− Each block belongs to exactly one traceTrace Scheduling (cont) Three steps1. Create superblocks2. Enlarge superblocks3. Compact (optimize) superblocks 1. Superblock formation− Create traces using mutual-most-likely heuristic(two blocks A and B are mutual-most-likely if B is the most likelysuccessor of A, and A is the most likely predecessor of B)ABEC307010D7010CS553 Lecture Profile-Guided Optimizations 13Trace Scheduling (cont) 1. Superblock formation (cont)– Convert traces into Superblocks– Use tail duplication to eliminate side entrances− Tail duplication increases code sizeABEC307070traceABEC7070superblockE′1030104CS553 Lecture Profile-Guided Optimizations 14Trace Scheduling (cont) 2. Superblock enlargement– Enlarge superblocks that are too small– Code expansion can hurt i-cache performance Three techniques for enlargement– Branch target expansion– If the last branch in a superblock is likely to jump to the start ofanother superblock, append the contents of the target superblock tothe first superblock– Loop peeling– Loop unrolling– These last two techniques apply to superblock loops, which aresuperblocks whose last blocks are likely to jump to their first blocks– Assume that each loop body has a single dominant pathCS553 Lecture Profile-Guided Optimizations 15Trace Scheduling (cont) 3. Optimizations– Perform list scheduling for each superblock– Memory-dependence profiles can be used to speculatively assume thatload/store pairs do not conflict– Insert repair code in case the assumption is incorrect– Software pipeliningCS553 Lecture Profile-Guided Optimizations 16Speculation based on memory-dependence profiles (example)b[i] = “old”a[i] = ...if (a[i]>0) then b[i]=“new”;else stmt X stmt Yendifc[i] = a[j]trace:b[i] = “old”c[i] = a[j]a[i] = ...b[i]=“new”;if (i==j) then goto deprepairif (a[i]<=0) then goto repaircontinue:...deprepair:c[i] = a[i]if (a[i]<=0) then goto repairgoto continuerepair:restore old b[i]stmt Xstmt Ygoto continueCS553 Lecture Profile-Guided


View Full Document

CSU CS 553 - Profile-Guided Optimization

Download Profile-Guided Optimization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Profile-Guided Optimization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Profile-Guided Optimization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?