DOC PREVIEW
CMU CS 15745 - Lecture

This preview shows page 1-2-3-4-5 out of 14 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

15-745: Lecture 6 2/1/2007115-745 © Tim Callahan 115-745Profiling,Traces, Superblocks, and HyperblocksReading: 17.5, papersCopyright © Tim Callahan 2005-7(except EPP and SCBP slides...)15-745 © Tim Callahan 2Notes from Previous Lectures• Induction variables:Pegasus already has full analysis, so that if a node in a loop produces a value that changes by a loop-invariant amount each iteration, it is tagged with that information (starting value, stepping value). Besides IVSR, this is useful for optimizing memory accesses (recognizing streams), etc.15-745 © Tim Callahan 3CCP Example• We have seen how knowledge that a path could neverexecute could be exploited for more optimization: if (s<10)x = 5;x = w-z;y = x*x;s = {1}15-745 © Tim Callahan 4CCP Exampleif (s<10)x = 5;x = w-z;y = x*x;y = 25;s = {1}• We saw how knowledge that a path could never execute could be exploited for more optimization:15-745: Lecture 6 2/1/2007215-745 © Tim Callahan 5CCP Example• What about this example? The path through [*] rarely executes...can we just ignore it? It’s tempting...if (today not leap day)x = 5;x = w-z; y = x*x;*1146015-745 © Tim Callahan 6CCP Examplex = 5;x = w-z;y = x*x;y = 25;• What about this example? The path through [*] rarely executes...can we just ignore it?• Result: code that is always faster and usually correct• NOT ACCEPTABLEif (today not leap day)*1146015-745 © Tim Callahan 8CCP Example• So, how can we exploit “probably” information in const prop? The classic version exploits only “definitely”information...if (today not leap day)x = 5;x = w-z;y = x*x;11460cmon, it's obvious onceyou see it....15-745 © Tim Callahan 9CCP Example• Answer – duplicate a basic block.if (today not leap day)x = 5;x = w-z;y = x*x; y = x*x;1146015-745: Lecture 6 2/1/2007315-745 © Tim Callahan 10CCP Example• Cost: code expansion (probably)if (today not leap day)x = 5;x = w-z;y = x*x;y = 25;y = x*x;1146015-745 © Tim Callahan 11General Rule• “Do the common case fast, the other cases correctly”• In other words, it makes sense to optimize the common paths even if it makes the uncommon paths slower.The weighted (dynamic) performance will improve.• How does the compiler know what the “common case” is?15-745 © Tim Callahan 12PROFILING• Ok, a “leap day” special case doesn't come up often; • but skewed branches are common• ...how can this be communicated to the compiler?• Profiling information: A summary of program execution- run it with a sample input data set and record 'stuff'sampling blockcountsedgecountspathcountswholeprogramtraceDETAILlessmore15-745 © Tim Callahan 13Profiling: How•Instrumentthe executable, then run with sample datacount[1]++;count[2]++;count[3]++;count[1]++;count[2]++;count[3]++;Block countsEdge counts15-745: Lecture 6 2/1/2007415-745 © Tim Callahan 14Profiling• Another reason: test coverage– Are there latent bugs in your code that have never been discovered because they've never been executed? How would you know? Make sure every block is executed...or possibly every path– But, this is a course in optimizing compilers, so back to optimization...15-745 © Tim Callahan 16Other classic optimizations, w/ profiling• We’ve seen CCP...what's this?x = w*w;if (p) w = z*y;y =w*w;1146015-745 © Tim Callahan 17Other classic optimizations, w/ profiling• We’ve seen CCP...what's this?x = w*w;if (p)w = z*y;y =w*w;11460y =w*w;15-745 © Tim Callahan 18Other classic optimizations, w/ profiling• We have seen Const Prop...what's this?x = w*w;if (p)w = z*y;y =w*w;11460y =x;Global CSE!using available expressionsdeja vu all over again...15-745: Lecture 6 2/1/2007515-745 © Tim Callahan 19Other classic optimizations, w/ profiling• A Trend! It's not that each optimization is modified to take profile information into account...•Instead, the CFG is restructured so that infrequent paths don't interfere with optimizations on the frequent path. Then just apply existing optimizations!ACEGFDBACEGFDBC'E'G'15-745 © Tim Callahan 20Generalizing...• Since the common path is now a linear chain, it can be combined into one large block...with tail duplication.ACEGFDBACEGFDBC'E'G'15-745 © Tim Callahan 21Superblock• Since the common path is now a linear chain, it can be combined into one large block...with tail duplication.• This is a superblock – single entry at top, multiple exits.ACEGFDBACEGFDBC'E'G'15-745 © Tim Callahan 22super-block:• constant propagation • copy propagation• constant combining• dead code removalsuper-block loop:• loop invariant code removal• loop induction variable elimination• global variable migrationSuperblock Optimizations• common subexpression elimination• redundant store elimination• redundant load eliminationChang, Mahlke,and Hwu15-745: Lecture 6 2/1/2007615-745 © Tim Callahan 24Superblock Loop Invariant Hoistingif (i & 255)step = i;x = (step-1)<<2;...<use x>. . .i = i+1;if (i<N)ny15-745 © Tim Callahan 25Superblock Loop Invariant Hoistingif (i & 255)step = i;x = (step-1)<<2;...<use x>. . .i = i+1;if (i<N)nyx = (step-1)<<2;...<use x>. . .i = i+1;if (i<N)15-745 © Tim Callahan 26Superblock Loop Invariant Hoistingif (i & 255)step = i;x = (step-1)<<2;...<use x>. . .i = i+1;if (i<N)nyx = (step-1)<<2;...<use x>. . .i = i+1;if (i<N)x = (step-1)<<215-745 © Tim Callahan 27Changing GearsOther uses for profiling15-745: Lecture 6 2/1/2007715-745 © Tim Callahan 28Instruction Level Parallelism• When the processor can perform multiple operations in parallel, AND the application has operations that can be executed in parallel, you can get a big performance boost:exec time = instructions * cyclesPerInstruction * cycle_period• But ILP within basic blocks is limited (2-3) -Foster, Tjaden, Wall• To achieve more ILP, need to schedule across branches:– superscalar: branch prediction and dynamic scheduling– VLIW: compiler-controlled approaches• VLIW = Very Long Instruction Word• “Expose, Enhance, and Exploit ILP” -Bob Rau• We will not address the details of scheduling here…• Instead, discuss how to group computation intelligently to hand offto the scheduler15-745 © Tim Callahan 29Early VLIW• ELI project, Bulldog compiler (Josh Fisher, John Ellis)• 1985 ACM dissertation award!• Clustered VLIW, heterogenous function units– Sound familiar?– But back then, this was big iron– Targeting scientific code, not control-intensive code– And more clusters…8ish•


View Full Document

CMU CS 15745 - Lecture

Documents in this Course
Lecture

Lecture

19 pages

Lecture

Lecture

8 pages

Lecture

Lecture

5 pages

Lecture

Lecture

6 pages

lecture

lecture

17 pages

Lecture 3

Lecture 3

12 pages

Lecture

Lecture

17 pages

Lecture

Lecture

18 pages

lecture

lecture

14 pages

lecture

lecture

8 pages

lecture

lecture

5 pages

Lecture

Lecture

19 pages

lecture

lecture

10 pages

Lecture

Lecture

20 pages

Lecture

Lecture

8 pages

Lecture

Lecture

7 pages

lecture

lecture

59 pages

Lecture

Lecture

10 pages

Task 2

Task 2

2 pages

Handout

Handout

18 pages

Load more
Download Lecture
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?