DOC PREVIEW
Duke CPS 296.1 - Compiler Transformations for High-Performance Computing

This preview shows page 1-2-3-4-29-30-31-32-33-60-61-62-63 out of 63 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

IntroductionCompiler OrganizationDependence analysisData dependences examplesLoop dependence analysisTransformationsDataflow-based loop transformationsLoop reorderingLoop restructuringLoop replacementMemory access transformationsMemory access transformationsPartial evaluationRedundancy eliminationTo be continuedCompiler Transformationsfor High-Performance Computing(1)Presented byJason Pazis and Yi ZhangMarch 23, 2010What’s this survey about?IComprehensive overview of high-level compilertransformations/optimizationsILanguages: imperative, e.g. C, FortranIArchitecturesISequential: common and general-purposeIParallel: superscalar, vector, SIMD, shared-memory MP,distributed-memory MP, etcWhat do compilers do?IOn a high levelITranslation: source code → machine codeIOptimization: various transformations to reduce running time,code size, etcISpecificallyILexical analysisIParsingISemantic AnalysisIOptimizationICode generationClear separation of high-level programming languages and machinearchitectureWhat do compilers do?IOn a high levelITranslation: source code → machine codeIOptimization: various transformations to reduce running time,code size, etcISpecificallyILexical analysisIParsingISemantic AnalysisIOptimizationICode generationClear separation of high-level programming languages and machinearchitectureWhat do compilers do?IOn a high levelITranslation: source code → machine codeIOptimization: various transformations to reduce running time,code size, etcISpecificallyILexical analysisIParsingISemantic AnalysisIOptimizationICode generationClear separation of high-level programming languages and machinearchitectureOptimization trilogyDecide → Verify → TransformDecideIDifficult and poorly understoodISearch space is hugeIDecision making is complicated and expensive: some areNP-complete or even undecidableIMostly a collection of piecemeal heuristicsIWith some ordering heuristicsIWith some progress in systematic application of families oftransformationsIConflicts not uncommon, leading toIWorse performance: less code → less efficient use of cacheIIncorrect program: e.g., Ubuntu 8.04’s patch made thefollowing code always output 1int foo (void) {signed char x = 1;unsigned char y=-1;return x > y;}DecideIDifficult and poorly understoodISearch space is hugeIDecision making is complicated and expensive: some areNP-complete or even undecidableIMostly a collection of piecemeal heuristicsIWith some ordering heuristicsIWith some progress in systematic application of families oftransformationsIConflicts not uncommon, leading toIWorse performance: less code → less efficient use of cacheIIncorrect program: e.g., Ubuntu 8.04’s patch made thefollowing code always output 1int foo (void) {signed char x = 1;unsigned char y=-1;return x > y;}DecideIDifficult and poorly understoodISearch space is hugeIDecision making is complicated and expensive: some areNP-complete or even undecidableIMostly a collection of piecemeal heuristicsIWith some ordering heuristicsIWith some progress in systematic application of families oftransformationsIConflicts not uncommon, leading toIWorse performance: less code → less efficient use of cacheIIncorrect program: e.g., Ubuntu 8.04’s patch made thefollowing code always output 1int foo (void) {signed char x = 1;unsigned char y=-1;return x > y;}Scope of decisionIStatementIBasic block (straight-line code)IInnermost loopIPerfect loop nestIGeneral loop nestIProcedure (aka global optimization)IInterproceduralVerifyWhat is a legal transformation? (Given original program A andtransformed program B)IB and A perform exactly the same operations in the sameorderIB and A produce exactly the same output for all identicalexecutionsIWith same input dataIWith same results for nondeterministic operations, e.g, rand()VerifyWhat is a legal transformation? (Given original program A andtransformed program B)IB and A perform exactly the same operations in the sameorder — too strictIB and A produce exactly the same output for all identicalexecutionsIWith same input dataIWith same results for nondeterministic operations, e.g, rand()VerifyWhat is a legal transformation? (Given original program A andtransformed program B)IB and A perform exactly the same operations in the sameorder — too strictIB and A produce exactly the same output for all identicalexecutionsIWith same input dataIWith same results for nondeterministic operations, e.g, rand()VerifyWhat is a legal transformation? (Given original program A andtransformed program B)IB and A perform exactly the same operations in the sameorder — too strictIB and A produce exactly the same output for all identicalexecutions — still too strictIWith same input dataIWith same results for nondeterministic operations, e.g, rand()Let’s verify(a) Originaldo i=1,na[i] = b[k]+a[i]+100000.0end doreturn(b) TransformedC = b[k]+100000.0do i=n,1,-1a[i] = a[i]+Cend doreturnProblems:IEvaluating C first may cause overflowIReordered additions of float-point numbers may causedifferent resultsIAlgebraic commutative operations can be computationallynon-commutative for float-point numbers (semicommutative)IIf k is out of range of array b, memory fault can happen at adifferent placeIa and b may be completely or partially aliased to one another,causing updated b[k] to be used in (a) but not in (b)Let’s verify(a) Originaldo i=1,na[i] = b[k]+a[i]+100000.0end doreturn(b) TransformedC = b[k]+100000.0do i=n,1,-1a[i] = a[i]+Cend doreturnProblems:IEvaluating C first may cause overflowIReordered additions of float-point numbers may causedifferent resultsIAlgebraic commutative operations can be computationallynon-commutative for float-point numbers (semicommutative)IIf k is out of range of array b, memory fault can happen at adifferent placeIa and b may be completely or partially aliased to one another,causing updated b[k] to be used in (a) but not in (b)Let’s verify(a) Originaldo i=1,na[i] = b[k]+a[i]+100000.0end doreturn(b) TransformedC = b[k]+100000.0do i=n,1,-1a[i] = a[i]+Cend doreturnProblems:IEvaluating C first may cause overflowIReordered additions of float-point numbers may causedifferent resultsIAlgebraic commutative operations can be computationallynon-commutative for float-point numbers (semicommutative)IIf k is out of range of array b, memory fault can happen at adifferent placeIa and b may be completely or partially aliased to one another,causing updated b[k] to be used in (a) but not in (b)Let’s verify(a) Originaldo i=1,na[i] =


View Full Document

Duke CPS 296.1 - Compiler Transformations for High-Performance Computing

Documents in this Course
Lecture

Lecture

18 pages

Lecture

Lecture

6 pages

Lecture

Lecture

13 pages

Lecture

Lecture

5 pages

Load more
Download Compiler Transformations for High-Performance Computing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Compiler Transformations for High-Performance Computing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Compiler Transformations for High-Performance Computing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?