IntroductionCompiler OrganizationDependence analysisData dependences examplesLoop dependence analysisTransformationsDataflow-based loop transformationsLoop reorderingLoop restructuringLoop replacementMemory access transformationsMemory access transformationsPartial evaluationRedundancy eliminationTo be continuedCompiler Transformationsfor High-Performance Computing(1)Presented byJason Pazis and Yi ZhangMarch 23, 2010What’s this survey about?IComprehensive overview of high-level compilertransformations/optimizationsILanguages: imperative, e.g. C, FortranIArchitecturesISequential: common and general-purposeIParallel: superscalar, vector, SIMD, shared-memory MP,distributed-memory MP, etcWhat do compilers do?IOn a high levelITranslation: source code → machine codeIOptimization: various transformations to reduce running time,code size, etcISpecificallyILexical analysisIParsingISemantic AnalysisIOptimizationICode generationClear separation of high-level programming languages and machinearchitectureWhat do compilers do?IOn a high levelITranslation: source code → machine codeIOptimization: various transformations to reduce running time,code size, etcISpecificallyILexical analysisIParsingISemantic AnalysisIOptimizationICode generationClear separation of high-level programming languages and machinearchitectureWhat do compilers do?IOn a high levelITranslation: source code → machine codeIOptimization: various transformations to reduce running time,code size, etcISpecificallyILexical analysisIParsingISemantic AnalysisIOptimizationICode generationClear separation of high-level programming languages and machinearchitectureOptimization trilogyDecide → Verify → TransformDecideIDifficult and poorly understoodISearch space is hugeIDecision making is complicated and expensive: some areNP-complete or even undecidableIMostly a collection of piecemeal heuristicsIWith some ordering heuristicsIWith some progress in systematic application of families oftransformationsIConflicts not uncommon, leading toIWorse performance: less code → less efficient use of cacheIIncorrect program: e.g., Ubuntu 8.04’s patch made thefollowing code always output 1int foo (void) {signed char x = 1;unsigned char y=-1;return x > y;}DecideIDifficult and poorly understoodISearch space is hugeIDecision making is complicated and expensive: some areNP-complete or even undecidableIMostly a collection of piecemeal heuristicsIWith some ordering heuristicsIWith some progress in systematic application of families oftransformationsIConflicts not uncommon, leading toIWorse performance: less code → less efficient use of cacheIIncorrect program: e.g., Ubuntu 8.04’s patch made thefollowing code always output 1int foo (void) {signed char x = 1;unsigned char y=-1;return x > y;}DecideIDifficult and poorly understoodISearch space is hugeIDecision making is complicated and expensive: some areNP-complete or even undecidableIMostly a collection of piecemeal heuristicsIWith some ordering heuristicsIWith some progress in systematic application of families oftransformationsIConflicts not uncommon, leading toIWorse performance: less code → less efficient use of cacheIIncorrect program: e.g., Ubuntu 8.04’s patch made thefollowing code always output 1int foo (void) {signed char x = 1;unsigned char y=-1;return x > y;}Scope of decisionIStatementIBasic block (straight-line code)IInnermost loopIPerfect loop nestIGeneral loop nestIProcedure (aka global optimization)IInterproceduralVerifyWhat is a legal transformation? (Given original program A andtransformed program B)IB and A perform exactly the same operations in the sameorderIB and A produce exactly the same output for all identicalexecutionsIWith same input dataIWith same results for nondeterministic operations, e.g, rand()VerifyWhat is a legal transformation? (Given original program A andtransformed program B)IB and A perform exactly the same operations in the sameorder — too strictIB and A produce exactly the same output for all identicalexecutionsIWith same input dataIWith same results for nondeterministic operations, e.g, rand()VerifyWhat is a legal transformation? (Given original program A andtransformed program B)IB and A perform exactly the same operations in the sameorder — too strictIB and A produce exactly the same output for all identicalexecutionsIWith same input dataIWith same results for nondeterministic operations, e.g, rand()VerifyWhat is a legal transformation? (Given original program A andtransformed program B)IB and A perform exactly the same operations in the sameorder — too strictIB and A produce exactly the same output for all identicalexecutions — still too strictIWith same input dataIWith same results for nondeterministic operations, e.g, rand()Let’s verify(a) Originaldo i=1,na[i] = b[k]+a[i]+100000.0end doreturn(b) TransformedC = b[k]+100000.0do i=n,1,-1a[i] = a[i]+Cend doreturnProblems:IEvaluating C first may cause overflowIReordered additions of float-point numbers may causedifferent resultsIAlgebraic commutative operations can be computationallynon-commutative for float-point numbers (semicommutative)IIf k is out of range of array b, memory fault can happen at adifferent placeIa and b may be completely or partially aliased to one another,causing updated b[k] to be used in (a) but not in (b)Let’s verify(a) Originaldo i=1,na[i] = b[k]+a[i]+100000.0end doreturn(b) TransformedC = b[k]+100000.0do i=n,1,-1a[i] = a[i]+Cend doreturnProblems:IEvaluating C first may cause overflowIReordered additions of float-point numbers may causedifferent resultsIAlgebraic commutative operations can be computationallynon-commutative for float-point numbers (semicommutative)IIf k is out of range of array b, memory fault can happen at adifferent placeIa and b may be completely or partially aliased to one another,causing updated b[k] to be used in (a) but not in (b)Let’s verify(a) Originaldo i=1,na[i] = b[k]+a[i]+100000.0end doreturn(b) TransformedC = b[k]+100000.0do i=n,1,-1a[i] = a[i]+Cend doreturnProblems:IEvaluating C first may cause overflowIReordered additions of float-point numbers may causedifferent resultsIAlgebraic commutative operations can be computationallynon-commutative for float-point numbers (semicommutative)IIf k is out of range of array b, memory fault can happen at adifferent placeIa and b may be completely or partially aliased to one another,causing updated b[k] to be used in (a) but not in (b)Let’s verify(a) Originaldo i=1,na[i] =
View Full Document