Program OptimizationGoals of Today’s ClassImproving Program PerformanceWays to Optimize PerformanceHelping the Compiler Do Its JobOptimizing CompilersLimitations of Optimizing CompilersAvoiding Repeated ComputationWorrying About Side EffectsAnother Example on Side EffectsMemory AliasingSlide 12Another Aliasing ExampleSummary: Helping the CompilerExploiting the HardwareUnderlying HardwareAddition Faster Than MultiplicationBit Operations Faster Than ArithmeticCaching: Matrix MultiplicationMatrix Multiply: Cache EffectsSlide 21Slide 22Parallelism: Loop UnrollingParallelism: After Loop UnrollingProgram ExecutionAvoiding Function CallsWriting Your Own Malloc and FreeConclusionCourse Wrap UpThe Rest of the SemesterGoals of COS 217Relationship to Other CoursesLessons About Computer ScienceLessons ContinuedHave a Great Summer!!!1Program OptimizationProfessor Jennifer Rexfordhttp://www.cs.princeton.edu/~jrex2Goals of Today’s Class•Improving program performanceWhen and what to optimizeBetter algorithms & data structures vs. tuning the code•Exploiting an understanding of underlying systemCompiler capabilitiesHardware architectureProgram execution•Why?To be effective, and efficient, at making programs faster–Avoid optimizing the fast parts of the code–Help the compiler do its job betterTo review material from the second half of the course3Improving Program Performance•Most programs are already “fast enough”No need to optimize performance at allSave your time, and keep the program simple/readable•Most parts of a program are already “fast enough”Usually only a small part makes the program run slowlyOptimize only this portion of the program, as needed•Steps to improve execution (time) efficiencyDo timing studies (e.g., gprof)Identify hot spotsOptimize that part of the programRepeat as needed4Ways to Optimize Performance•Better data structures and algorithmsImproves the “asymptotic complexity”–Better scaling of computation/storage as input grows–E.g., going from O(n2) sorting algorithm to O(n log n)Clearly important if large inputs are expectedRequires understanding data structures and algorithms•Better source code the compiler can optimizeImproves the “constant factors”–Faster computation during each iteration of a loop–E.g., going from 1000n to 10n running timeClearly important if a portion of code is running slowlyRequires understanding hardware, compiler, execution5Helping the Compiler Do Its Job6Optimizing Compilers•Provide efficient mapping of program to machineRegister allocationCode selection and orderingEliminating minor inefficiencies•Don’t (usually) improve asymptotic efficiencyUp to the programmer to select best overall algorithm•Have difficulty overcoming “optimization blockers”Potential function side-effectsPotential memory aliasing7Limitations of Optimizing Compilers•Fundamental constraintCompiler must not change program behaviorEver, even under rare pathological inputs•Behavior that may be obvious to the programmer can be obfuscated by languages and coding stylesData ranges more limited than variable types suggestArray elements remain unchanged by function calls•Most analysis is performed only within functionsWhole-program analysis is too expensive in most cases•Most analysis is based only on static informationCompiler has difficulty anticipating run-time inputs8Avoiding Repeated Computation•A good compiler recognizes simple optimizationsAvoiding redundant computations in simple loopsStill, programmer may still want to make it explicit•ExampleRepetition of computation: n * ifor (i = 0; i < n; i++) for (j = 0; j < n; j++) a[n*i + j] = b[j];for (i = 0; i < n; i++) { int ni = n * i; for (j = 0; j < n; j++) a[ni + j] = b[j];}9Worrying About Side Effects•Compiler cannot always avoid repeated computationMay not know if the code has a “side effect”… that makes the transformation change the code’s behavior•Is this transformation okay?•Not necessarily, ifint func1(int x) { return f(x) + f(x) + f(x) + f(x);}int func1(int x) { return 4 * f(x);}int counter = 0;int f(int x) { return counter++;}And this function may be defined in another file known only at link time!10Another Example on Side Effects•Is this optimization okay?•Short answer: it dependsCompiler often cannot tellMost compilers do not try to identify side effects•Programmer knows bestAnd can decide whether the optimization is safefor (i = 0; i < strlen(s); i++) { /* Do something with s[i] */}length = strlen(s);for (i = 0; i < length; i++) { /* Do something with s[i] */}11Memory Aliasing•Is this optimization okay?•Not necessarily, what if xp and yp are equal?First version: result is 4 times *xpSecond version: result is 3 times *xpvoid twiddle(int *xp, int *yp) { *xp += *yp; *xp += *yp;}void twiddle(int *xp, int *yp) { *xp += 2 * *yp;}12Memory Aliasing•Memory aliasingSingle data location accessed through multiple namesE.g., two pointers that point to the same memory location•Modifying the data using one nameImplicitly modifies the values seen through other names•Blocks optimization by the compilerThe compiler cannot tell when aliasing may occur… and so must forgo optimizing the code•Programmer often does know And can optimize the code accordinglyxp, yp13Another Aliasing Example•Is this optimization okay?•Not necessarilyIf y and x point to the same location in memory…… the correct output is “x = 10\n”int *x, *y;…*x = 5;*y = 10;printf(“x=%d\n”, *x);printf(“x=5\n”);14Summary: Helping the Compiler•Compiler can perform many optimizationsRegister allocationCode selection and orderingEliminating minor inefficiencies•But often the compiler needs your helpKnowing if code is free of side effectsKnowing if memory aliasing will not happen•Modifying the code can lead to better performanceProfile the code to identify the “hot spots”Look at the assembly language the compiler producesRewrite the code to get the compiler to do the right thing15Exploiting the Hardware16Underlying Hardware•Implements a collection of instructionsInstruction set varies from one architecture to anotherSome instructions may be faster than others•Registers and caches are faster than main memoryNumber of registers and sizes of
View Full Document