Code Optimization IFeb. 12, 2008Code Optimization IFeb. 12, 2008TopicsTopics Machine-Independent Optimizationsz Basic optimizationsz Optimization blockersclass09.ppt15-213“The course that gives CMU its Zip!”–2–15-213, S’08Harsh RealityHarsh RealityThereThere’’s more to performance than asymptotic complexitys more to performance than asymptotic complexityConstant factors matter too!Constant factors matter too! Easily see 10:1 performance range depending on how code is written Must optimize at multiple levels: z algorithm, data representations, procedures, and loopsMust understand system to optimize performanceMust understand system to optimize performance How programs are compiled and executed How to measure program performance and identify bottlenecks How to improve performance without destroying code modularity and generality–3–15-213, S’08Optimizing CompilersOptimizing CompilersProvide efficient mapping of program to machineProvide efficient mapping of program to machine register allocation code selection and ordering (scheduling) dead code elimination eliminating minor inefficienciesDonDon’’t (usually) improve asymptotic efficiencyt (usually) improve asymptotic efficiency up to programmer to select best overall algorithm big-O savings are (often) more important than constant factorsz but constant factors also matterHave difficulty overcoming Have difficulty overcoming ““optimization blockersoptimization blockers”” potential memory aliasing potential procedure side-effects–4–15-213, S’08Limitations of Optimizing CompilersLimitations of Optimizing CompilersOperate under fundamental constraintOperate under fundamental constraint Must not cause any change in program behavior under any possible condition Often prevents it from making optimizations when would only affect behavior under pathological conditions.Behavior that may be obvious to the programmer can be Behavior that may be obvious to the programmer can be obfuscated by languages and coding stylesobfuscated by languages and coding styles e.g., Data ranges may be more limited than variable types suggestMost analysis is performed only within proceduresMost analysis is performed only within procedures Whole-program analysis is too expensive in most casesMost analysis is based only on Most analysis is based only on staticstaticinformationinformation Compiler has difficulty anticipating run-time inputsWhen in doubt, the compiler must be conservativeWhen in doubt, the compiler must be conservative–5–15-213, S’08Machine-Independent OptimizationsMachine-Independent OptimizationsOptimizations that you or the compiler should do Optimizations that you or the compiler should do regardless of processor / compilerregardless of processor / compilerCode MotionCode Motion Reduce frequency with which computation performedz If it will always produce same resultz Especially moving code out of looplong j;intni= n*i;for (j = 0; j < n; j++)a[ni+j] = b[j];void set_row(double *a, double *b,long i, long n){long j;for (j = 0; j < n; j++)a[n*i+j] = b[j];}–6–15-213, S’08Compiler-Generated Code MotionCompiler-Generated Code Motionset_row:xorl %r8d, %r8d # j = 0cmpq %rcx, %r8 # j:njge .L7 # if >= goto donemovq %rcx, %rax # nimulq %rdx, %rax # n*i outside of inner loopleaq (%rdi,%rax,8), %rdx # rowp = A + n*i*8.L5: # loop:movq (%rsi,%r8,8), %rax # t = b[j]incq %r8 # j++movq %rax, (%rdx) # *rowp = taddq $8, %rdx # rowp++cmpq %rcx, %r8 # j:njl .L5 # if < goot loop.L7: # done:rep ; ret # returnlong j;long ni = n*i;double *rowp = a+ni;for (j = 0; j < n; j++)*rowp++ = b[j];void set_row(double *a, double *b,long i, long n){long j;for (j = 0; j < n; j++)a[n*i+j] = b[j];}Where are the FP operations?–7–15-213, S’08Reduction in StrengthReduction in Strength Replace costly operation with simpler one Shift, add instead of multiply or divide16*x --> x << 4z Utility machine dependentz Depends on cost of multiply or divide instructionz On Pentium IV, integer multiply requires 10 CPU cycles Recognize sequence of productsfor (i = 0; i < n; i++)for (j = 0; j < n; j++)a[n*i + j] = b[j];int ni = 0;for (i = 0; i < n; i++) {for (j = 0; j < n; j++)a[ni + j] = b[j];ni += n;}–8–15-213, S’08Share Common SubexpressionsShare Common Subexpressions Reuse portions of expressions Compilers often not very sophisticated in exploiting arithmetic properties/* Sum neighbors of i,j */up = val[(i-1)*n + j ];down = val[(i+1)*n + j ];left = val[i*n + j-1];right = val[i*n + j+1];sum = up + down + left + right;int inj = i*n + j;up = val[inj - n];down = val[inj + n];left = val[inj - 1];right = val[inj + 1];sum = up + down + left + right;3 multiplications: i*n, (i–1)*n, (i+1)*n 1 multiplication: i*nleaq 1(%rsi), %rax # i+1leaq -1(%rsi), %r8 # i-1imulq %rcx, %rsi # i*nimulq %rcx, %rax # (i+1)*nimulq %rcx, %r8 # (i-1)*naddq %rdx, %rsi # i*n+jaddq %rdx, %rax # (i+1)*n+jaddq %rdx, %r8 # (i-1)*n+jimulq %rcx, %rsi # i*naddq %rdx, %rsi # i*n+jmovq %rsi, %rax # i*n+jsubq %rcx, %rax # i*n+j-nleaq (%rsi,%rcx), %rcx # i*n+j+n–9–15-213, S’08void lower(char *s){int i;for (i = 0; i < strlen(s); i++)if (s[i] >= 'A' && s[i] <= 'Z')s[i] -= ('A' - 'a');}Optimization Blocker #1: Procedure CallsOptimization Blocker #1: Procedure CallsProcedure to Convert String to Lower CaseProcedure to Convert String to Lower Case Extracted from 213 lab submissions, Fall, 1998–10–15-213, S’08Lower Case Conversion PerformanceLower Case Conversion Performance Time quadruples when double string length Quadratic performance0.00010.0010.010.111010010002565121k2k4k8k16k32k64k128k256kCPU SecondsString Length–11–15-213, S’08Convert Loop To Goto FormConvert Loop To Goto Form strlen executed every iterationvoid lower(char *s){int i = 0;if (i >= strlen(s))goto done;loop:if (s[i] >= 'A' && s[i] <= 'Z')s[i] -= ('A' - 'a');i++;if (i < strlen(s))goto loop;done:}–12–15-213, S’08Calling StrlenCalling StrlenStrlenStrlenperformanceperformance Only way to determine length of string is to scan its entire length, looking for null character.Overall performance, string of length NOverall performance, string of length N N calls to strlen Require times N, N-1, N-2, …, 1 Overall
View Full Document