Harvey Mudd CS 105 - Code Optimization and Performance

Code Optimization and Performance
Chapter 5
perf01.ppt
CS 105 “Tour of the Black Holes of Computing”

Slide titles
- Topics
- Speed and optimization
- Great Reality #4
- Optimizing Compilers
- Limitations of Optimizing Compilers
- New Topic: Machine-Independent Optimizations
- Compiler-Generated Code Motion
- Reduction in Strength
- Make Use of Registers
- Machine-Independent Opts. (Cont.)
- Example: Vector ADT
- Optimization Example
- Time Scales
- Cycles Per Element
- Move vec_length Call Out of Loop
- Code Motion Example #2
- Lowercase-Conversion Performance
- Improving Performance
- Optimization Blocker: Procedure Calls
- Eliminate Unneeded Memory Refs
- Detecting Unneeded Memory Refs.
- Optimization Blocker: Memory Aliasing
- Machine-Independent Optimization Summary
- Pointer Code
- Pointer vs. Array Code Inner Loops
- Important Tools
- New Topic: Code Profiling Example
- Code Profiling
- Profiling Results
- Code Optimizations
- Further Optimizations
- Profiling Observations

Topics
- Machine-independent optimizations
  - Code motion
  - Reduction in strength
  - Common subexpression sharing
- Tuning: identifying performance bottlenecks
- Machine-dependent optimizations
  - Pointer code
  - Loop unrolling
  - Enabling instruction-level parallelism
- Understanding processor optimization
  - Translation of instructions into operations
  - Out-of-order execution
  - Branches
- Caches and blocking
- Advice

Speed and optimization
- Programmer
  - Choice of algorithm
  - Intelligent coding
- Compiler
  - Choice of instructions
  - Moving code
  - Reordering code
  - Strength reduction
  - Must be faithful to the original program
- Processor
  - Pipelining
  - Multiple execution units
  - Memory accesses
  - Branches
  - Caches
- Rest of system
  - Uncontrollable

Great Reality #4
- There’s more to performance than asymptotic complexity
  - Constant factors matter too!
matter too!Easily see 10:1 performance range depending on how code is writtenMust optimize at multiple levels: Algorithm, data representations, procedures, and loopsMust understand system to optimize performanceMust understand system to optimize performanceHow programs are compiled and executedHow to measure program performance and identify bottlenecksHow to improve performance without destroying code modularity, generality, readability– 5 –CS 105Optimizing CompilersOptimizing CompilersProvide efficient mapping of program to machineProvide efficient mapping of program to machineRegister allocationCode selection and orderingEliminating minor inefficienciesDon’t (usually) improve asymptotic efficiencyDon’t (usually) improve asymptotic efficiencyUp to programmer to select best overall algorithmBig-O savings are (often) more important than constant factorsBut constant factors also matterHave difficulty overcoming “optimization blockers”Have difficulty overcoming “optimization blockers”Potential memory aliasingPotential procedure side-effects– 6 –CS 105Limitationsof Optimizing CompilersLimitationsof Optimizing CompilersOperate Under Fundamental ConstraintOperate Under Fundamental ConstraintMust not cause any change in program behavior under any possible conditionOften prevents making optimizations that would only affect behavior under pathological conditions.Behavior that may be obvious to the programmer can be Behavior that may be obvious to the programmer can be obfuscated by languages and coding stylesobfuscated by languages and coding stylesE.g., data ranges may be more limited than variable types suggestMost analysis is performed only within proceduresMost analysis is performed only within proceduresWhole-program analysis is too expensive in most casesMost analysis is based only on Most analysis is based only on staticstatic information informationCompiler has difficulty anticipating run-time inputsWhen in doubt, the compiler must be 

New Topic: Machine-Independent Optimizations
- Optimizations you should do regardless of processor / compiler
- Code motion
  - Reduce the frequency with which a computation is performed, if it will always produce the same result
  - Especially moving code out of a loop

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[n*i + j] = b[j];

  becomes

    for (i = 0; i < n; i++) {
        int ni = n*i;
        for (j = 0; j < n; j++)
            a[ni + j] = b[j];
    }

Compiler-Generated Code Motion
- Most compilers do a good job with array code and simple loop structures

Code generated by GCC for

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[n*i + j] = b[j];

is:

        imull %ebx,%eax            # i*n
        movl 8(%ebp),%edi          # a
        leal (%edi,%eax,4),%edx    # p = a+i*n (scaled by 4)
    # Inner Loop
    .L40:
        movl 12(%ebp),%edi         # b
        movl (%edi,%ecx,4),%eax    # b[j] (address b+j, scaled by 4)
        movl %eax,(%edx)           # *p = b[j]
        addl $4,%edx               # p++ (scaled by 4)
        incl %ecx                  # j++
        cmpl %ebx,%ecx             # j : n (reversed)
        jl .L40                    # loop if j<n

which is equivalent to:

    for (i = 0; i < n; i++) {
        int ni = n*i;
        int *p = a+ni;
        for (j = 0; j < n; j++)
            *p++ = b[j];
    }

Reduction in Strength
- Replace a costly operation with a simpler one
- Shift and add instead of multiply or divide
  - 16*x --> x << 4
  - Utility is machine-dependent
  - Depends on the cost of the multiply or divide instruction
  - On Pentium II or III, integer multiply requires only 4 CPU cycles
- Recognize a sequence of products

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[n*i + j] = b[j];

  becomes

    int ni = 0;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++)
            a[ni + j] = b[j];
        ni += n;
    }

Make Use of Registers
- Reading and writing registers is much faster than reading/writing memory
- Limitation
  - Compiler is not always able to determine whether a variable can be held in a register
  - Possibility of aliasing
  - See example later
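
The copy loop used in the code-motion and strength-reduction slides can be written as three C functions to check that each transformation preserves behavior. This is a sketch: the function names are hypothetical, and a is assumed to have room for at least n*n ints. Since the slides show GCC already hoisting n*i out of this simple loop, doing it by hand matters mainly in code the compiler cannot analyze as easily.

```c
/* Original form: recomputes n*i on every inner-loop iteration. */
void copy_rows_naive(int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            a[n*i + j] = b[j];
}

/* Code motion: n*i does not change inside the inner loop, so hoist it. */
void copy_rows_motion(int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++) {
        int ni = n*i;
        for (int j = 0; j < n; j++)
            a[ni + j] = b[j];
    }
}

/* Reduction in strength: replace the multiply n*i with a running sum. */
void copy_rows_strength(int *a, const int *b, int n)
{
    int ni = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            a[ni + j] = b[j];
        ni += n;
    }
}
```

Called with n = 3 and b = {7, 8, 9}, each version fills a with 7 8 9 7 8 9 7 8 9. Only the cost of computing the row offset changes between versions, not the result.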

Machine-Independent Opts. (Cont.)
- Share common subexpressions
  - Reuse portions of expressions
  - Compilers are often not very sophisticated at exploiting arithmetic properties

    /* Sum neighbors of i,j */
    up = val[(i-1)*n + j];
    down = val[(i+1)*n + j];
    left = val[i*n + j-1];
    right = val[i*n +
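
The neighbor-sum code above is cut off, so the following is only a sketch of the subexpression-sharing idea under stated assumptions. It assumes an n-wide grid stored row-major in val, takes the right neighbor to be val[i*n + j+1], and requires (i, j) to be an interior point. The function names are hypothetical. The shared version computes i*n + j once and derives the other three indices by adding or subtracting, replacing three multiplies with one.

```c
/* Separate index expressions: multiplies (i-1)*n, (i+1)*n, and i*n. */
long sum_neighbors_naive(const long *val, long n, long i, long j)
{
    long up    = val[(i-1)*n + j];
    long down  = val[(i+1)*n + j];
    long left  = val[i*n + j-1];
    long right = val[i*n + j+1];   /* assumed right neighbor */
    return up + down + left + right;
}

/* Shared subexpression: compute i*n + j once, since
   (i-1)*n + j == inj - n and (i+1)*n + j == inj + n. */
long sum_neighbors_shared(const long *val, long n, long i, long j)
{
    long inj = i*n + j;
    long up    = val[inj - n];
    long down  = val[inj + n];
    long left  = val[inj - 1];
    long right = val[inj + 1];
    return up + down + left + right;
}
```

On a 3x3 grid holding 0 through 8, both functions return 16 for the center cell (1 + 7 + 3 + 5).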

