COMP 206: Computer Architecture and Implementation
Montek Singh
Wed., Sep 7, 2005
Lecture 3

Slide titles:
  COMP 206: Computer Architecture and Implementation
  Outline
  Example 1 (see HP3 pp. 42-45 for more examples)
  Example 1 (Soln. using Amdahl’s Law)
  Example 2
  Example 2 (Solution)
  Example 3
  Example 3 (Solution)
  Example 4
  Example 4 (Solution)
  Performance of (Blocking) Caches
  Example
  Means
  Weighted Means
  Relations among Means
  Summarizing Computer Performance
  Arithmetic Mean for Times
  Harmonic Mean for Rates
  Avoid the Geometric Mean
  Programs to Evaluate Performance
  SPEC: Std Perf Evaluation Corp
  SPEC95 Details
  Trends in Integer Performance
  Trends in Floating Point Performance
  SPEC95 Ratings of Processors
  SPEC95 vs SPEC CPU2000
  SPEC CPU2000 Example
  Performance Evaluation
  Cost of Integrated Circuits
  Explanations
  Real World Examples
  Moore’s Law
  Moore’s Law in Action at Intel
  Moore’s Law At Risk?
  Where Do The Transistors Go?
  Chip Photographs
  Embedded Processors
  Power-Performance Tradeoff (Embedded)

Outline

  Quantitative Principles of Computer Design
  Amdahl’s law (make the common case fast)
  Performance Metrics
  MIPS, FLOPS, and all that…
  Examples

Example 1 (see HP3 pp. 42-45 for more examples)

Which change is more effective on a certain machine: speeding up 10-fold the floating point square root operation only, which takes up 20% of execution time, or speeding up 2-fold all floating point operations, which take up 50% of total execution time? (Assume that the cost of accomplishing either change is the same, and the two changes are mutually exclusive.)

  F_sqrt     = fraction of FP sqrt results
  R_sqrt     = rate of producing FP sqrt results
  F_non-sqrt = fraction of non-sqrt results
  R_non-sqrt = rate of producing non-sqrt results
  F_fp       = fraction of FP results
  R_fp       = rate of producing FP results
  F_non-fp   = fraction of non-FP results
  R_non-fp   = rate of producing non-FP results
  R_before   = average rate of producing results before enhancement
  R_after    = average rate of producing results after enhancement

[Diagram labeled with F_sqrt, R_sqrt, F_non-sqrt, R_non-sqrt, F_fp, R_fp, F_non-fp, R_non-fp]

Example 1 (Soln. using Amdahl’s Law)

Improve FP sqrt only (sqrt accounts for 20% of the time and everything else for 80%, so the two time terms are in the ratio 1:4):

\[ \frac{1}{R_{\text{before}}} = \frac{F_{\text{sqrt}}}{R_{\text{sqrt}}} + \frac{F_{\text{non-sqrt}}}{R_{\text{non-sqrt}}} = \frac{1}{x} + \frac{4}{x} = \frac{5}{x} \]

\[ \frac{1}{R_{\text{after}}} = \frac{F_{\text{sqrt}}}{10\,R_{\text{sqrt}}} + \frac{F_{\text{non-sqrt}}}{R_{\text{non-sqrt}}} = \frac{0.1}{x} + \frac{4}{x} = \frac{4.1}{x} \]

\[ \frac{R_{\text{after}}}{R_{\text{before}}} = \frac{5}{4.1} = 1.22 \]

Improve all FP ops (FP and non-FP each account for 50% of the time, so the two time terms are equal):

\[ \frac{1}{R_{\text{before}}} = \frac{F_{\text{fp}}}{R_{\text{fp}}} + \frac{F_{\text{non-fp}}}{R_{\text{non-fp}}} = \frac{1}{y} + \frac{1}{y} = \frac{2}{y} \]

\[ \frac{1}{R_{\text{after}}} = \frac{F_{\text{fp}}}{2\,R_{\text{fp}}} + \frac{F_{\text{non-fp}}}{R_{\text{non-fp}}} = \frac{0.5}{y} + \frac{1}{y} = \frac{1.5}{y} \]

\[ \frac{R_{\text{after}}}{R_{\text{before}}} = \frac{2}{1.5} = 1.33 \]

Speeding up all FP operations is therefore the more effective change (1.33 vs. 1.22).

[Figure: bar chart, vertical axis from 0 to 0.9, bars labeled Sqrt (b), Sqrt (a), FP (b), FP (a)]

Example 2

               Machine A           Machine B
  Operation    Frequency   CPI     Frequency       CPI
  Compare      0.2         1
  Branch       0.2         2
  Cmp&Branch                       0.2/0.8 = 0.25  2
  Others       0.6         1       0.6/0.8 = 0.75  1

                      Machine A   Machine B
  Clock rate          1.25        1
  Instruction count   1           0.8

Which CPU performs better? Why?

Example 2 (Solution)

\[ \text{CPI}_A = 0.6 \times 1 + 0.2 \times 2 + 0.2 \times 1 = 1.2 \]

\[ \text{CPI}_B = \frac{0.6 \times 1 + 0.2 \times 2}{0.8} = 0.75 \times 1 + 0.25 \times 2 = 1.25 \]

\[ \frac{\text{Perf}_A}{\text{Perf}_B} = \frac{\text{Clockrate}_A / (\text{IC}_A \times \text{CPI}_A)}{\text{Clockrate}_B / (\text{IC}_B \times \text{CPI}_B)} = \frac{1.25 \times \text{Clockrate} / (\text{IC} \times 1.2)}{\text{Clockrate} / (0.8 \times \text{IC} \times 1.25)} = \frac{1.25}{1.2} = 1.04 \]

Machine A is about 4% faster. If the clock rate of A were only 1.1x the clock rate of B (that is, if B's clock cycle time were only 1.1x that of A), then CPU B would have about 9% higher performance (1.2/1.1 = 1.09).

Example 3

A LOAD/STORE machine has the characteristics shown below. We also observe that 25% of the ALU operations directly use a loaded value that is not used again. Thus we hope to improve things by adding new ALU instructions that have one source operand in memory. The CPI of the new instructions is 2. The only unpleasant consequence of this change is that the CPI of branch instructions will increase from 2 to 3. Overall, will CPU performance increase?

  Instruction type   Frequency   CPI
  ALU ops            0.43        1
  Loads              0.21        2
  Stores             0.12        2
  Branches           0.24        2

Example 3 (Solution)

Before change

  Instruction type   Frequency   CPI
  ALU ops            0.43        1
  Loads              0.21        2
  Stores             0.12        2
  Branches           0.24        2

\[ \text{CPI} = 0.43 \times 1 + (0.21 + 0.12 + 0.24) \times 2 = 1.57 \]

\[ \text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time} = \text{IC} \times 1.57 \times T = 1.57\,\text{IC} \times T \]

After change

  Instruction type   Frequency        CPI
  ALU ops            (0.43-x)/(1-x)   1
  Loads              (0.21-x)/(1-x)   2
  Stores             0.12/(1-x)       2
  Branches           0.24/(1-x)       3
  Reg-mem ops        x/(1-x)          2

\[ x = \frac{0.43}{4} = 0.1075 \]

\[ \text{CPI} = \frac{(0.43 - x) \times 1 + (0.21 - x + 0.12) \times 2 + x \times 2 + 0.24 \times 3}{1 - x} = \frac{1.7025}{0.8925} = 1.908 \]

\[ \text{CPU time} = (1 - x)\,\text{IC} \times \text{CPI} \times \text{Clock cycle time} = (1 - x)\,\text{IC} \times 1.908 \times T = 1.703\,\text{IC} \times T \]

Since CPU time increases (from 1.57 IC x T to 1.703 IC x T), the change will not improve performance.

Example 4

A load-store machine has the characteristics shown below. An optimizing compiler for the machine discards 50% of the ALU operations, although it cannot reduce loads, stores, or branches. Assuming a 500 MHz (2 ns) clock, what is the MIPS rating for optimized code versus unoptimized code? Does the ranking of MIPS agree with the ranking of execution time?

  Instruction type   Frequency   CPI
  ALU ops            43%         1
  Loads              21%         2
  Stores             12%         2
  Branches           24%         2

Example 4 (Solution)

  Instruction type   Frequency   CPI
  ALU ops            43%         1
  Loads              21%         2
  Stores             12%         2
  Branches           24%         2

For the unoptimized code:

\[ \text{CPI} = 0.43 \times 1 + (0.21 + 0.12 + 0.24) \times 2 = 1.57 \]

\[ \text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time} = \text{IC} \times 1.57 \times 2 \times 10^{-9} = 3.14 \times 10^{-9} \times \text{IC} \]

\[ \text{MIPS} = \frac{500\ \text{MHz}}{1.57 \times 10^{6}} = 318.5 \]
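The arithmetic in Examples 1 through 4 can be checked with a short Python sketch. The helper names (amdahl_speedup, average_cpi, cpu_time, mips) are illustrative and do not come from the lecture. Instruction counts and cycle times are normalized to 1 where the slides leave them symbolic. The optimized-code figures for Example 4 are computed here directly from the problem statement, since the preview text stops before that part of the solution.

```python
# Hedged sketch: helper names are our own, not from the lecture slides.

def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of execution time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

def average_cpi(mix):
    """Weighted CPI for (count, cpi) pairs; counts are renormalized to sum to 1."""
    total = sum(count for count, _ in mix)
    return sum(count * cpi for count, cpi in mix) / total

def cpu_time(instr_count, cpi, cycle_time):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instr_count * cpi * cycle_time

def mips(clock_mhz, cpi):
    """MIPS rating = clock rate in MHz / CPI."""
    return clock_mhz / cpi

# Example 1: sqrt only (20% of time, 10x) vs. all FP (50% of time, 2x).
sqrt_only = amdahl_speedup(0.2, 10)
all_fp = amdahl_speedup(0.5, 2)

# Example 2: performance = clock rate / (IC x CPI).
cpi_a = average_cpi([(0.2, 1), (0.2, 2), (0.6, 1)])
cpi_b = average_cpi([(0.2, 2), (0.6, 1)])  # compare folded into the branch
perf_a_over_b = (1.25 / (1.0 * cpi_a)) / (1.0 / (0.8 * cpi_b))

# Example 3: 25% of ALU ops absorb a load; branch CPI rises from 2 to 3.
x = 0.25 * 0.43
base_mix = [(0.43, 1), (0.21, 2), (0.12, 2), (0.24, 2)]
new_mix = [(0.43 - x, 1), (0.21 - x, 2), (0.12, 2), (0.24, 3), (x, 2)]
time_before = cpu_time(1.0, average_cpi(base_mix), 1.0)
time_after = cpu_time(1.0 - x, average_cpi(new_mix), 1.0)

# Example 4: the compiler removes half of the ALU ops; 500 MHz (2 ns) clock.
opt_mix = [(0.43 / 2, 1), (0.21, 2), (0.12, 2), (0.24, 2)]
opt_count = sum(count for count, _ in opt_mix)  # relative instruction count
mips_unopt = mips(500, average_cpi(base_mix))
mips_opt = mips(500, average_cpi(opt_mix))
time_unopt = cpu_time(1.0, average_cpi(base_mix), 2e-9)
time_opt = cpu_time(opt_count, average_cpi(opt_mix), 2e-9)

if __name__ == "__main__":
    print(f"Example 1: sqrt only {sqrt_only:.2f}, all FP {all_fp:.2f}")
    print(f"Example 2: Perf_A / Perf_B = {perf_a_over_b:.2f}")
    print(f"Example 3: time before {time_before:.3f}, after {time_after:.3f}")
    print(f"Example 4: MIPS unopt {mips_unopt:.1f}, opt {mips_opt:.1f}")
    print(f"Example 4: time unopt {time_unopt:.3e}, opt {time_opt:.3e}")
```

Under these assumptions the optimized code in Example 4 gets the lower MIPS rating but also the shorter execution time, so the two rankings disagree.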


UNC-Chapel Hill COMP 206 - LECTURE NOTES
