Berkeley COMPSCI 152 - Lecture 6 – Performance and Energy

CS 152 Computer Architecture and Engineering
Lecture 6 – Performance and Energy
2006-9-14
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
www-inst.eecs.berkeley.edu/~cs152/
TAs: Udam Saini and Jue Sun
CS 152 L6: Performance, UC Regents Fall 2006 © UCB

Last Time: Processor Timing

x = g(a, b, c, d, e, f)
If d going 0-to-1 switches x 0-to-1, delay is T1.
If a going 0-to-1 switches x 0-to-1, delay is T2.
Would you be surprised if T1 > T2? Why?
T2 might be the critical (worst-case delay) path.

Today’s Lecture - Performance

Measurement: what, why, how
The performance equation
How energy limits performance
Amdahl’s law

Performance Measurement (as seen by the customer)

Who (sensibly) upgrades CPUs often?

A professional who turns CPU cycles into money, and who is cycle-limited.
Artist tool: animation, video special effects.

How to decide to buy a new machine?

Measure After Effects “execution time” on a representative render “workload”.
Workload: “Night flight” (still shot from the movie). City map and clouds computed “on the fly” with fractals. CPU intensive, trivial I/O.

Interpreting Execution Time

Execution Time: 1265 seconds (PowerBook G4, 1.25 GHz)
Performance = 1 / Execution Time = 2.85 renders/hour
1.5 GHz PB (Y) is N times faster than 1.25 GHz PB (X). N is ?
N = Performance (Y) / Performance (X) = Execution Time (X) / Execution Time (Y) = 1.19
PB 1.5 GHz: 3.4 renders/hour. PB 1.25 GHz: 2.85 renders/hour.
Might make the difference in meeting a deadline ...

2 CPUs: Execution Time vs Throughput

Execution Time: time for 1 job to complete.
Throughput: # of parallel jobs/hour completed.
Chart note: 2 CPUs vs 1 CPU, otherwise similar: 1.8x faster. Implies parallel code.
G5 and Opteron may very well have similar throughput. Assume G5 MP execution time is faster because AE does not use both Opteron CPUs.

Performance Measurement (as seen by a CPU designer)

Q. Why do we care about After Effects’ performance?
A. We want the CPU we are designing to run it well!

Step 1: Analyze the right measurement!

CPU Time: time the CPU spends running the program under measurement. Guides CPU design.
Response Time: total time, that is, CPU Time + time spent waiting (for disk, I/O, ...). Guides system design.
Measuring CPU time (Unix):
    % time <program name>
    25.77u 0.72s 0:29.17 90.8%

CPU time: Proportional to Instruction Count

CPU time / Program ∝ Machine Instructions / Program
Rationale: Every additional instruction you execute takes time.
Q. Static count? (lines of program printout) Or dynamic count? (trace of execution)
A. Dynamic.
Q. How does an architect influence the number of machine instructions needed to run an algorithm?
A. Create new instructions: instruction set architect.
Q. Once ISA is set, who can influence instruction count?
A. Compiler writer, application developer.

CPU time: Proportional to Clock Period

Time / Program ∝ Time / One Clock Period
Rationale: We measure each instruction’s execution time in “number of cycles”. By shortening the period for each cycle, we shorten execution time.
Q. What ultimately limits an architect’s ability to reduce clock period?
A. Clock-to-Q, setup times.
Q. How can architects (not technologists) reduce clock period?
A. Shorten the machine’s critical path.

Completing the performance equation

Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)
We need all three terms, and only these terms, to compute CPU Time!
“CPI” (Cycles / Instruction) -- The Average Number of Clock Cycles Per Instruction For the Program
When is it OK to compare clock rates?
What factors make different programs have different CPIs? Instruction mix varies. Cache behavior varies. Branch prediction varies.

Consider our Lab 2 single-cycle CPU ...

All instructions take 1 cycle to execute every time they run.
CPI of any program running on machine? 1.0
“average CPI for the program” is a more-useful concept for more complicated machines ...

Consider machine with a data cache ...

Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)
A program’s load instructions “stride” through every memory address.
The cache never “hits”, so every load goes to DRAM (100x slower than loads that go to cache).
Thus, the average number of cycles for load instructions is higher for this program.
Thus, the average number of cycles for all instructions (Cycles / Instruction) is higher for this program.
Thus, Seconds / Program is higher: the program takes longer to run!

CPI as an analytical tool to guide design

Machine CPI (throughput, not latency):
(5 x 30 + 1 x 20 + 2 x 20 + 2 x 10 + 2 x 20) / 100 = 2.7 cycles/instruction
Charts: program instruction mix; where the program spends its time (one share labeled 20/270).

Amdahl’s Law (of Diminishing Returns)

If enhancement “E” makes multiply infinitely fast, but other instructions are unchanged, what is the maximum speedup “S”?
S = 1 / ((post-enhancement %) / 100%) = 1 / (48% / 100%) = 2.08
Attributed to Gene Amdahl -- “Amdahl’s Law”
Invented the “one ISA, many implementations” business model.
What is the lesson of Amdahl’s Law? Must enhance computers in a balanced way!

Program We Wish To Run On N CPUs

The program spends 30% of its time running code that cannot be recoded to run in parallel.
S = 1 / ((30% + (70% / N)) / 100%)

CPUs:     2     3     4     5     ∞
Speedup:  1.54  1.88  2.1   2.3
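
The arithmetic on the slides above can be checked with a short script. This is a minimal sketch rather than anything from the lecture: the function names are hypothetical, the 52% multiply share is inferred from the slide's 48% post-enhancement figure, and the N = ∞ speedup is computed from the slide's formula rather than read off the slide.

```python
import math


def renders_per_hour(execution_time_s):
    """Performance = 1 / execution time, expressed in jobs per hour."""
    return 3600.0 / execution_time_s


def relative_speed(perf_y, perf_x):
    """N such that machine Y is N times faster than machine X."""
    return perf_y / perf_x


def average_cpi(mix):
    """Weighted-average CPI.

    mix is a list of (cycles, percent_of_instructions) pairs.
    """
    total_percent = sum(percent for _, percent in mix)
    return sum(cycles * percent for cycles, percent in mix) / total_percent


def cpu_time_s(instruction_count, cpi, clock_hz):
    """Seconds/Program = Instructions/Program x Cycles/Instruction x Seconds/Cycle."""
    return instruction_count * cpi / clock_hz


def amdahl_speedup(enhanced_fraction, factor):
    """Overall speedup when enhanced_fraction of the time is sped up by factor.

    factor may be math.inf (the enhanced part becomes infinitely fast).
    """
    return 1.0 / ((1.0 - enhanced_fraction) + enhanced_fraction / factor)


if __name__ == "__main__":
    # PowerBook G4 1.25 GHz: 1265 s per render (from the slide).
    print(f"{renders_per_hour(1265):.2f} renders/hour")
    # 3.4 vs 2.85 renders/hour (from the slide).
    print(f"N = {relative_speed(3.4, 2.85):.2f}")

    # Instruction mix from the CPI slide: (cycles, percent of instructions).
    mix = [(5, 30), (1, 20), (2, 20), (2, 10), (2, 20)]
    print(f"CPI = {average_cpi(mix):.1f}")

    # Hypothetical program: 1e9 instructions on a 1.25 GHz clock.
    print(f"CPU time = {cpu_time_s(1e9, average_cpi(mix), 1.25e9):.2f} s")

    # Multiply made infinitely fast; 48% of the time remains (assumed 52% share).
    print(f"S = {amdahl_speedup(0.52, math.inf):.2f}")

    # 30% serial, 70% parallel across N CPUs.
    for n in (2, 3, 4, 5, math.inf):
        print(f"N = {n}: S = {amdahl_speedup(0.70, n):.3f}")
```

With a 30% serial share, the speedup can never exceed 1 / 0.30 however many CPUs are added, which is the diminishing-returns point of Amdahl's law.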

