
CS 152 Computer Architecture and Engineering
Lecture 6: Performance and Energy
2006-9-14
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
TAs: Udam Saini and Jue Sun
www-inst.eecs.berkeley.edu/~cs152/

CS 152 L6: Performance    UC Regents Fall 2006 UCB

Last Time: Processor Timing
[Figure: logic circuit with signals a, b, c, d, e, f, g and output x; two paths to x are labeled T1 and T2.]
If d going 0 to 1 switches x 0 to 1, the delay is T1.
If a going 0 to 1 switches x 0 to 1, the delay is T2.
T2 might be the critical (worst-case) delay path. Would you be surprised if T1 > T2? Why?

Today's Lecture
Performance measurement: what, why, how
The performance equation
Amdahl's law
How energy limits performance

Performance Measurement (as seen by the customer)

Who sensibly upgrades CPUs often?
A professional who turns CPU cycles into money, and who is cycle-limited. Example: an artist's tool for animation, video, and special effects.

How to decide to buy a new machine?
Measure After Effects execution time on a representative render workload: "Night Flight", in which the city map and clouds are computed on the fly with fractals. CPU-intensive; trivial I/O. [Still shot from the movie.]

Interpreting Execution Time
PowerBook G4, 1.25 GHz: Execution Time = 1265 seconds, or 2.85 renders/hour.
Performance = 1 / Execution Time, so renders/hour is a performance measure.
"Y is N times faster than X":
$$N = \frac{\text{Performance}(Y)}{\text{Performance}(X)} = \frac{\text{Execution Time}(X)}{\text{Execution Time}(Y)}$$
PB 1.5 GHz (Y): 3.4 renders/hour. PB 1.25 GHz (X): 2.85 renders/hour. So N = 1.19: the 1.5 GHz PB is 1.19 times faster than the 1.25 GHz PB. Might make the difference in meeting a deadline.

2 CPUs: Execution Time vs. Throughput
Execution Time: time for 1 job to complete. 2 CPUs vs. 1 CPU, otherwise similar: 1.8x faster, which implies parallel code.
Throughput: number of parallel jobs completed per hour.
G5 and Opteron both have 2 CPUs. Assume the G5 MP execution time is faster because AE may not use both Opteron CPUs very well.
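The execution-time arithmetic on the slides above can be checked in a few lines. This is a minimal sketch: the function names are hypothetical, and the 3.4 renders/hour figure for the 1.5 GHz machine is taken as quoted on the slide rather than derived from a measured execution time.

```python
# Sketch: performance as the reciprocal of execution time, using the
# After Effects render numbers quoted on the "Interpreting Execution Time" slide.

SECONDS_PER_HOUR = 3600.0


def renders_per_hour(execution_time_s: float) -> float:
    """Performance expressed as throughput: completed renders per hour."""
    return SECONDS_PER_HOUR / execution_time_s


def times_faster(perf_y: float, perf_x: float) -> float:
    """N in 'Y is N times faster than X': Performance(Y) / Performance(X),
    which equals Execution Time(X) / Execution Time(Y)."""
    return perf_y / perf_x


pb_125 = renders_per_hour(1265.0)  # 1.25 GHz PowerBook G4 (X)
pb_150 = 3.4                       # 1.5 GHz PowerBook G4 (Y), as quoted on the slide
n = times_faster(pb_150, pb_125)

print(f"1.25 GHz PB: {pb_125:.2f} renders/hour")
print(f"1.5 GHz PB is {n:.2f} times faster")
```

Expressing performance as renders/hour keeps "bigger is better", so the speedup is a simple ratio of the two throughputs.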
Performance Measurement (as seen by a CPU designer)
Q: Why do we care about After Effects' performance?
A: We want the CPU we are designing to run it well!

Step 1: Analyze the right measurement!
CPU Time: time the CPU spends running the program under measurement. Guides CPU design.
Response Time: total time, i.e. CPU Time + time spent waiting (for disk, I/O, ...). Guides system design.
Measuring CPU time (Unix):
time <program name>
25.77u 0.72s 0:29.17 90.8%

CPU time: Proportional to Instruction Count
$$\frac{\text{CPU time}}{\text{Program}} \propto \frac{\text{Machine Instructions}}{\text{Program}}$$
Rationale: Every additional instruction you execute takes time.
Q: Once the ISA is set, who can influence instruction count?
A: The compiler writer and the application developer.
Q: How does an architect influence the number of machine instructions needed to run an algorithm?
A: Create new instructions (i.e., change the instruction set).
Q: Static count (lines of program printout) or dynamic count (trace of execution)?
A: Dynamic.

CPU time: Proportional to Clock Period
$$\frac{\text{CPU time}}{\text{Program}} \propto \frac{\text{Time}}{\text{One Clock Period}}$$
Q: How can architects (not technologists) reduce the clock period?
A: Shorten the machine's critical path.
Q: What ultimately limits an architect's ability to reduce the clock period?
A: Clock-to-Q and setup times.
Rationale: We measure each instruction's execution time in number of cycles. By shortening the period of each cycle, we shorten execution time.

Completing the performance equation
$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
We need all three terms, and only these terms, to compute CPU Time!
CPI (Cycles/Instruction): the average number of clock cycles per instruction for the program.
What factors make different programs have different CPIs? Instruction mix varies. Cache behavior varies. Branch prediction varies.
When is it OK to compare clock rates?

Consider our Lab 2 single-cycle CPU
All instructions take 1 cycle to execute every time they run, so the CPI of any program running on the machine is 1.0. The average CPI for a program is a more useful concept for more complex machines.

Consider a machine with a data cache
A program's load instructions stride through every memory address, so the cache never hits: every load goes through to DRAM, 100x slower than loads that go to the cache.
Thus, the average number of cycles for load instructions is higher for this program.
Thus, the average number of cycles for all instructions is higher for this program.
Thus, by the performance equation, the program takes longer to run.

CPI as an analytical tool to guide design
Machine CPI (throughput, not latency), weighted by the program's instruction mix:
$$\text{CPI} = \frac{5 \times 30 + 1 \times 20 + 2 \times 20 + 2 \times 10 + 2 \times 20}{100} = 2.7 \text{ cycles/instruction}$$
[Chart: where the program spends its time.]

Amdahl's Law of Diminishing Returns
If enhancement E makes multiply infinitely fast, but other instructions are unchanged, what is the maximum speedup S?
$$S = \frac{1}{(\text{post-enhancement \%})/100} = \frac{100}{48} \approx 2.08$$
[Chart: where the program spends its time, post-enhancement.]
Attributed to Gene Amdahl, who also invented "one ISA, many implementations."
What is the lesson of Amdahl's Law? Must enhance computers in a balanced way!

Amdahl's Law in Action
The program we wish to run on N CPUs spends 30% of its time running code that cannot be recoded to run in parallel.
$$S = \frac{100}{30 + 70/N}$$
CPUs      2     3     4     5     ∞
Speedup   1.54  1.88  2.1   2.3   3.3

Real world (2006): 2 CPUs vs. 4 CPUs
20" iMac: Core 2 Duo, 2.16 GHz, $1500. 2 cores on one die.
Mac Pro: 2 Dual-Core Xeons, 2.66 GHz, $3200 with 20-inch display. 4 cores on two dies.
Amdahl's Law + real-world legacy code issues in action: simple audio and video tasks are easier to parallelize; ZIPing a file is very difficult to parallelize.
Caveat: Mac Pro CPUs are server-class and have architectural advantages: better I/O, ECC DRAM, etc.
Source: MACWORLD

Final thoughts: Performance Equation
$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
Goal is to optimize execution time, not individual equation terms. Machines are optimized with respect to program workloads.
The …
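The `time` output on the CPU-designer slide separates CPU time from response time. A minimal sketch, assuming the csh/tcsh built-in `time` format (user seconds with a `u` suffix, system seconds with `s`, elapsed wall-clock time as [h:]m:ss, then percent CPU); `parse_csh_time` is a hypothetical helper, and other shells' `time` commands print a different format.

```python
def parse_csh_time(line: str) -> dict:
    """Decode the first four fields of csh/tcsh `time` output (assumed format)."""
    user_tok, sys_tok, elapsed_tok, pct_tok = line.split()[:4]
    user_s = float(user_tok.rstrip("u"))
    system_s = float(sys_tok.rstrip("s"))
    # Elapsed wall-clock time is printed as [h:]m:ss.ss; fold it into seconds.
    elapsed_s = 0.0
    for part in elapsed_tok.split(":"):
        elapsed_s = elapsed_s * 60 + float(part)
    cpu_s = user_s + system_s  # CPU time = user time + system time
    return {
        "cpu_time": cpu_s,
        "response_time": elapsed_s,         # total wall-clock time
        "waiting_time": elapsed_s - cpu_s,  # time not spent on the CPU (disk, I/O, ...)
        "cpu_percent": 100.0 * cpu_s / elapsed_s,
        "reported_percent": float(pct_tok.rstrip("%")),
    }


stats = parse_csh_time("25.77u 0.72s 0:29.17 90.8%")
print(f"CPU time {stats['cpu_time']:.2f} s, response time {stats['response_time']:.2f} s")
print(f"waiting {stats['waiting_time']:.2f} s, CPU {stats['cpu_percent']:.1f}%")
```

Recomputing the percentage from the user, system, and elapsed fields is a quick consistency check against the value `time` itself reports.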
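The performance equation and the CPI slides above reduce to simple arithmetic. A minimal sketch: the (CPI, percent-of-instructions) pairs come from the "CPI as an analytical tool" slide, read with CPI first because the second numbers sum to 100; the instruction count, clock rate, and 25% load fraction below are hypothetical values chosen only for illustration.

```python
def cpu_time_seconds(instructions: float, cpi: float, clock_hz: float) -> float:
    """Seconds/Program = Instructions/Program x Cycles/Instruction x Seconds/Cycle."""
    seconds_per_cycle = 1.0 / clock_hz
    return instructions * cpi * seconds_per_cycle


def average_cpi(mix: list[tuple[float, float]]) -> float:
    """Weighted CPI from (cpi, percent_of_instructions) pairs."""
    total_pct = sum(pct for _, pct in mix)
    if abs(total_pct - 100.0) > 1e-9:
        raise ValueError("instruction mix must sum to 100%")
    return sum(cpi * pct for cpi, pct in mix) / 100.0


def time_shares(mix: list[tuple[float, float]]) -> list[float]:
    """Fraction of cycles (i.e., run time) spent in each instruction class."""
    cycles = [cpi * pct for cpi, pct in mix]
    total = sum(cycles)
    return [c / total for c in cycles]


def cpi_with_load_latency(load_fraction: float, load_cycles: float,
                          other_cpi: float = 1.0) -> float:
    """Average CPI when loads take load_cycles and all other instructions take other_cpi."""
    return (1.0 - load_fraction) * other_cpi + load_fraction * load_cycles


MIX = [(5, 30), (1, 20), (2, 20), (2, 10), (2, 20)]
print(f"average CPI = {average_cpi(MIX):.1f}")
print("time shares:", [f"{s:.1%}" for s in time_shares(MIX)])

# Hypothetical program: 1e9 dynamic instructions on a 1.25 GHz clock.
print(f"CPU time = {cpu_time_seconds(1e9, average_cpi(MIX), 1.25e9):.2f} s")

# Hypothetical 25% loads: 1-cycle cache hits vs. every load going to 100x-slower DRAM.
print(cpi_with_load_latency(0.25, 1), cpi_with_load_latency(0.25, 100))
```

The last line mirrors the data-cache slide: only the load latency changes, yet the average CPI of the whole program rises, and with it the CPU time.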
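Amdahl's Law on the slides above fits in a one-line function. A minimal sketch using the standard form S = 1 / ((1 - f) + f/n), where f is the fraction of the original execution time that an enhancement speeds up n times; the function name is illustrative. Setting f = 0.70 and n = N reproduces the "Amdahl's Law in Action" formula, and letting n go to infinity with 48% of the time left over reproduces the multiply example.

```python
import math


def amdahl_speedup(enhanced_fraction: float, n: float) -> float:
    """Overall speedup when enhanced_fraction of the original time runs n times faster."""
    unchanged = 1.0 - enhanced_fraction
    return 1.0 / (unchanged + enhanced_fraction / n)


# 30% of the time cannot be parallelized, so 70% is sped up by N CPUs.
for cpus in (2, 3, 4, 5):
    print(f"{cpus} CPUs: {amdahl_speedup(0.70, cpus):.3f}x")
print(f"unlimited CPUs: {amdahl_speedup(0.70, math.inf):.3f}x")

# Multiply made infinitely fast, leaving 48% of the original time.
print(f"multiply example: {amdahl_speedup(0.52, math.inf):.3f}x")
```

Even with unlimited CPUs the speedup is capped at 1 / 0.30, which is the "diminishing returns" the slide title refers to.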

