CS152 Computer Architecture and Engineering
Lecture 9: Performance
2004-09-28

Dave Patterson (www.cs.berkeley.edu/~patterson)
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
www-inst.eecs.berkeley.edu/~cs152/

CS 152 L09: Performance, UC Regents Fall 2004, UCB


Last Time: Microcode

[Figure: multi-cycle microprogrammed control. Labels: inputs, sequencer control, datapath control, Code ROM, microinstruction, micro-PC, opcode, sequencer (fetch, dispatch, sequential), Dispatch ROM, decode, to datapath.]


Today's Lecture: Performance

- Measurement: what, why, how
- The performance equation
- Amdahl's law
- How energy limits performance


Performance Measurement (as seen by the customer)


Who (sensibly) upgrades CPUs often?

A professional who turns CPU cycles into money, and who is cycle-limited.
Artist tool: animation, video special effects.


How to decide to buy a new machine?

Measure After Effects "execution time" on a representative render "workload".
"Night flight": city map and clouds computed on the fly with fractals. CPU intensive. Trivial I/O.


Interpreting Execution Time

PowerBook G4 1.25 GHz: Execution Time = 1265 seconds
Performance = 1 / Execution Time = 2.85 renders/hour

The 1.5 GHz PB (Y) is N times faster than the 1.25 GHz PB (X). What is N?

$N = \frac{\text{Performance}(Y)}{\text{Performance}(X)} = \frac{\text{Execution Time}(X)}{\text{Execution Time}(Y)} = 1.19$

PB 1.5 GHz: 3.4 renders/hour
PB 1.25 GHz: 2.85 renders/hour

Does artist productivity really increase?


2 CPUs: Execution Time vs Throughput

Execution Time: time for 1 job to complete.
2 CPUs vs 1 CPU, otherwise similar: 1.8x faster. What does this imply?

Throughput: jobs/hour completed (not serialized).
Assume G5 MP execution time is faster because AE does not use both Opteron CPUs.
Could G5 and Opteron have similar throughput? Why?


Performance Measurement (as seen by a CPU designer)

Q: Why do we care about After Effects' performance?
A: We want the CPU we are designing to run it well!


Step 1: Analyze the right measurement!

CPU Time: time the CPU spends running the program under measurement. Guides CPU design.
Response Time: total time = CPU Time + time spent waiting (for disk, I/O, ...). Guides system design.

How to measure CPU time:

time <program name>
25.77u 0.72s 0:29.17 90.8%

How do designers use these two numbers?


Administrivia: Adjust Class Time?

We have permission to stay in this room past 12:30.
Does anyone have a class that starts at 12:40?

Class time options (all "sharp" time):
A: Lecture from 11:10 to 12:30
B: Lecture from 11:15 to 12:35
C: Lecture from 11:20 to 12:40


Administrivia: Mid-Term is Coming

- Mid-term: Tuesday 10/12, 5:30-8:30 PM, 101 Morgan. No class on Tuesday.
- After exam: pizza at LaVal's, on us.
- Mid-term review session: Sunday 10/10, 7-9 PM, 306 Soda.


Administrivia: This Week's Deadlines

- Homework 2 due 9/29 (tomorrow), 283 Soda, in CS 152 box, at 5 PM.
- Lab 2 Xilinx demo on Friday 10/1.
- Lab 2 due Monday 10/4, 11:59 PM.
- On Tuesday 10/5: onto the Pipelining Lab.


CPU time: Proportional to Instruction Count

$\frac{\text{CPU time}}{\text{Program}} \propto \frac{\text{Machine Instructions}}{\text{Program}}$

Rationale: every additional instruction you execute takes time.

Q: Static count (lines of program printout) or dynamic count (trace of execution)?
A: Dynamic.

Q: Once the ISA is set, who can influence instruction count?
A: Compiler writer, application developer.

Q: What type of computer architect influences the number of instructions a given program needs?
A: Instruction set architect.


CPU time: Proportional to Clock Period

$\frac{\text{Time}}{\text{Program}} \propto \frac{\text{Time}}{\text{One Clock Period}}$

Rationale: we measure each instruction's execution time in number of cycles. By shortening the period for each cycle, we shorten execution time.

Q: How can architects (not technologists) reduce clock period?
A: Shorten the machine's critical path.

Q: What ultimately limits an architect's ability to reduce clock period?
A: Clock-to-Q, setup times.


Completing the performance equation

$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$

We need all three terms, and only these terms, to compute CPU Time!

CPI: "The Average Number of Clock Cycles Per Instruction For the Program".

What factors make the CPI for a program differ from the underlying CPI of a CPU implementation?
- Cache behavior varies
- Instruction mix varies
- Branch prediction varies

When is it OK to compare clock rates?


CPI as an analytical tool to guide design

Instruction   Machine CPI   Program instruction mix
Multiply      5             30%
Other ALU     1             20%
Load          2             20%
Store         2             10%
Branch        2             20%

$\text{CPI} = \frac{5 \times 30 + 1 \times 20 + 2 \times 20 + 2 \times 10 + 2 \times 20}{100} = 2.7$ cycles/instruction

Where program spends its time: Multiply 56%, Branch 15%, Load 15%, and two 7% slices (Store, Other ALU).

Q: We lower machine multiply CPI, but the program runs slower. What mistake(s) did we make?


Amdahl's Law (of Diminishing Returns)

Where program spends its time: Multiply 50%, Branch 17%, Load 17%, and two 8% slices (Store, Other ALU).

If enhancement "E" speeds up multiply, but other instructions are unchanged, what is the maximum speedup "S"?

$S_{\max} = \frac{1}{1 - \frac{\%\,\text{affected}}{100}} = \frac{1}{1 - \frac{50}{100}} = 2$

Attributed to Gene Amdahl: "Amdahl's Law".

What is the lesson of Amdahl's Law? Must enhance computers in a balanced way!


Peer Instruction: Amdahl's Law

Program we wish to run on N CPUs: Serial 30%, Parallel 70%.
The program spends 30% of its time running code that cannot be recoded to run in parallel.

Compute speedup for N = 2, 3, 4, 5, and ∞.

# CPUs    2    3    4    5    ∞
Speedup


Peer Instruction: Amdahl's Law (answer)

Program we wish to run on N CPUs: Serial 30%, Parallel 70%.
The program spends 30% of its time in serial code.

Compute speedup for N = 2, 3, 4, 5, and ∞.

$S = \frac{1}{\left(30 + \frac{70}{N}\right) / 100}$

# CPUs    2      3      4     5     ∞
Speedup   1.54   1.88   2.1   2.3   3.3


Final thoughts: Performance Equation

$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$

Goal is to optimize execution time, not individual …
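The arithmetic on the CPI and Amdahl's Law slides can be re-checked with a short script. What follows is only a minimal Python sketch, not part of the lecture: the function names (`average_cpi`, `time_share_percent`, `amdahl_speedup`) and the dictionary keys are illustrative assumptions, while the instruction mix, per-class CPI values, and the 30%/70% serial/parallel split are the ones given on the slides.

```python
"""Re-check the lecture's CPI and Amdahl's Law arithmetic (illustrative sketch)."""
import math


def average_cpi(mix_percent, machine_cpi):
    """Weighted-average CPI: sum of (percent of instructions x cycles), over 100."""
    return sum(mix_percent[k] * machine_cpi[k] for k in mix_percent) / 100.0


def time_share_percent(mix_percent, machine_cpi):
    """Percent of all execution cycles spent in each instruction class."""
    total = sum(mix_percent[k] * machine_cpi[k] for k in mix_percent)
    return {k: 100.0 * mix_percent[k] * machine_cpi[k] / total for k in mix_percent}


def amdahl_speedup(percent_affected, enhancement_factor):
    """Overall speedup when percent_affected of the run time gets enhancement_factor
    times faster. Assumes percent_affected < 100 when the factor is infinite."""
    affected = percent_affected / 100.0
    unaffected = 1.0 - affected
    # affected / math.inf evaluates to 0.0, which gives the maximum speedup.
    return 1.0 / (unaffected + affected / enhancement_factor)


if __name__ == "__main__":
    # Values from the "CPI as an analytical tool" slide; key names are hypothetical.
    mix = {"multiply": 30, "other_alu": 20, "load": 20, "store": 10, "branch": 20}
    cpi = {"multiply": 5, "other_alu": 1, "load": 2, "store": 2, "branch": 2}

    print(f"average CPI = {average_cpi(mix, cpi):.1f} cycles/instruction")
    for name, share in time_share_percent(mix, cpi).items():
        print(f"  {name}: {share:.0f}% of cycles")

    # Amdahl's Law slide: multiply takes 50% of the time and is sped up without bound.
    print(f"max speedup = {amdahl_speedup(50, math.inf):.1f}")

    # Peer instruction: 70% of the time parallelizes perfectly across N CPUs.
    for n in (2, 3, 4, 5, math.inf):
        print(f"N = {n}: speedup = {amdahl_speedup(70, n):.2f}")
```

Treating "N CPUs" as an enhancement factor of N on the parallel 70% is the same idealization the peer-instruction formula makes: it ignores communication and load-balancing costs, so real speedups would likely be lower.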