CS 152 Computer Architecture and Engineering
Lecture 7: Performance
2005-2-8
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
TAs: Ted Hong and David Marquardt
www-inst.eecs.berkeley.edu/~cs152/

CS 152 L7: Performance, UC Regents Spring 2005 UCB

Last Time: Tips for Teamwork
Example: 3 members want to do the design one way; member number 4 does not agree.
Solution 1: Voting. Fair, but what if the loser was technically correct?
Solution 2: Consensus. Keeping in mind the goal (a correctly working CPU on the board, on schedule), what option brings the group closer to the goal?
Never lose sight of the goal.

Today's Lecture: Performance
  Measurement: what, why, how
  The performance equation
  Amdahl's law
  Also: news about the PlayStation 3 Cell processor
  How energy limits performance

Performance Measurement (as seen by the customer)

Who sensibly upgrades CPUs often?
A professional who turns CPU cycles into money, and who is cycle-limited.
Artist tool: animation, video special effects.

How to decide to buy a new machine?
Measure After Effects execution time on a representative render (workload).
"Night flight": city map and clouds computed on the fly with fractals. CPU intensive. Trivial I/O.

Interpreting Execution Time
PowerBook G4, 1.25 GHz. Execution Time: 1265 seconds.
$\text{Performance} = \frac{1}{\text{Execution Time}}$ = 2.85 renders/hour
The 1.5 GHz PB (Y) is N times faster than the 1.25 GHz PB (X). N is?
$N = \frac{\text{Performance}(Y)}{\text{Performance}(X)} = \frac{\text{Execution Time}(X)}{\text{Execution Time}(Y)} = 1.19$
PB 1.5 GHz: 3.4 renders/hour. PB 1.25 GHz: 2.85 renders/hour.
Does artist productivity really increase?

2 CPUs: Execution Time vs Throughput
Execution Time: time for 1 job to complete.
2 CPUs vs 1 CPU, otherwise similar: 1.8x faster. What does this imply?
Throughput: number of parallel jobs completed per hour.
Assume the G5 MP execution time is faster because AE does not use both Opteron CPUs. Could the G5 and the Opteron have similar throughput? Why?

Performance Measurement (as seen by a CPU designer)
Q: Why do we care about After Effects' performance?
A: We want the CPU we are designing to run it well.

Step 1: Analyze the right measurement
CPU Time: time the CPU spends running the program under measurement. Guides CPU design.
How to measure CPU time:
    time <program name>
    25.77u 0.72s 0:29.17 90.8%
Response Time: total time = CPU time + time spent waiting (for disk, I/O, ...). Guides system design.

CPU time: Proportional to Instruction Count
$\frac{\text{CPU time}}{\text{Program}} \propto \frac{\text{Machine Instructions}}{\text{Program}}$
Rationale: every additional instruction you execute takes time.
Q: Once the ISA is set, who can influence instruction count?
A: Compiler writer, application developer.
Q: Static count (lines of program printout)? Or dynamic count (trace of execution)?
A: Dynamic.
Q: What type of computer architect influences the number of instructions a given program needs?
A: Instruction set architect.

CPU time: Proportional to Clock Period
$\frac{\text{Time}}{\text{Program}} \propto \frac{\text{Time}}{\text{One Clock Period}}$
Rationale: we measure each instruction's execution time in "number of cycles". By shortening the period for each cycle, we shorten execution time.
Q: How can architects (not technologists) reduce clock period?
A: Shorten the machine's critical path.
Q: What ultimately limits an architect's ability to reduce clock period?
A: Clock-to-Q and setup times.

Completing the performance equation
$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$
We need all three terms, and only these terms, to compute CPU time.
CPI: the average number of clock cycles per instruction for the program.
What factors make the CPI for a program differ from the underlying CPI of a CPU implementation?
  Instruction mix varies
  Cache behavior varies
  Branch prediction varies
When is it OK to compare clock rates?

CPI as an analytical tool to guide design

  Instruction   Program mix   Machine CPI   Share of execution time
  Multiply      30%           5             56%
  Load          20%           2             15%
  Store         10%           2             7%
  Branch        20%           2             15%
  Other ALU     20%           1             7%

$\frac{5 \times 30 + 1 \times 20 + 2 \times 20 + 2 \times 10 + 2 \times 20}{100} = 2.7$ cycles/instruction
The last column shows where the program spends its time.

Amdahl's Law (of Diminishing Returns)
Where the program spends its time: Multiply 52%, Branch 16%, Load 16%, Store 8%, Other ALU 8%.
If enhancement "E" speeds up multiply, but other instructions are unchanged, what is the maximum speedup "S"?
$S_{max} = \frac{1}{\%\,\text{un-enhanced} / 100\%} = \frac{1}{48\% / 100\%} = 2.08$
Attributed to Gene Amdahl: "Amdahl's Law".
What is the lesson of Amdahl's Law? Must enhance computers in a balanced way.

Invented the "one ISA, many implementations" business model.

Amdahl's Law in Action
Program we wish to run on N CPUs.
The program spends 30% of its time running code that cannot be recoded to run in parallel.
Serial: 30%. Parallel: 70%.
Compute speedup for N = 2, 3, 4, 5 and ∞.
[Plot: speedup vs. number of CPUs]

A law of diminishing returns
Program we wish to run on N CPUs.
The program spends 30% of its time running code that cannot be recoded to run in parallel.
Serial: 30%. Parallel: 70%.
$S = \frac{1}{(30\% + 70\%/N) / 100\%}$

  # CPUs    2      3      4      5      ∞
  Speedup   1.54   1.88   2.1    2.3    3.3

Final thoughts: Performance Equation
$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$
Goal is to optimize execution time, not individual equation terms.
Instructions/Program: machines are optimized with respect to program workloads.
Cycles/Instruction: the CPI of the program. Reflects the program's instruction mix.
Seconds/Cycle: clock period. Optimize jointly with machine CPI.

Administrivia: Upcoming deadlines
Friday 2/11: "Xilinx Checkoff", 12-1, 119 Cory. For 61C students: 150 Lab Lecture 4, 1-2 PM, 125 Cory.
Monday 2/14: Lab 2 final report due via the submit program, 11:59 PM. Lab 3 now available on the web site.
Thursday 2/17: at 11:59 PM via email, Lab 2 peer evaluations and Lab 3 preliminary design document due. More details on Lab 3 on Thursday.

News from ISSCC
Int'l Solid-State Circuits Conference. Every February at the SF Marriott.

Cell The
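The arithmetic on the "Interpreting Execution Time" slide can be checked with a short Python sketch. The function names are illustrative and do not come from the lecture. The slide does not state the 1.5 GHz machine's execution time, so the sketch computes N from the two renders/hour figures it does quote.

```python
SECONDS_PER_HOUR = 3600.0


def renders_per_hour(execution_time_s: float) -> float:
    """Performance as the reciprocal of execution time, scaled to renders/hour."""
    return SECONDS_PER_HOUR / execution_time_s


def times_faster(perf_y: float, perf_x: float) -> float:
    """N such that machine Y is N times faster than machine X."""
    return perf_y / perf_x


# 1.25 GHz PowerBook: 1265 seconds per render, as measured on the slide.
pb_125 = renders_per_hour(1265.0)

# Renders/hour figures quoted on the slide for the 1.5 GHz and 1.25 GHz machines.
n = times_faster(3.4, 2.85)

print(f"{pb_125:.2f} renders/hour, N = {n:.2f}")
```

Because performance is defined as the reciprocal of execution time, the same N would result from dividing the execution time of X by the execution time of Y.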

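The program CPI and the "where the program spends its time" breakdown follow directly from the instruction mix, and the performance equation then turns CPI into CPU time. The sketch below assumes the per-class cycle counts are Multiply 5, Load/Store/Branch 2, and Other ALU 1, which is the assignment consistent with the slide's 2.7 cycles/instruction. The instruction count and clock period at the end are hypothetical values chosen only to exercise the equation.

```python
# Fraction of dynamic instructions in each class (from the slide's mix).
MIX = {"multiply": 0.30, "load": 0.20, "store": 0.10,
       "branch": 0.20, "other_alu": 0.20}

# Cycles per instruction for each class (assumed assignment, see lead-in).
CYCLES = {"multiply": 5, "load": 2, "store": 2, "branch": 2, "other_alu": 1}


def program_cpi(mix, cycles):
    """Average cycles per instruction, weighted by the instruction mix."""
    return sum(mix[k] * cycles[k] for k in mix)


def time_shares(mix, cycles):
    """Fraction of execution time spent in each instruction class."""
    cpi = program_cpi(mix, cycles)
    return {k: mix[k] * cycles[k] / cpi for k in mix}


def cpu_time_s(instruction_count, cpi, clock_period_s):
    """Seconds/Program = Instructions/Program * Cycles/Instruction * Seconds/Cycle."""
    return instruction_count * cpi * clock_period_s


cpi = program_cpi(MIX, CYCLES)
shares = time_shares(MIX, CYCLES)

# Hypothetical workload: 1e9 dynamic instructions on a 1 ns clock (1 GHz).
t = cpu_time_s(1e9, cpi, 1e-9)

print(f"CPI = {cpi:.1f}")
print({k: f"{100 * v:.0f}%" for k, v in shares.items()})
print(f"CPU time = {t:.2f} s")
```

Multiply is only 30% of the instructions but more than half of the execution time, which is why the mix alone is not enough to decide where to spend design effort.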

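Both Amdahl's Law examples in the lecture (a faster multiplier, and a program with a 30% serial portion run on N CPUs) are instances of one formula. A minimal sketch, with a hypothetical function name and the fractions taken from the slides:

```python
import math


def amdahl_speedup(enhanced_fraction: float, factor: float) -> float:
    """Overall speedup when `enhanced_fraction` of the original execution
    time is sped up by `factor` and the rest is unchanged."""
    unenhanced = 1.0 - enhanced_fraction
    # With factor = math.inf the enhanced term becomes 0.0, giving the bound.
    return 1.0 / (unenhanced + enhanced_fraction / factor)


# Multiply takes 52% of the time; an infinitely fast multiplier bounds S.
s_max = amdahl_speedup(0.52, math.inf)
print(f"S_max = {s_max:.2f}")

# 70% of the program parallelizes across N CPUs; 30% stays serial.
for n in (2, 3, 4, 5, math.inf):
    print(f"N = {n}: speedup = {amdahl_speedup(0.70, n):.3f}")
```

Passing `math.inf` as the factor yields the upper bound 1 / (un-enhanced fraction), which shows why adding CPUs beyond a handful buys little when 30% of the time is serial.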