CS 61C: Great Ideas in Computer Architecture (Machine Structures)
Performance
Instructors: Randy H. Katz, David A. Patterson
http://inst.eecs.Berkeley.edu/~cs61c/sp11
01/14/2019, Spring 2011, Lecture 10

New-School Machine Structures (It's a bit more complicated!)
- Software:
  - Parallel Requests: assigned to computer, e.g., search "Katz"
  - Parallel Threads: assigned to core, e.g., lookup, ads
  - Parallel Instructions: more than 1 instruction at one time, e.g., 5 pipelined instructions
  - Parallel Data: more than 1 data item at one time, e.g., add of 4 pairs of words
  - Hardware descriptions: all gates at one time
- Harness parallelism and achieve high performance. How do we know?
[Figure: hardware hierarchy from warehouse-scale computer and smart phone down to computer (core, memory, cache, input/output), core (instruction unit(s), functional unit(s) computing A0+B0, A1+B1, A2+B2, A3+B3), main memory, and logic gates.]

Agenda
- Defining Performance
- Administrivia
- Workloads and Benchmarks
- Technology Break
- Measuring Performance
- Summary

What is Performance?
- Latency (or response time or execution time): time to complete one task
- Bandwidth (or throughput): tasks completed per unit time

Running Systems to 100% Utilization (Student Roulette)
- Implication of the graph at the right?
- Can you explain why this happens?
[Figure: service time (aka latency or responsiveness) plotted against utilization up to 100%, with the knee of the curve marked.]

The Iron Law of Queues (aka Little's Law)
$L = \lambda W$
- $L$: average number of customers in the system
- $\lambda$: average arrival rate
- $W$: average time a customer spends in the system (service time, aka latency)

Cloud Performance: Why Application Latency Matters
- Key figure of merit: application responsiveness
- The longer the delay, the fewer the user clicks, the lower the user happiness, and the lower the revenue per user

Google Instant Search: Instant
Efficiency
- Typical search takes 24 seconds; Google's search algorithm is only 300 ms of this
- It's not search as you type, but search before you type
- We can predict what you are likely to type and give you those results in real time

Defining CPU Performance
- What does it mean to say X is faster than Y?
- Ferrari vs. School Bus
  - 2009 Ferrari 599 GTB: 2 passengers, 11.1 secs in quarter mile
  - 2009 Type D school bus: 54 passengers, quarter mile time? http://www.youtube.com/watch?v=KwyCoQuhUNA
- Response Time/Latency: e.g., time to travel the quarter mile
- Throughput/Bandwidth: e.g., passenger-mi in 1 hour

Defining Relative CPU Performance
- $\text{Performance}_X = 1 / \text{Program Execution Time}_X$
- $\text{Performance}_X > \text{Performance}_Y \Rightarrow 1/\text{Execution Time}_X > 1/\text{Execution Time}_Y \Rightarrow \text{Execution Time}_Y > \text{Execution Time}_X$
- Computer X is N times faster than Computer Y:
  $\text{Performance}_X / \text{Performance}_Y = N$, or $\text{Execution Time}_Y / \text{Execution Time}_X = N$
- Bus is to Ferrari as 12 is to 11.1: $12 / 11.1 \approx 1.08$, so the Ferrari is 1.08 times faster than the bus

Measuring CPU Performance
- Computers use a clock to determine when events take place within hardware
- Clock cycles: discrete time intervals (aka clocks, cycles, clock periods, clock ticks)
- Clock rate or clock frequency: clock cycles per second (inverse of clock cycle time)
- 3 GigaHertz clock rate: clock cycle time $= 1/(3 \times 10^9)$ seconds $\approx 333$ picoseconds (ps)

CPU Performance Factors
- To distinguish between processor time and I/O, CPU time is the time spent in the processor
- $\dfrac{\text{CPU Time}}{\text{Program}} = \dfrac{\text{Clock Cycles}}{\text{Program}} \times \text{Clock Cycle Time}$
- Or: $\dfrac{\text{CPU Time}}{\text{Program}} = \dfrac{\text{Clock Cycles}}{\text{Program}} \div \text{Clock Rate}$

CPU Performance Factors
- But a program executes instructions:
  $\dfrac{\text{CPU Time}}{\text{Program}} = \dfrac{\text{Clock Cycles}}{\text{Program}} \times \text{Clock Cycle Time} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Average Clock Cycles}}{\text{Instruction}} \times \text{Clock Cycle Time}$
- 1st term is called Instruction Count
- 2nd term is abbreviated CPI, for average Clock Cycles Per Instruction
- 3rd term is 1 / Clock Rate
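The relations above (relative performance, clock cycle time, and CPU time as Instruction Count x CPI x Clock Cycle Time) can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the lecture: the function names are illustrative, and the program of 2 billion instructions with an average CPI of 1.5 is a hypothetical workload chosen only to exercise the formula. The 3 GHz clock and the 12 s vs. 11.1 s quarter-mile times come from the slides.

```python
def clock_cycle_time(clock_rate_hz: float) -> float:
    """Clock cycle time in seconds: the inverse of the clock rate."""
    return 1.0 / clock_rate_hz


def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instruction count x average CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time(clock_rate_hz)


def times_faster(execution_time_y: float, execution_time_x: float) -> float:
    """N such that X is N times faster than Y: Execution Time_Y / Execution Time_X."""
    return execution_time_y / execution_time_x


if __name__ == "__main__":
    # From the slides: a 3 GHz clock rate.
    cycle_ps = clock_cycle_time(3e9) * 1e12
    print(f"clock cycle time: {cycle_ps:.0f} ps")

    # Hypothetical workload (assumption): 2e9 instructions, average CPI of 1.5.
    print(f"CPU time: {cpu_time(2e9, 1.5, 3e9):.2f} s")

    # From the slides: bus takes 12 s, Ferrari takes 11.1 s.
    print(f"Ferrari vs. bus: {times_faster(12.0, 11.1):.2f}x")
```

Keeping the three factors as separate arguments mirrors the equation: changing one factor (say, halving CPI) while holding the others fixed shows directly how CPU time responds.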

Restating Performance Equation
$\text{Time} = \dfrac{\text{Seconds}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Clock cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Clock cycle}}$

What Affects Each Component? Instruction Count, CPI, Clock Rate (Student Roulette)
  Hardware or software component    Affects what?
  Algorithm
  Programming Language
  Compiler
  Instruction Set Architecture

Peer Instruction Question
- Computer A: clock cycle time 250 ps, $\text{CPI}_A = 2$
- Computer B: clock cycle time 500 ps, $\text{CPI}_B = 1.2$
- Assume A and B have the same instruction set
- Which statement is true?
  - Red: Computer A is 1.2 times faster than B
  - Orange: Computer A is 4.0 times faster than B
  - Green: Computer B is 1.7 times faster than A
  - Yellow: Computer B is 3.4 times faster than A
  - Pink: None of the above

Administrivia
- Lab #5 posted
- Project #2.1 due Sunday @ 11:59:59
- HW #4 due Sunday @ 11:59:59
- Midterm in less than three weeks:
  - No discussion during exam week
  - TA Review: Su, Mar 6, 2-5 PM, 2050 VLSB
  - Exam: Tu, Mar 8, 6-9 PM, 145/155 Dwinelle
  - Small number of special consideration cases, due to class conflicts, etc.: contact Dave or Randy

Workload and Benchmark
- Workload: set of programs run on a computer
  - Actual collection of applications run, or made from real programs to approximate such a mix
  - Specifies both programs and relative frequencies
- Benchmark: program selected for use in comparing computer performance
  - Benchmarks form a workload
  - Usually standardized so that many use them

SPEC (System Performance Evaluation Cooperative)
- Computer vendor cooperative for benchmarks, started in 1989
- SPECCPU2006: 12 Integer
  Programs, 17 Floating Point Programs
- Often turned into a number where bigger is faster
- SPECratio: reference execution time on an old reference computer divided by execution time on the new computer, giving an effective speed-up

SPECINT2006 on AMD Barcelona
Description:
- Interpreted string processing
- Block-sorting compression
- GNU C compiler
- Combinatorial optimization
- Go game
- Search gene sequence
- Chess game
- Quantum computer simulation
- Video compression
- Discrete event simulation
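The SPECratio definition above reduces to a single division. A minimal sketch, assuming hypothetical timings (9000 s on the reference computer, 600 s on the computer under test); these numbers and the function name are illustrative and are not measurements from the AMD Barcelona results.

```python
def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """SPECratio = reference execution time / execution time on the new computer.

    Bigger is faster: a ratio of 15 means the new computer ran the benchmark
    15 times faster than the old reference computer.
    """
    if measured_time_s <= 0:
        raise ValueError("measured execution time must be positive")
    return reference_time_s / measured_time_s


if __name__ == "__main__":
    # Hypothetical timings (assumption), in seconds.
    reference_time_s = 9000.0
    measured_time_s = 600.0
    print(f"SPECratio: {spec_ratio(reference_time_s, measured_time_s):.1f}")
```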