CS61C: Machine Structures
Lecture #29: Performance & Parallel Intro
2007-8-14
Scott Beamer, Instructor
inst.eecs.berkeley.edu/~cs61c

Paper Battery Developed by Researchers at Rensselaer (www.bbc.co.uk)

CS61C L29 Performance & Parallel, Beamer, Summer 2007 © UCB

Last time
- Magnetic disks continue rapid advance: 60%/yr capacity, 40%/yr bandwidth, slow on seek and rotation improvements, MB/$ improving 100%/yr
  - Designs to fit high-volume form factor
  - PMR: a fundamental new technology that breaks through barrier
- RAID
  - Higher performance with more disk arms per $
  - Adds option for small # of extra disks
  - Can nest RAID levels
  - Today RAID is a tens-billion-dollar industry, 80% of non-PC disks are sold in RAIDs, and it started at Cal

Peer Instruction
1. RAID 1 (mirror) and 5 (rotated parity) help with performance and availability
2. RAID 1 has higher cost than RAID 5
3. Small writes on RAID 5 are slower than on RAID 1

Answer choices (A = statement 1, B = statement 2, C = statement 3):
   ABC
0: FFF
1: FFT
2: FTF
3: FTT
4: TFF
5: TFT
6: TTF
7: TTT

Peer Instruction Answer
1. All RAID (0-5) helps with performance; only RAID 0 doesn't help availability. TRUE
2. Surely! Must buy 2x disks rather than 1.25x (from diagram; in practice even less). TRUE
3. RAID 5 (2R, 2W) vs. RAID 1 (2W). Latency worse, write throughput better. TRUE
Answer: 7 (TTT)

Why Performance? Faster is better!
- Purchasing perspective: given a collection of machines (or upgrade options), which has the
  - best performance?
  - least cost?
  - best performance / cost?
- Computer designer perspective: faced with design options, which has the
  - best performance improvement?
  - least cost?
  - best performance / cost?
- All require a basis for comparison and a metric for evaluation
- Solid metrics lead to solid progress!

Two Notions of "Performance"

Plane              DC to Paris   Top Speed   Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph     470          286,700
BAD/Sud Concorde   3 hours       1350 mph    132          178,200

(pmph = passengers x mph)

- Which has higher performance?
  - Interested in time to deliver 100 passengers?
  - Interested in delivering as many passengers per day as possible?
- In a computer, time for one task is called Response Time or Execution Time
- In a computer, tasks per unit time is called Throughput or Bandwidth

Definitions
- Performance is in units of things per second; bigger is better
- If we are primarily concerned with response time:

$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$

- "F(ast) is n times faster than S(low)" means:

$$n = \frac{\text{performance}(F)}{\text{performance}(S)} = \frac{\text{execution\_time}(S)}{\text{execution\_time}(F)}$$

Example of Response Time v. Throughput
- Time of Concorde vs. Boeing 747?
  - Concorde is 6.5 hours / 3 hours = 2.2 times faster
- Throughput of Boeing vs. Concorde?
  - Boeing 747: 286,700 pmph / 178,200 pmph = 1.6 times faster
- Boeing is 1.6 times ("60%") faster in terms of throughput
- Concorde is 2.2 times ("120%") faster in terms of flying time (response time)
- We will focus primarily on response time.

Words, Words, Words...
- Will (try to) stick to "n times faster"; it's less confusing than "m% faster"
- As "faster" means both decreased execution time and increased performance, to reduce confusion we will (and you should) use "improve execution time" or "improve performance"

What is Time?
- Straightforward definition of time:
  - Total time to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, ...
  - "real time", "response time" or "elapsed time"
- Alternative: just the time the processor (CPU) is working only on your program (since multiple processes run at the same time)
  - "CPU execution time" or "CPU time"
  - Often divided into system CPU time (in OS) and user CPU time (in user program)

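The difference between elapsed time and CPU time can be observed directly. The sketch below is only an illustration, not part of the lecture: it assumes Python's standard `time` module, and `measure` and `sample_task` are hypothetical names. The workload computes for a while and then sleeps, so the sleep shows up in elapsed time but not in CPU time.

```python
import time


def measure(task):
    """Run task() once and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock ("real") time
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def sample_task():
    # Hypothetical workload: some computation, then a wait that uses no CPU.
    sum(i * i for i in range(200_000))
    time.sleep(0.05)


if __name__ == "__main__":
    wall, cpu = measure(sample_task)
    print(f"elapsed: {wall:.3f} s, CPU: {cpu:.3f} s")
```

On a lightly loaded machine the elapsed figure should exceed the CPU figure by roughly the sleep duration; the exact numbers vary from run to run.
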
How to Measure Time?
- Real Time: actual time elapsed
- CPU Time: computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
  - These discrete time intervals are called clock cycles (or informally clocks or cycles)
  - Length of clock period: clock cycle time (e.g., 2 nanoseconds or 2 ns) and clock rate (e.g., 500 megahertz, or 500 MHz), which is the inverse of the clock period; use these!

Measuring Time using Clock Cycles (1/2)
- CPU execution time for a program:

$$\text{CPU execution time} = \text{Clock Cycles for a program} \times \text{Clock Period}$$

or

$$\text{CPU execution time} = \frac{\text{Clock Cycles for a program}}{\text{Clock Rate}}$$

Measuring Time using Clock Cycles (2/2)
- One way to define clock cycles:

$$\text{Clock Cycles for program} = \text{Instruction Count} \times \text{CPI}$$

  - Instruction Count: instructions for a program
  - CPI: average Clock cycles Per Instruction
- CPI is one way to compare two machines with the same instruction set, since Instruction Count would be the same

Performance Calculation (1/2)
- CPU execution time for program:

$$\text{CPU execution time} = \text{Clock Cycles for program} \times \text{Clock Cycle Time}$$

- Substituting for clock cycles:

$$\text{CPU execution time} = (\text{Instruction Count} \times \text{CPI}) \times \text{Clock Cycle Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$

Performance Calculation (2/2)

$$\text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} = \frac{\text{Seconds}}{\text{Program}}$$

- Product of all 3 terms: if missing a term, can't predict time, the real measure of performance

How Calculate the 3 Components?
- Clock Cycle Time: in specification of computer (Clock Rate in advertisements)
- Instruction Count:
  - Count instructions in loop of small program
  - Use simulator to count instructions
  - Hardware counter in special register (Pentium II, III, 4)
- CPI:
  - Calculate: Execution Time / (Clock cycle time x Instruction Count)
  - Hardware counter in special register (PII, III, 4)

Calculating CPI Another Way
- First calculate CPI for each individual instruction (add, sub, and, etc.)
- Next calculate frequency of each individual instruction
- Finally multiply these two for each instruction and add them up to get final CPI (the weighted sum)

Example (RISC processor)

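The weighted-sum CPI and the three-term CPU time product can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the instruction mix, the instruction count, and the names `weighted_cpi`, `cpu_time` and `MIX` are all hypothetical and are not the numbers from the slide's example, which is cut off in this copy. Only the 500 MHz clock rate echoes the figure used earlier in the lecture.

```python
def weighted_cpi(mix):
    """Return average CPI for an iterable of (frequency, cycles) pairs."""
    mix = list(mix)
    total = sum(freq for freq, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    # Multiply each class's frequency by its cycle count and add them up.
    return sum(freq * cycles for freq, cycles in mix)


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = Instruction Count x CPI / Clock Rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical instruction mix: class -> (frequency, cycles per instruction).
MIX = {
    "alu": (0.5, 1),
    "load": (0.25, 4),
    "store": (0.125, 2),
    "branch": (0.125, 2),
}

if __name__ == "__main__":
    cpi = weighted_cpi(MIX.values())
    seconds = cpu_time(1_000_000, cpi, 500e6)  # assumed 1M instructions at 500 MHz
    print(f"CPI = {cpi:.2f}, CPU time = {seconds * 1e3:.3f} ms")
```

With these assumed numbers the weighted sum is 0.5 + 1.0 + 0.25 + 0.25 = 2.0 cycles per instruction, so one million instructions at 500 MHz take about 4 ms. Dividing by the clock rate is the same as multiplying by the 2 ns clock cycle time.
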