
inst.eecs.berkeley.edu/~cs61c
CS61C: Machine Structures
Lecture 29: Performance & Parallel Intro
2007-8-14, Scott Beamer, Instructor
In the news: Paper Battery Developed by Researchers at Rensselaer (www.bbc.co.uk)
CS61C L29 Performance & Parallel, Beamer, Summer 2007, UCB

Last time
  Magnetic disks continue rapid advance: 60%/yr capacity, 40%/yr bandwidth, slow on seek and rotation improvements, MB/$ improving 100%/yr
  Designs to fit high-volume form factor
  PMR: a fundamental new technology that breaks through a barrier
  RAID
    Higher performance with more disk arms per $
    Adds option for small # of extra disks
    Can nest RAID levels
    Today RAID is a > tens-of-billions-dollar industry; 80% of non-PC disks are sold in RAIDs; started at Cal

Peer Instruction
  1. RAID 1 (mirror) and 5 (rotated parity) help with performance and availability
  2. RAID 1 has higher cost than RAID 5
  3. Small writes on RAID 5 are slower than on RAID 1
  Answer choices (ABC): 0: FFF, 1: FFT, 2: FTF, 3: FTT, 4: TFF, 5: TFT, 6: TTF, 7: TTT

Why Performance? Faster is better!
  Purchasing perspective: given a collection of machines (or upgrade options), which has the
    best performance?
    least cost?
    best performance/cost?
  Computer designer perspective: faced with design options, which has the
    best performance improvement?
    least cost?
    best performance/cost?
  All require a basis for comparison and a metric for evaluation
  Solid metrics lead to solid progress!

Two Notions of "Performance"
  Plane              DC to Paris   Top Speed   Passengers   Throughput (pmph)
  Boeing 747         6.5 hours     610 mph     470          286,700
  BAD/Sud Concorde   3 hours       1350 mph    132          178,200

  Which has higher performance?
    Interested in time to deliver 100 passengers?
    Interested in delivering as many passengers per day as possible?
  In a computer, time for one task is called Response Time or Execution Time
  In a computer, tasks per unit time is called Throughput or Bandwidth

Definitions
  Performance is in units of things per sec; bigger is better
  If we are primarily concerned with response time:
    performance(x) = 1 / execution_time(x)
  "F(ast) is n times faster than S(low)" means
    n = performance(F) / performance(S) = execution_time(S) / execution_time(F)

Example of Response Time v. Throughput
  Time of Concorde vs. Boeing 747?
    Concorde is 6.5 hours / 3 hours = 2.2 times faster
  Throughput of Boeing vs. Concorde?
    Boeing 747: 286,700 pmph / 178,200 pmph = 1.6 times faster
  Boeing is 1.6 times ("60%") faster in terms of throughput
  Concorde is 2.2 times ("120%") faster in terms of flying time (response time)
  We will focus primarily on response time.

Words, Words, Words
  Will (try to) stick to "n times faster"; it's less confusing than "m% faster"
  As "faster" means both decreased execution time and increased performance, to reduce confusion we will (and you should) use "improve execution time" or "improve performance"

What is Time?
  Straightforward definition of time:
    Total time to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead
    "real time", "response time" or "elapsed time"
  Alternative: just the time the processor (CPU) is working only on your program (since multiple processes are running at the same time)
    "CPU execution time" or "CPU time"
    Often divided into system CPU time (in OS) and user CPU time (in user program)

How to Measure Time?
  Real Time: actual time elapsed
  CPU Time: computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
    These discrete time intervals are called clock cycles (or informally clocks or cycles)
    Length of clock period: clock cycle time (e.g., 2 nanoseconds or 2 ns) and clock rate (e.g., 500 megahertz, or 500 MHz), which is the inverse of the clock period; use these!

Measuring Time using Clock Cycles (1/2)
  CPU execution time for a program
    = Clock Cycles for a program x Clock Period
    = Clock Cycles for a program / Clock Rate

Measuring Time using Clock Cycles (2/2)
  One way to define clock cycles:
    Clock Cycles for program
      = Instructions for a program (called "Instruction Count")
        x Average Clock cycles Per Instruction (abbreviated "CPI")
  CPI is one way to compare two machines with the same instruction set, since Instruction Count would be the same

Performance Calculation (1/2)
  CPU execution time for program = Clock Cycles for program x Clock Cycle Time
  Substituting for clock cycles:
    CPU execution time for program = (Instruction Count x CPI) x Clock Cycle Time
                                   = Instruction Count x CPI x Clock Cycle Time

Performance Calculation (2/2)
  CPU time = Instructions/Program x Cycles/Instruction x Seconds/Cycle
           = Seconds/Program
  Product of all 3 terms: if missing a term, can't predict time, the real measure of performance

How Calculate the 3 Components?
  Clock Cycle Time: in specification of computer (Clock Rate in advertisements)
  Instruction Count:
    Count instructions in loop of small program
    Use simulator to count instructions
    Hardware counter in special register (Pentium II, III, 4)
  CPI:
    Calculate: Execution Time / (Clock Cycle Time x Instruction Count)
    Hardware counter in special register (PII, III, 4)

Calculating CPI Another Way
  First calculate CPI for each individual instruction (add, sub, and, etc.)
  Next calculate frequency of each individual instruction
  Finally multiply these two for each instruction and add them up to get final CPI (the weighted sum)

Example (RISC processor)
  Op       Freq_i   CPI_i   Prod   (% Time)
  ALU      50%      1       0.5    (23%)
  Load     20%      5       1.0    (45%)
  Store    10%      3       0.3    (14%)
  Branch   20%      2       0.4    (18%)
  Total                     2.2
  Freq_i is the instruction mix; % Time shows where time is spent
  What if Branch instructions were twice as fast?

What Programs Measure for Comparison?
  Ideally run typical programs with typical input before purchase, or before even building the machine
  Called a "workload"; for example:
    Engineer uses compiler, spreadsheet
    Author uses word processor, drawing program, compression software
  In some
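The CPU time relation and the weighted-sum CPI from the slides above can be checked with a short Python sketch. The function names are illustrative and not from the lecture. The instruction count (1e9) and clock cycle time (2 ns) passed to `cpu_time` are assumed values chosen only to make the arithmetic concrete, and modeling "Branch twice as fast" as a branch CPI of 1 instead of 2 is one possible reading of the slide's question.

```python
"""Sketch of the performance formulas from the slides (names are illustrative)."""


def times_faster(slow_time, fast_time):
    """Return n such that 'fast is n times faster than slow' (ratio of execution times)."""
    return slow_time / fast_time


def weighted_cpi(mix):
    """Average CPI: sum of frequency x CPI over all instruction classes."""
    return sum(freq * cpi for freq, cpi in mix.values())


def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time in seconds = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * clock_cycle_time


# Instruction mix from the RISC example: op -> (frequency, cycles per instruction)
RISC_MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}

if __name__ == "__main__":
    cpi = weighted_cpi(RISC_MIX)
    print(f"average CPI: {cpi:.1f}")

    # Share of total cycles spent in each instruction class
    for op, (freq, op_cpi) in RISC_MIX.items():
        print(f"{op:6s} share of time: {freq * op_cpi / cpi:.0%}")

    # Assumed values, not from the slides: 1e9 instructions, 2 ns clock cycle
    print(f"CPU time: {cpu_time(1e9, cpi, 2e-9):.2f} s")

    # Assumption: 'twice as fast' branches modeled as branch CPI 2 -> 1
    faster_mix = dict(RISC_MIX, Branch=(0.20, 1))
    new_cpi = weighted_cpi(faster_mix)
    print(f"CPI with faster branches: {new_cpi:.1f}")
    print(f"overall speedup: {times_faster(cpi, new_cpi):.2f}x")

    # Response-time comparison from the plane example (hours, DC to Paris)
    print(f"Concorde vs 747 flying time: {times_faster(6.5, 3.0):.1f}x")
```

With the mix from the table, `weighted_cpi` gives the same average CPI of 2.2, and the per-class shares match the "% Time" column after rounding. Because instruction count and clock cycle time are unchanged in the branch experiment, the ratio of the two CPIs is also the ratio of the two CPU times.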



Berkeley COMPSCI 61C - Lecture 29 Performance & Parallel Intro
