“Last time…”
Peer Instruction
Peer Instruction Answer
Why Performance? Faster is better!
Two Notions of “Performance”
Definitions
Example of Response Time v. Throughput
Words, Words, Words…
What is Time?
How to Measure Time?
Measuring Time using Clock Cycles (1/2)
Measuring Time using Clock Cycles (2/2)
Performance Calculation (1/2)
Performance Calculation (2/2)
How Calculate the 3 Components?
Calculating CPI Another Way
Example (RISC processor)
What Programs Measure for Comparison?
Benchmarks
Example Standardized Benchmarks (1/2)
Example Standardized Benchmarks (2/2)
Another Benchmark
Performance Evaluation: An Aside Demo
Slide 25
Peer Instruction Answers
“And in conclusion…”
Administrivia
Big Problems Show Need for Parallel
What Can We Do?
Let’s Put Many CPUs Together!
Performance Requirements
Recent History of Parallel Computing
Current Champions (June 2007)
The Future of Parallelism
Distributed Computing Themes
Distributed Computing Challenges
Things to Worry About: Parallelizing Code
But… What About Overhead?
Peer Instruction of Assumptions
Slide 41
Summary
A New Hope: Google’s MapReduce
MapReduce Programming Model
MapReduce Code Example
MapReduce Example Diagram
MapReduce Advantages/Disadvantages

CS61C L29 Performance & Parallel (1) Beamer, Summer 2007 © UCB

inst.eecs.berkeley.edu/~cs61c
CS61C : Machine Structures
Lecture #29 Performance & Parallel Intro
2007-8-14
Scott Beamer, Instructor

“Paper” Battery Developed by Researchers at Rensselaer
www.bbc.co.uk

“Last time…”
•Magnetic Disks continue rapid advance: 60%/yr capacity, 40%/yr bandwidth, slow on seek, rotation improvements, MB/$ improving 100%/yr?
•Designs to fit high volume form factor
•PMR a fundamental new technology breaks through barrier
•RAID
  •Higher performance with more disk arms per $
  •Adds option for small # of extra disks
  •Can nest RAID levels
  •Today RAID is > tens-billion dollar industry, 80% nonPC disks sold in RAIDs, started at Cal

Peer Instruction
1. RAID 1 (mirror) and 5 (rotated parity) help with performance and availability
2. RAID 1 has higher cost than RAID 5
3. Small writes on RAID 5 are slower than on RAID 1

   ABC
0: FFF
1: FFT
2: FTF
3: FTT
4: TFF
5: TFT
6: TTF
7: TTT

Peer Instruction Answer
1. All RAID (0-5) helps with performance, only RAID0 doesn’t help availability. TRUE
2. Surely! Must buy 2x disks rather than 1.25x (from diagram, in practice even less). TRUE
3. RAID5 (2R,2W) vs. RAID1 (2W). Latency worse, throughput (|| writes) better. TRUE

Why Performance? Faster is better!
•Purchasing Perspective: given a collection of machines (or upgrade options), which has the
  •best performance?
  •least cost?
  •best performance / cost?
•Computer Designer Perspective: faced with design options, which has the
  •best performance improvement?
  •least cost?
  •best performance / cost?
•All require basis for comparison and metric for evaluation!
•Solid metrics lead to solid progress!

Two Notions of “Performance”

Plane              Top Speed   DC to Paris   Passengers   Throughput (pmph)
Boeing 747         610 mph     6.5 hours     470          286,700
BAC/Sud Concorde   1350 mph    3 hours       132          178,200

(pmph = passengers × top speed in mph)

•Which has higher performance?
  •Interested in time to deliver 100 passengers?
  •Interested in delivering as many passengers per day as possible?
•In a computer, time for one task called Response Time or Execution Time
•In a computer, tasks per unit time called Throughput or Bandwidth

Definitions
•Performance is in units of things per sec
  •bigger is better
•If we are primarily concerned with response time
  •performance(x) = 1 / execution_time(x)
•“F(ast) is n times faster than S(low)” means…
  n = performance(F) / performance(S) = execution_time(S) / execution_time(F)

Example of Response Time v. Throughput
•Time of Concorde vs. Boeing 747?
  •Concorde is 6.5 hours / 3 hours = 2.2 times faster
•Throughput of Boeing vs. Concorde?
  •Boeing 747: 286,700 pmph / 178,200 pmph = 1.6 times faster
•Boeing is 1.6 times (“60%”) faster in terms of throughput
•Concorde is 2.2 times (“120%”) faster in terms of flying time (response time)
We will focus primarily on response time.

Words, Words, Words…
•Will (try to) stick to “n times faster”; it’s less confusing than “m % faster”
•As faster means both decreased execution time and increased performance, to reduce confusion we will (and you should) use “improve execution time” or “improve performance”

What is Time?
•Straightforward definition of time:
  •Total time to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, ...
  •“real time”, “response time” or “elapsed time”
•Alternative: just the time the processor (CPU) is working on your program (since multiple processes are running at the same time)
  •“CPU execution time” or “CPU time”
  •Often divided into system CPU time (in OS) and user CPU time (in user program)

How to Measure Time?
•Real Time: actual time elapsed
•CPU Time: Computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
  •These discrete time intervals are called clock cycles (or informally clocks or cycles)
  •Length of clock period: clock cycle time (e.g., 2 nanoseconds or 2 ns) and clock rate (e.g., 500 megahertz, or 500 MHz), which is the inverse of the clock period; use these!
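
The difference between elapsed ("real") time and CPU time described above can be seen with a small measurement. This is only a sketch: the helper `measure` and the workload `busy_then_wait` are hypothetical names made up for illustration, and the timers come from Python's standard `time` module, not from the lecture.

```python
import time


def measure(task):
    """Return (elapsed_seconds, cpu_seconds) for one call of task()."""
    wall_start = time.perf_counter()   # wall-clock ("real" / "elapsed") time
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def busy_then_wait():
    # Hypothetical workload: a little computation, then an idle wait.
    sum(i * i for i in range(200_000))
    time.sleep(0.2)  # counts toward elapsed time, but not toward CPU time


elapsed, cpu = measure(busy_then_wait)
print(f"elapsed: {elapsed:.3f} s, CPU: {cpu:.3f} s")
```

The wait shows up only in the elapsed figure, which is why the two notions of time can differ widely for the same program.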

Measuring Time using Clock Cycles (1/2)
•CPU execution time for a program = Clock Cycles for a program × Clock Period
•or = Clock Cycles for a program / Clock Rate
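
As a quick check that the two forms above agree, here is a minimal sketch using the 2 ns clock period and 500 MHz clock rate mentioned earlier in the lecture. The cycle count of one billion is an assumed figure chosen only for illustration, and the function names are hypothetical.

```python
import math


def cpu_time_from_period(clock_cycles, clock_period_s):
    """CPU execution time = clock cycles x clock period (seconds)."""
    return clock_cycles * clock_period_s


def cpu_time_from_rate(clock_cycles, clock_rate_hz):
    """CPU execution time = clock cycles / clock rate (Hz)."""
    return clock_cycles / clock_rate_hz


cycles = 1_000_000_000   # assumed cycle count, not from the lecture
period = 2e-9            # 2 ns clock period
rate = 1 / period        # clock rate is the inverse of the period (about 500 MHz)

t_period = cpu_time_from_period(cycles, period)
t_rate = cpu_time_from_rate(cycles, rate)

# Both forms describe the same quantity, up to floating-point rounding.
print(f"{t_period:.3f} s vs {t_rate:.3f} s")
print(math.isclose(t_period, t_rate))
```

Which form to use depends on whether the machine is specified by its clock period or by its clock rate.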