CS61C L41 Performance II (1) Garcia © UCB

Lecturer PSOE Dan Garcia
www.cs.berkeley.edu/~ddgarcia
inst.eecs.berkeley.edu/~cs61c

CS61C : Machine Structures
Lecture 41 Performance II

UWB… Ultra Wide Band! ⇒ www.nytimes.com/2005/05/04/technology/techspecial/04markoff.html
The FCC moved one step closer to approving a standard for this technology, which uses spread-spectrum pulses to send its information. Imagine no data wires to ANY of your devices!

Review
• RAID
  • Motivation: In the 1980s, there were 2 classes of drives: expensive, big ones for enterprises and small ones for PCs. The idea: "make one big out of many small!"
  • Higher performance with more disk arms per $
  • RAID 0 through 5 are solutions with tradeoffs
  • 32 B$ industry
  • Started @ Cal by CS Profs Katz & Patterson
• Latency v. Throughput
  • Time for one job vs. aggregate time for many

What is Time?
• Straightforward definition of time:
  • Total time to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, ...
  • "real time", "response time" or "elapsed time"
• Alternative: count only the time the processor (CPU) is working on your program (since multiple processes run at the same time)
  • "CPU execution time" or "CPU time"
  • Often divided into system CPU time (in the OS) and user CPU time (in the user program)

How to Measure Time?
• User Time ⇒ seconds
• CPU Time: Computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware
  • These discrete time intervals are called clock cycles (or informally clocks or cycles)
  • Length of clock period: clock cycle time (e.g., 2 nanoseconds or 2 ns) and clock rate (e.g., 500 megahertz, or 500 MHz), which is the inverse of the clock period; use these!

Measuring Time using Clock Cycles (1/2)
• CPU execution time for a program:
$$\text{CPU execution time} = \text{Clock Cycles for a program} \times \text{Clock Cycle Time}$$
  or, equivalently,
$$\text{CPU execution time} = \frac{\text{Clock Cycles for a program}}{\text{Clock Rate}}$$

Measuring Time using Clock Cycles (2/2)
• One way to define clock cycles:
$$\text{Clock Cycles for a program} = \text{Instruction Count} \times \text{CPI}$$
  where Instruction Count is the number of instructions executed for the program and CPI is the average number of Clock cycles Per Instruction
• CPI is one way to compare two machines with the same instruction set, since the Instruction Count would be the same

Performance Calculation (1/2)
• CPU execution time for program = Clock Cycles for program × Clock Cycle Time
• Substituting for clock cycles:
$$\text{CPU execution time} = (\text{Instruction Count} \times \text{CPI}) \times \text{Clock Cycle Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$

Performance Calculation (2/2)
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
• Product of all 3 terms: if any term is missing, you can't predict time, the real measure of performance

How to Calculate the 3 Components?
• Clock Cycle Time: in the specification of the computer (Clock Rate in advertisements)
• Instruction Count:
  • Count instructions in the loop of a small program
  • Use a simulator to count instructions
  • Hardware counter in a special register (Pentium II, III, 4)
• CPI:
  • Calculate:
$$\text{CPI} = \frac{\text{Execution Time} / \text{Clock Cycle Time}}{\text{Instruction Count}}$$
  • Hardware counter in a special register (PII, III, 4)

Calculating CPI Another Way
• First calculate the CPI for each individual instruction (add, sub, and, etc.)
• Next calculate the frequency of each individual instruction
• Finally, multiply these two for each instruction and add them up to get the final CPI (the weighted sum):
$$\text{CPI} = \sum_i \text{Freq}_i \times \text{CPI}_i$$

Example (RISC processor)
Op       Freq_i   CPI_i   Prod = Freq_i × CPI_i   (% Time)
ALU       50%       1             .5               (23%)
Load      20%       5            1.0               (45%)
Store     10%       3             .3               (14%)
Branch    20%       2             .4               (18%)
Total CPI (weighted sum):        2.2
• Freq_i is the instruction mix; % Time shows where the time is spent (Prod_i ÷ 2.2)
• What if Branch instructions were twice as fast?

What Programs Measure for Comparison?
• Ideally, run typical programs with typical input before purchase, or even before building the machine
  • Called a "workload"; for example:
  • An engineer uses a compiler, spreadsheet
  • An author uses a word processor, drawing program, compression software
• In some situations it's hard to do
  • Don't have access to the machine to "benchmark" before purchase
  • Don't know the workload in the future
• Next: benchmarks & PC-Mac showdown!

Benchmarks
• Obviously, the apparent speed of a processor depends on the code used to test it
• Need industry standards so that different processors can be fairly compared
• Companies exist that create these benchmarks: "typical" code used to evaluate systems
• Need to be changed every 2 or 3 years, since designers could (and do!) target these standard benchmarks

Example Standardized Benchmarks (1/2)
• Standard Performance Evaluation Corporation (SPEC) SPEC CPU2000
  • CINT2000: 12 integer (gzip, gcc, crafty, perl, ...)
  • CFP2000: 14 floating-point (swim, mesa, art, ...)
  • All relative to a base machine, a Sun 300 MHz Ultra5_10 with 256 MB of RAM, which gets a score of 100
  • www.spec.org/osg/cpu2000/
• They measure
  - System speed (SPECint2000)
  - System throughput (SPECint_rate2000)
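The CPU-time equation and the weighted-sum CPI from the RISC instruction-mix example can be checked with a few lines of C. This is a minimal sketch. The `InstrClass` struct and the function names are hypothetical, and so is the 10^9-instruction, 500 MHz workload in the usage note; they exist only to illustrate the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* One row of an instruction mix (hypothetical type for illustration). */
typedef struct {
    const char *op;   /* instruction class, e.g. "Load" */
    double freq;      /* Freq_i as a fraction of executed instructions (0.20 = 20%) */
    double cpi;       /* CPI_i: clock cycles per instruction of this class */
} InstrClass;

/* Weighted-sum CPI: add up Freq_i * CPI_i over all instruction classes. */
double weighted_cpi(const InstrClass *mix, size_t n) {
    double cpi = 0.0;
    for (size_t i = 0; i < n; i++) {
        cpi += mix[i].freq * mix[i].cpi;
    }
    return cpi;
}

/* Fraction of total execution time spent in class i: Prod_i / total CPI. */
double time_fraction(const InstrClass *mix, size_t n, size_t i) {
    assert(i < n);
    double total = weighted_cpi(mix, n);
    assert(total > 0.0);
    return mix[i].freq * mix[i].cpi / total;
}

/* CPU time = Instruction Count x CPI x Clock Cycle Time,
   where Clock Cycle Time = 1 / Clock Rate. */
double cpu_time_seconds(double instruction_count, double cpi, double clock_rate_hz) {
    assert(clock_rate_hz > 0.0);
    return instruction_count * cpi / clock_rate_hz;
}
```

For the RISC mix above, `weighted_cpi` returns 2.2, and `time_fraction` reports about 45% of the time spent in loads. Setting the branch CPI to 1 (branches twice as fast) lowers the CPI to 2.0. With the same instruction count and clock, the program then runs 2.2 / 2.0 = 1.1 times faster. At a hypothetical 500 MHz clock, 10^9 instructions at CPI 2.2 take 10^9 × 2.2 / (500 × 10^6) = 4.4 seconds.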
Example Standardized Benchmarks (2/2)
• SPEC
  • Benchmarks distributed in source code
  • Members of the consortium select the workload
    - 30+ companies, 40+ universities
  • Compiler and machine designers target the benchmarks, so SPEC tries to change them every 3 years
  • The last benchmark suite released was SPEC 2000
    - They are still finalizing SPEC 2005

CFP2000
Benchmark  Language   Application area
wupwise    Fortran77  Physics / Quantum Chromodynamics
swim       Fortran77  Shallow Water Modeling
mgrid      Fortran77  Multi-grid Solver: 3D Potential Field
applu      Fortran77  Parabolic / Elliptic Partial Differential Equations
mesa       C          3-D Graphics Library
galgel     Fortran90  Computational Fluid Dynamics
art        C          Image Recognition / Neural Networks
equake     C          Seismic Wave Propagation Simulation
facerec    Fortran90  Image Processing: Face Recognition
ammp       C          Computational Chemistry
lucas      Fortran90  Number Theory / Primality
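The SPEC CPU2000 slides say every score is relative to a base machine that scores 100. Below is a minimal sketch of what that normalization means. It assumes each benchmark's score is 100 × (base-machine time ÷ measured time). The function names and run times are hypothetical, and SPEC's full rules, including how per-benchmark results are combined into a single SPECint2000 number, are not modeled here.

```c
#include <assert.h>

/* Assumed scoring rule (a sketch, not SPEC's full run rules): the base
   machine's own time maps to a score of 100, and a machine that finishes
   the same benchmark in half the time scores 200. */
double relative_score(double base_seconds, double measured_seconds) {
    assert(base_seconds > 0.0 && measured_seconds > 0.0);
    return 100.0 * base_seconds / measured_seconds;
}

/* Comparing two machines scored against the same base machine: the base
   time cancels, so score_a / score_b == time_b / time_a, i.e. how many
   times faster machine A is than machine B on that benchmark. */
double times_faster(double score_a, double score_b) {
    assert(score_a > 0.0 && score_b > 0.0);
    return score_a / score_b;
}
```

Suppose the base machine runs a benchmark in a hypothetical 1000 s, and two machines take 500 s and 800 s. They score 200 and 125, and 200 / 125 = 1.6 matches 800 / 500. Normalizing to the base machine changes neither which machine is faster nor by how much.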