Performance Technology Performance expressed as a time Todd C Mowry CS 740 Absolute time measures difference between start and finish of an operation synonyms running time elapsed time response time latency completion time execution time most straightforward performance measure Sept 13 2000 Relative normalized time measures Topics running time normalized to some reference time e g time reference time Performance measures Relating performance measures Memory Technology SRAM DRAM Disk Technology Guiding principle Choose performance measures that track running time 2 Performance expressed as a rate CS 740 F 00 Performance expressed as a rate cont Rates are performance measures expressed in units of work per unit time Key idea Report rates that track execution time Examples Example Suppose we are measuring a program that convolves a stream of images from a video camera millions of instructions sec MIPS millions of floating point instructions sec MFLOPS millions of bytes sec MBytes sec millions of bits sec Mbits sec images sec samples sec transactions sec TPS Bad performance measure MFLOPS number of floating point operations depends on the particular convolution algorithm n 2 matix vector product vs nlogn fast Fourier transform An FFT with a bad MFLOPS rate may run faster than a matrix vector product with a good MFLOPS rate Good performance measure images sec a program that runs faster will convolve more images per second 3 CS 740 F 00 4 Page 1 CS 740 F 00 Performance expressed as a rate cont Relating time to system measures Suppose that for some program we have Fallacy Peak rates track running time T seconds running time the ultimate performance measure C clock ticks I instructions P seconds tick performance measures of interest to the system designer Example the i860 is advertised as having a peak rate of 80 MFLOPS 40 MHz with 2 flops per cycle T secs C ticks x P secs tick I inst I inst x C ticks x P secs tick T secs I inst x C ticks I inst x P secs tick However the measured performance of some compiled linear algebra kernels icc O2 tells a different story Kernel MFLOPS peak 1d fft sasum 8 5 3 2 11 4 saxpy 6 1 7 sdot 10 3 13 sgemm sgemv 6 2 15 0 8 19 5 running time spvma 8 1 10 instruction count CS 740 F 00 avg clock ticks per instruction CPI clock period 6 Pipeline latency and throughput CS 740 F 00 Video system performance Stage 1 In I3 I2 I1 video processing system N input images L 3 secs image On O3 O2 O1 time R 1 L 1 3 images sec N output images T L N 1 1 R 3N Latency L time to process an individual image Throughput R images processed per unit time 1 1 2 1 3 1 4 2 5 2 6 2 7 3 1 out 2 out One image can be processed by the system at any point in time 7 CS 740 F 00 8 Page 2 CS 740 F 00 Pipelining the video system Pipelined video system performance video pipeline In I3 I2 I1 stage 1 buffer N input images L1 R1 stage 2 CPU stage 3 display L2 R2 L3 R3 L1 L2 L3 1 N output images Then One image can be in each stage at any point in time L 3 secs image Li latency of stage i Ri throughput of stage i R 1 image sec T L N 1 1 R N 2 L L1 L2 L3 R min R1 R2 R3 9 Stage 1 Suppose On O3 O2 O1 CS 740 F 00 time Stage 2 1 2 2 1 3 3 2 1 4 4 3 2 1 out 5 5 4 3 2 out 6 6 5 4 3 out 7 7 6 5 4 out 10 Relating time to latency throughput Stage 3 1 CS 740 F 00 Amdahl s law In general You plan to visit a friend in Normandy France and must decide whether it is worth it to take the Concorde SST 3 100 or a 747 1 021 from NY to Paris assuming it will take 4 hours Pgh to NY and 4 hours Paris to Normandy T L N 1 R The impact of latency and throughput on running time depends on N N 1 T L N 1 T N R time NY Paris total trip time speedup over 747 747 SST To maximize throughput we should try to maximize the minimum throughput over all stages i e we strive for all stages to have equal throughput 11 8 5 hours 3 75 hours 16 5 hours 11 75 hours 1 1 4 Taking the SST which is 2 2 times faster speeds up the overall trip by only a factor of 1 4 CS 740 F 00 12 Page 3 CS 740 F 00 Amdahl s law cont Two key parameters Old program unenhanced T1 T1 time that can NOT be enhanced T2 Old time T T1 T2 Fenhanced T2 T fraction of original time that can be improved Senhanced T2 T2 speedup of enhanced part T2 time that can be enhanced New program enhanced T1 T1 Amdahl s law cont T T1 T2 T1 T2 T 1 Fenhanced T2 T 1 Fenhanced T2 Senhanced T 1 Fenhanced T Fenhanced Senhanced T 1 Fenhanced Fenhanced Senhanced T2 time after the enhancement T2 T2 New time T T1 T2 Amdahl s Law Soverall T T 1 1 Fenhanced Fenhanced Senhanced Speedup Soverall T T Key idea Amdahl s law quantifies the general notion of diminishing returns It applies to any activity not just computer programs 13 CS 740 F 00 14 Amdahl s law cont time NY Paris 8 5 hours 3 75 hours 0 25 hours 0 0 hours total trip time 16 5 hours 11 75 hours 8 25 hours 8 hours CS 740 F 00 Amdahl s law cont Useful corollary to Amdahl s law Trip example Suppose that for the New York to Paris leg we now consider the possibility of taking a rocket ship 15 minutes or a handy rip in the fabric of space time 0 minutes 747 SST rocket rip by def of Senhanced by def of Fenhanced 1 Soverall speedup over 747 1 1 4 2 0 2 1 1 1 Fenhanced Fenhanced Max Soverall Fenhanced Max Soverall 0 0 1 0 9375 16 0 5 2 0 96875 0 75 4 0 984375 64 0 875 8 0 9921875 128 32 Moral It is hard to speed up a program Moral It is easy to make premature optimizations 15 CS 740 F 00 16 Page 4 CS 740 F 00 Computer System Levels in Memory Hierarchy cache Processor Processor virtual memory Reg CPU CPU regs regs Cache Cache Memory I O Memory I Obus bus I O I O controller controller Memory Memory Disk Register I O I O controller controller I O I O controller controller Display Display Network Network Disk 17 size speed Mbyte block size 0 35 0 25 0 18 64M 256M 1G 6 0 8 0 Memory disk disk Disk Memory …
View Full Document
Unlocking...