COSC 6385 Computer Architecture
Performance Measurement
Edgar Gabriel
Spring 2011

Measuring performance (I)
• Response time: how long it takes to execute a certain application or a certain amount of work
• Given two platforms X and Y, X is n times faster than Y for a certain application if
    $n = \frac{Time_Y}{Time_X}$    (1)
• The performance of X is n times higher than the performance of Y if
    $n = \frac{Time_Y}{Time_X} = \frac{1/Perf_Y}{1/Perf_X} = \frac{Perf_X}{Perf_Y}$    (2)

Measuring performance (II)
• Timing how long an application takes
  – Wall clock time/elapsed time: time to complete a task as seen by the user. May include operating system overhead and time taken by other, interfering applications.
  – CPU time: excludes time slices consumed by external sources (e.g. other running applications). CPU time can be further divided into
    • User CPU time: CPU time spent in the program
    • System CPU time: CPU time spent in the OS performing tasks requested by the program

Measuring performance
• E.g. using the UNIX time command
[Figure: output of the UNIX time command, annotated with elapsed time, user CPU time, and system CPU time]

Amdahl’s Law
• Describes the performance gain obtained by enhancing one part of the overall system (code, computer)
• Amdahl’s Law depends on two factors:
  – The fraction of the execution time affected by the enhancement
  – The improvement gained by the enhancement for this fraction
    $Speedup = \frac{Time_{org}}{Time_{enh}} = \frac{Perf_{enh}}{Perf_{org}}$    (3)
    $Time_{enh} = Time_{org} \left( (1 - Fraction_{enh}) + \frac{Fraction_{enh}}{Speedup_{enh}} \right)$    (4)
    $Speedup_{overall} = \frac{Time_{org}}{Time_{enh}} = \frac{1}{(1 - Fraction_{enh}) + \frac{Fraction_{enh}}{Speedup_{enh}}}$    (5)

Amdahl’s Law (III)
[Figure: overall speedup (y-axis, 0 to 6) versus enhanced speedup (x-axis, 0 to 100), one curve each for fraction enhanced = 20%, 40%, 60%, and 80%]
    $Speedup_{overall} = \frac{1}{(1 - Fraction_{enh}) + \frac{Fraction_{enh}}{Speedup_{enh}}}$

Amdahl’s Law (IV)
[Figure: "Speedup according to Amdahl's Law": overall speedup (y-axis, 0 to 12) versus fraction enhanced (x-axis, 0 to 1), one curve each for speedup enhanced = 2, 4, and 10]

Amdahl’s Law - example
• Assume a new web server whose CPU is 10 times faster on computation than that of the previous web server. I/O performance is not improved compared to the old machine. The web server spends 40% of its time in computation and 60% in I/O. How much faster is the new machine overall?
Using formula (5) with
    $Fraction_{enh} = 0.4$
    $Speedup_{enh} = 10$
    $Speedup_{overall} = \frac{1}{(1 - Fraction_{enh}) + \frac{Fraction_{enh}}{Speedup_{enh}}} = \frac{1}{(1 - 0.4) + \frac{0.4}{10}} = \frac{1}{0.64} = 1.56$

Amdahl’s Law – example (II)
• Example: Consider a graphics card
  – 50% of its total execution time is spent in floating point operations
  – 20% of its total execution time is spent in floating point square root operations (FPSQR)
Option 1: improve the FPSQR operation by a factor of 10.
Option 2: improve all floating point operations by a factor of 1.6.
    $Speedup_{FPSQR} = \frac{1}{(1 - 0.2) + \frac{0.2}{10}} = \frac{1}{0.82} = 1.22$
    $Speedup_{FP} = \frac{1}{(1 - 0.5) + \frac{0.5}{1.6}} = \frac{1}{0.8125} = 1.23$
Option 2 is slightly faster.

CPU Performance Equation
• Microprocessors are based on a clock running at a constant rate
• Clock cycle time $CC_{time}$: length of the discrete time event, in ns
• Equivalent measure: rate $r$, expressed in MHz or GHz
    $r = \frac{1}{CC_{time}}$
• The CPU time of a program can then be expressed as
    $CPU_{time} = no_{cycles} \times CC_{time}$    (6)
  or
    $CPU_{time} = \frac{no_{cycles}}{r}$    (7)

CPU Performance equation (II)
• CPI: average number of clock cycles per instruction
• IC: number of instructions
    $CPI = \frac{no_{cycles}}{IC}$    (8)
• Since the (average) CPI is often known, the CPU time is
    $CPU_{time} = IC \times CPI \times CC_{time}$    (9)
• Expanding formula (6) leads to
    $CPU_{time} = \frac{no_{instructions}}{program} \times \frac{no_{cycles}}{instruction} \times \frac{time}{no_{cycles}}$    (10)

CPU performance equation (III)
• According to (9), CPU performance depends on
  – Clock cycle time → hardware technology
  – CPI → organization and instruction set architecture
  – Instruction count → ISA and compiler technology
• Note: on the last slide we used the average CPI over all instructions occurring in an application
• Different instructions can have strongly varying CPIs →
    $no_{cycles} = \sum_{i=1}^{n} IC_i \times CPI_i$    (11)
    $CPU_{time} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times CC_{time}$    (12)

CPU performance equation (IV)
• The average CPI for an application can then be calculated as
    $CPI = \frac{\sum_{i=1}^{n} IC_i \times CPI_i}{IC_{total}} = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times CPI_i$    (13)
  where $\frac{IC_i}{IC_{total}}$ is the fraction of occurrence of that instruction in a program

Example (I)
• (Page 43 in the 4th Edition) Consider a graphics card with
  – FP operations (including FPSQR): frequency 25%, average CPI 4.0
  – FPSQR operations only: frequency 2%, average CPI 20
  – all other instructions: average CPI 1.3333333
• Design option 1: decrease CPI of FPSQR to 2
• Design option 2: decrease CPI of all FP operations to 2.5
Using formula (13):
    $CPI_{org} = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times CPI_i = (4 \times 0.25) + (1.333333 \times 0.75) = 2.0$
    $CPI_1 = CPI_{org} - 0.02 \times (20 - 2) = 2.0 - 0.36 = 1.64$
    $CPI_2 = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times CPI_i = (2.5 \times 0.25) + (1.333333 \times 0.75) = 1.625$

Example (II)
• Slightly modified compared to the previous example: consider a graphics card with
  – FP operations (excluding FPSQR): frequency 25%, average CPI 4.0
  – FPSQR operations: frequency 2%, average CPI 20
  – all other instructions: average CPI 1.33
• Design option 1: decrease CPI of FPSQR to 2
• Design option 2: decrease CPI of all FP operations to 2.5
Using formula (13):
    $CPI_{org} = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times CPI_i = (4 \times 0.25) + (20 \times 0.02) + (1.33 \times 0.73) = 2.3709$
    $CPI_1 = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times CPI_i = (4 \times 0.25) + (2 \times 0.02) + (1.33 \times 0.73) = 2.0109$
    $CPI_2 = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times CPI_i = (2.5 \times 0.25) + (20 \times 0.02) + (1.33 \times 0.73) = 1.9959$

Dependability
• Module reliability measures
  – MTTF: mean time to failure
  – FIT: failures in time
    • Often expressed as failures in 1,000,000,000 hours
  – MTTR: mean time to repair
  – MTBF: mean time between failures
• Module availability:
    $MA = \frac{MTTF}{MTTF + MTTR}$    (14)
    $MTBF = MTTF + MTTR$    (15)
    $FIT = \frac{1}{MTTF}$    (16)

Dependability - example
• Assume a disk subsystem with the following components and MTTFs:
  – 10 disks, MTTF = 1,000,000 h
  – 1 SCSI controller, MTTF = 500,000 h
  – 1 power supply, MTTF = 200,000 h
  – 1
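
The worked examples for Amdahl's Law (formula (5)) and for the average CPI (formula (13)) can be re-checked numerically. The following is only a minimal Python sketch under stated assumptions: the function names `amdahl_speedup` and `average_cpi` are illustrative and do not come from the slides, and an instruction mix is assumed to be given as (frequency, CPI) pairs whose frequencies sum to 1.

```python
def amdahl_speedup(fraction_enh, speedup_enh):
    """Overall speedup according to formula (5).

    fraction_enh: fraction of the original execution time that is enhanced (0..1)
    speedup_enh:  speedup of the enhanced fraction
    """
    return 1.0 / ((1.0 - fraction_enh) + fraction_enh / speedup_enh)


def average_cpi(mix):
    """Average CPI according to formula (13).

    mix: list of (frequency, cpi) pairs; frequency is IC_i / IC_total
    (assumed representation, not prescribed by the slides).
    """
    return sum(freq * cpi for freq, cpi in mix)


if __name__ == "__main__":
    # Web server example: 40% computation, CPU 10 times faster
    print(f"web server: {amdahl_speedup(0.4, 10):.2f}")   # 1.56

    # Graphics card, Amdahl example (II)
    print(f"FPSQR x10:  {amdahl_speedup(0.2, 10):.2f}")   # 1.22
    print(f"FP x1.6:    {amdahl_speedup(0.5, 1.6):.2f}")  # 1.23

    # CPI Example (II): FP 25%, FPSQR 2%, all other instructions 73%
    cpi_org = average_cpi([(0.25, 4.0), (0.02, 20.0), (0.73, 1.33)])
    cpi_1 = average_cpi([(0.25, 4.0), (0.02, 2.0), (0.73, 1.33)])
    cpi_2 = average_cpi([(0.25, 2.5), (0.02, 20.0), (0.73, 1.33)])
    print(f"CPI org: {cpi_org:.4f}")       # 2.3709
    print(f"CPI option 1: {cpi_1:.4f}")    # 2.0109
    print(f"CPI option 2: {cpi_2:.4f}")    # 1.9959
```

With clock cycle time and instruction count unchanged, formula (9) implies that the ratio of two CPI values equals the ratio of the corresponding CPU times, so `cpi_org / cpi_2` gives the speedup of design option 2 in Example (II).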