UH COSC 6385 - Performance Measurement

COSC 6385 Computer Architecture
Performance Measurement
Edgar Gabriel
Fall 2009

Measuring performance (I)
• Response time: how long it takes to execute a certain application or a certain amount of work
• Given two platforms X and Y, X is n times faster than Y for a certain application if

$$ n = \frac{\text{Time}_Y}{\text{Time}_X} \tag{1} $$

• The performance of X is n times higher than the performance of Y if

$$ n = \frac{\text{Time}_Y}{\text{Time}_X} = \frac{1/\text{Perf}_Y}{1/\text{Perf}_X} = \frac{\text{Perf}_X}{\text{Perf}_Y} \tag{2} $$

Measuring performance (II)
• Timing how long an application takes
– Wall clock time/elapsed time: time to complete a task as seen by the user. Might include operating system overhead or interference from other applications.
– CPU time: does not include time slices introduced by external sources (e.g. running other applications). CPU time can be further divided into
  • User CPU time: CPU time spent in the program
  • System CPU time: CPU time spent in the OS performing tasks requested by the program.

Measuring performance
• E.g. using the UNIX time command
[Figure: output of the UNIX time command, with the elapsed time, user CPU time, and system CPU time labeled]

Amdahl's Law
• Describes the performance gain obtained by enhancing one part of the overall system (code, computer)
• Amdahl's Law depends on two factors:
– Fraction of the execution time affected by the enhancement
– The improvement gained by the enhancement for this fraction

$$ \text{Speedup} = \frac{\text{Time}_{org}}{\text{Time}_{enh}} = \frac{\text{Perf}_{enh}}{\text{Perf}_{org}} \tag{3} $$

$$ \text{Time}_{enh} = \text{Time}_{org} \left( (1 - \text{Fraction}_{enh}) + \frac{\text{Fraction}_{enh}}{\text{Speedup}_{enh}} \right) \tag{4} $$

$$ \text{Speedup}_{overall} = \frac{\text{Time}_{org}}{\text{Time}_{enh}} = \frac{1}{(1 - \text{Fraction}_{enh}) + \frac{\text{Fraction}_{enh}}{\text{Speedup}_{enh}}} \tag{5} $$

Amdahl's Law (III)
[Figure: overall speedup (y-axis, 0 to 6) versus speedup of the enhanced part (x-axis, 0 to 100), one curve each for an enhanced fraction of 20%, 40%, 60%, and 80%, plotted from formula (5)]

Amdahl's Law (IV)
[Figure: "Speedup according to Amdahl's Law": overall speedup (y-axis, 0 to 12) versus fraction enhanced (x-axis, 0 to 1), one curve each for an enhanced speedup of 2, 4, and 10]

Amdahl's Law - example
• Assume a new web server whose CPU is 10 times faster on computation than that of the previous web server. I/O performance is not improved compared to the old machine. The web server spends 40% of its time in computation and 60% in I/O. How much faster is the new machine overall?

Using formula (5) with $\text{Fraction}_{enh} = 0.4$ and $\text{Speedup}_{enh} = 10$:

$$ \text{Speedup}_{overall} = \frac{1}{(1 - \text{Fraction}_{enh}) + \frac{\text{Fraction}_{enh}}{\text{Speedup}_{enh}}} = \frac{1}{(1 - 0.4) + \frac{0.4}{10}} = \frac{1}{0.64} = 1.56 $$

Amdahl's Law – example (II)
• Example: Consider a graphics card
– 50% of its total execution time is spent in floating point operations
– 20% of its total execution time is spent in floating point square root operations (FPSQR)
• Option 1: improve the FPSQR operation by a factor of 10.
• Option 2: improve all floating point operations by a factor of 1.6.

$$ \text{Speedup}_{FPSQR} = \frac{1}{(1 - 0.2) + \frac{0.2}{10}} = \frac{1}{0.82} = 1.22 $$

$$ \text{Speedup}_{FP} = \frac{1}{(1 - 0.5) + \frac{0.5}{1.6}} = \frac{1}{0.8125} = 1.23 $$

Option 2 is slightly faster.

CPU Performance Equation
• Microprocessors are based on a clock running at a constant rate
• Clock cycle time $CC_{time}$
– Length of the discrete time event in ns
• Equivalent measure: rate $r$
– Expressed in MHz, GHz

$$ r = \frac{1}{CC_{time}} $$

• The CPU time of a program can then be expressed as

$$ CPU_{time} = no_{cycles} \times CC_{time} \tag{6} $$

or

$$ CPU_{time} = \frac{no_{cycles}}{r} \tag{7} $$

CPU Performance equation (II)
• CPI: average number of clock cycles per instruction
• IC: number of instructions

$$ CPI = \frac{no_{cycles}}{IC} \tag{8} $$

• Since the CPI is often known, the CPU time is

$$ CPU_{time} = IC \times CPI \times CC_{time} \tag{9} $$

• Expanding formula (6) leads to

$$ CPU_{time} = \frac{\text{instructions}}{\text{program}} \times \frac{no_{cycles}}{\text{instruction}} \times \frac{\text{time}}{no_{cycles}} \tag{10} $$

CPU performance equation (III)
• According to (9), CPU performance depends on
– Clock cycle time → hardware technology
– CPI → organization and instruction set architecture
– Instruction count → ISA and compiler technology
• Note: on the last slide we used the average CPI over all instructions occurring in an application
• Different instructions can have strongly varying CPIs:

$$ no_{cycles} = \sum_{i=1}^{n} IC_i \times CPI_i \tag{11} $$

$$ CPU_{time} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times CC_{time} \tag{12} $$

CPU performance equation (IV)
• The average CPI for an application can then be calculated as

$$ CPI = \frac{\sum_{i=1}^{n} IC_i \times CPI_i}{IC_{total}} = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times CPI_i \tag{13} $$

• $IC_i / IC_{total}$: fraction of occurrence of that instruction in a program
• Note: $CPI_i$ should be measured for every single application separately, since it might vary due to pipelining, cache effects, etc.

Example (I)
• (Page 43 in the 4th Edition) Consider a graphics card, with
– FP operations (including FPSQR): frequency 25%, average CPI 4.0
– FPSQR operations only: frequency 2%, average CPI 20
– all other instructions: average CPI 1.3333333
• Design option 1: decrease CPI of FPSQR to 2
• Design option 2: decrease CPI of all FP operations to 2.5

Using formula (13):

$$ CPI_{org} = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times CPI_i = (4 \times 0.25) + (1.333333 \times 0.75) = 2.0 $$

$$ CPI_1 = CPI_{org} - 0.02 \times (20 - 2) = 2.0 - 0.36 = 1.64 $$

$$ CPI_2 = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times CPI_i = (2.5 \times 0.25) + (1.333333 \times 0.75) = 1.625 $$

Example (II)
• Slightly modified compared to the previous example: consider a graphics card, with
– FP operations (excluding FPSQR): frequency 25%, average CPI 4.0
– FPSQR operations: frequency 2%, average CPI 20
– all other instructions: average CPI 1.33
• Design option 1: decrease CPI of FPSQR to 2
• Design option 2: decrease CPI of all FP operations to 2.5

Using formula (13):

$$ CPI_{org} = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times CPI_i = (4 \times 0.25) + (20 \times 0.02) + (1.33 \times 0.73) = 2.3709 $$

$$ CPI_1 = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times CPI_i = (4 \times 0.25) + (2 \times 0.02) + (1.33 \times 0.73) = 2.0109 $$

$$ CPI_2 = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times CPI_i = (2.5 \times 0.25) + (20 \times 0.02) + (1.33 \times 0.73) = 1.9959 $$

Dependability
• Module reliability measures
– MTTF: mean time to failure
– FIT: failures in time
  • Often expressed as failures in 1,000,000,000 hours
– MTTR: mean time to repair
– MTBF: mean time between failures
• Module availability:

$$ MA = \frac{MTTF}{MTTF + MTTR} \tag{14} $$

$$ MTBF = MTTF + MTTR \tag{15} $$

$$ FIT = \frac{1}{MTTF} \tag{16} $$

Dependability - example
• Assume a disk subsystem with the
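
The Amdahl's Law examples earlier in these slides can be re-checked numerically. The following is a minimal Python sketch of formula (5), not part of the original slides; the function name amdahl_speedup and its argument names are chosen here purely for illustration.

```python
def amdahl_speedup(fraction_enh: float, speedup_enh: float) -> float:
    """Overall speedup according to Amdahl's Law, formula (5).

    fraction_enh: fraction of the original execution time affected
                  by the enhancement (between 0 and 1).
    speedup_enh:  speedup of the enhanced part alone (greater than 0).
    """
    if not 0.0 <= fraction_enh <= 1.0:
        raise ValueError("fraction_enh must lie in [0, 1]")
    if speedup_enh <= 0.0:
        raise ValueError("speedup_enh must be positive")
    # The unaffected part keeps its time; the affected part shrinks.
    return 1.0 / ((1.0 - fraction_enh) + fraction_enh / speedup_enh)


# Web server example: 40% computation, CPU 10 times faster.
web = amdahl_speedup(0.4, 10.0)

# Graphics card example: FPSQR (20%, factor 10) vs. all FP (50%, factor 1.6).
fpsqr = amdahl_speedup(0.2, 10.0)
fp = amdahl_speedup(0.5, 1.6)

print(f"web server: {web:.2f}")    # web server: 1.56
print(f"FPSQR only: {fpsqr:.2f}")  # FPSQR only: 1.22
print(f"all FP:     {fp:.2f}")     # all FP:     1.23
```

Letting speedup_enh grow very large shows the limit visible in the plots: the overall speedup approaches 1 / (1 - fraction_enh), so the unenhanced part bounds the gain.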

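The CPI calculations of Example (II) can be reproduced in the same way. This is a small Python sketch of formulas (9) and (13), not taken from the slides; the helper names average_cpi and cpu_time are hypothetical, and the instruction count and clock rate at the end are made-up values used only to show how the CPI feeds into the CPU time.

```python
def average_cpi(mix):
    """Average CPI per formula (13).

    mix: list of (fraction, cpi) pairs, where fraction is IC_i / IC_total.
    """
    total = sum(fraction for fraction, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cpi for fraction, cpi in mix)


def cpu_time(ic, cpi, clock_rate_hz):
    """CPU time per formula (9), with CC_time = 1 / clock rate."""
    return ic * cpi / clock_rate_hz


# Example (II): FP without FPSQR 25%, FPSQR 2%, all other instructions 73%.
cpi_org = average_cpi([(0.25, 4.0), (0.02, 20.0), (0.73, 1.33)])
cpi_opt1 = average_cpi([(0.25, 4.0), (0.02, 2.0), (0.73, 1.33)])   # FPSQR CPI -> 2
cpi_opt2 = average_cpi([(0.25, 2.5), (0.02, 20.0), (0.73, 1.33)])  # FP CPI -> 2.5

print(f"{cpi_org:.4f} {cpi_opt1:.4f} {cpi_opt2:.4f}")  # 2.3709 2.0109 1.9959

# Assumed values for illustration only: 1e9 instructions on a 2 GHz clock.
IC, RATE = 1e9, 2e9
for name, cpi in [("org", cpi_org), ("option 1", cpi_opt1), ("option 2", cpi_opt2)]:
    print(f"{name}: {cpu_time(IC, cpi, RATE):.4f} s")
```

Because the instruction count and the clock rate are the same for all three designs, the ratio of CPU times equals the ratio of CPIs, so the design with the lowest CPI is the fastest regardless of the assumed values.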
