Performance Metrics

Why study performance metrics?
• to determine the benefit (or lack of benefit) of design decisions
• computer designs are too complex to intuit performance & performance bottlenecks
• you have to be careful about what you mean to measure & how you measure it

What you should get out of this discussion
• good metrics for measuring computer performance
• what they should be used for
• what metrics you shouldn't use & how metrics are misused

Performance of Computer Systems

Many different factors to take into account when determining performance:
• Technology
  • circuit speed (clock, MHz)
  • processor technology (how many transistors fit on a chip)
• Organization
  • type of processor (ILP)
  • configuration of the memory hierarchy
  • type of I/O devices
  • number of processors in the system
• Software
  • quality of the compilers
  • organization & quality of the OS, databases, etc.

"Principles" of Experimentation

Meaningful metrics
  execution time & the component metrics that explain it
Reproducibility
  machine configuration, compiler & optimization level, OS, input
Real programs
  no toys, kernels, or synthetic programs
  SPEC is the norm (integer, floating point, graphics, web server)
  TPC-B, TPC-C & TPC-D for database transactions
Simulation
  long executions, warm start to mimic steady-state behavior
  usually applications only; some OS simulation
  simulator "validation" & internal checks for accuracy

Metrics that Measure Performance

Raw speed: peak performance (never attained in practice)

Execution time: the time to execute one program from beginning to end
• the "performance bottom line"
• wall clock time, response time
• Unix time function: 13.7u 23.6s 18:27 3%

Throughput: the total amount of work completed in a given time
• transactions (databases) or packets (web servers) per second
• an indication of how well hardware resources are being used
• a good metric for chip designers or managers of computer systems
(Often improving execution time will improve throughput & vice versa.)

Component metrics: subsystem performance, e.g., memory behavior
• help explain how execution time was obtained
• pinpoint performance bottlenecks

Execution Time

Performance_A = 1 / ExecutionTime_A

Processor A is faster than processor B when
• ExecutionTime_A < ExecutionTime_B
• Performance_A > Performance_B

Relative performance:
  Performance_A / Performance_B = n = ExecutionTime_B / ExecutionTime_A
• the performance of A is n times greater than that of B
• the execution time of B is n times longer than that of A

CPU Execution Time

The time the CPU spends executing an application:
• no memory effects
• no I/O
• no effects of multiprogramming

  CPUExecutionTime = CPUClockCycles × ClockCycleTime

Cycle time (clock period) can be expressed as a time or as a rate:
• clock cycle time = 1 / clock cycle rate

  CPUExecutionTime = CPUClockCycles / ClockCycleRate

• a clock cycle rate of 1 MHz = a cycle time of 1 µs
• a clock cycle rate of 1 GHz = a cycle time of 1 ns

CPI

  CPUClockCycles = NumberOfInstructions × CPI

CPI is the average number of clock cycles per instruction:
• a throughput metric
• a component metric, not a measure of performance
• used for processor organization studies, given a fixed compiler & ISA

Different classes of instructions can have different CPIs; e.g., floating point instructions take longer than integer instructions:

  CPUClockCycles = Σ (i = 1 to n) CPI_i × C_i

where CPI_i is the CPI for a particular class of instructions, and C_i is the number of instructions of the i-th class that have been executed.

Improving part of the architecture can improve a CPI_i:
• talk about the contribution to CPI of a class of instructions

CPU Execution Time

  CPUExecutionTime = NumberOfInstructions × CPI × ClockCycleTime

To measure:
• execution time: depends on all 3 factors
  • time the program
• number of instructions: determined by the ISA
  • programmable hardware counters
  • profiling: count the number of times each basic block is executed
  • instruction sampling
• CPI: determined by the ISA & the implementation
  • simulator: interpret (in software) every instruction & calculate the number of cycles it takes to simulate
• clock cycle time: determined by the implementation & process technology

The factors are interdependent:
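The CPI equations above can be sketched directly in Python. The instruction mix, per-class CPIs, and clock rate below are invented numbers for illustration, not measurements from any real machine:

```python
# Sketch of the CPI equations:
#   CPUClockCycles   = sum over classes of (CPI_i * C_i)
#   CPUExecutionTime = CPUClockCycles * ClockCycleTime
#                    = CPUClockCycles / ClockCycleRate
# The instruction mix and CPIs here are made-up example numbers.

def cpu_metrics(mix, clock_rate_hz):
    """mix: list of (cpi_i, count_i) pairs, one per instruction class.
    Returns (average CPI, CPU execution time in seconds)."""
    cycles = sum(cpi * count for cpi, count in mix)
    instructions = sum(count for _, count in mix)
    avg_cpi = cycles / instructions
    exec_time = cycles / clock_rate_hz   # cycles * cycle time
    return avg_cpi, exec_time

# e.g. integer ops at CPI 1, loads at CPI 2, FP ops at CPI 4, 1 GHz clock
mix = [(1, 50_000_000), (2, 30_000_000), (4, 20_000_000)]
avg_cpi, t = cpu_metrics(mix, 1_000_000_000)
print(avg_cpi, t)   # average CPI 1.9, execution time 0.19 s
```

Note how the weighting works: the FP class is only 20% of the instructions, yet contributes 80M of the 190M cycles, so improving CPI_FP alone noticeably lowers the average CPI.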
• RISC: increases instructions/program, but decreases CPI & clock cycle time because the instructions are simple
• CISC: decreases instructions/program, but increases CPI & clock cycle time because many instructions are more complex

Metrics Not to Use

MIPS (millions of instructions per second)
  MIPS = instruction count / (execution time × 10^6) = clock rate / (CPI × 10^6)
  - instruction set-dependent (true even for similar architectures)
  - implementation-dependent
  - compiler technology-dependent
  - program-dependent
  + intuitive: the higher, the better

MFLOPS (millions of floating point operations per second)
  MFLOPS = floating point operations / (execution time × 10^6)
  + FP operations are independent of the FP instruction implementation
  - different machines implement different FP operations
  - different FP operations take different amounts of time
  - only measures FP code

Static metrics (e.g., code size)

Means

Measuring the performance of a workload:
• arithmetic mean: used for averaging execution times
    (1/n) × Σ (i = 1 to n) time_i
• harmonic mean: used for averaging rates ("the average of", as opposed to "the average statistic of")
    p / Σ (i = 1 to p) (1/rate_i)
• weighted mean: used when the programs are executed with different frequencies, for example:
    Σ (i = 1 to n) weight_i × time_i

Each program performs 100 FP operations; execution times in seconds:

               FP Ops   Computer A   Computer B   Computer C
  program 1      100         1            10           20
  program 2      100      1000           100           20
  total                   1001           110           40
  arith. mean             500.5           55           20

FP operation rates (FLOPS):

               FP Ops   Computer A   Computer B   Computer C
  program 1      100       100            10            5
  program 2      100         0.1           1            5
  harm. mean                 0.2           1.8          5
  arith. mean               50.05          5.5          5

Computer C is ~25 times faster than Computer A when measuring execution time.
This is still true when measuring MFLOPS (a rate) with the harmonic mean.

Speedup

  Speedup = ExecutionTime_before_improvement / ExecutionTime_after_improvement

Amdahl's Law: the performance improvement from speeding up a part of a computer system is limited by the proportion of time the enhancement is used.
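The two means tables above can be recomputed directly; a small Python sketch, using the execution times from the tables (each program performs 100 FP operations):

```python
# Recompute the means tables: arithmetic mean for execution times,
# harmonic mean for rates (FLOPS). Times are taken from the tables.

def arith_mean(xs):
    return sum(xs) / len(xs)

def harm_mean(rates):
    return len(rates) / sum(1 / r for r in rates)

times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}        # seconds
rates = {m: [100 / t for t in ts] for m, ts in times.items()}  # FLOPS

print({m: arith_mean(ts) for m, ts in times.items()})
# -> {'A': 500.5, 'B': 55.0, 'C': 20.0}

# C vs. A by mean execution time, and by harmonic mean of rates:
print(arith_mean(times["A"]) / arith_mean(times["C"]))   # ~25x
print(harm_mean(rates["C"]) / harm_mean(rates["A"]))     # same ~25x
```

The two ratios agree because the harmonic mean of rates is equivalent to total work divided by total time, so it preserves the ranking that execution time gives; an arithmetic mean of the rates would not.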
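Amdahl's Law is commonly quantified as Speedup_overall = 1 / ((1 − f) + f/s), where f is the fraction of execution time the enhancement applies to and s is the speedup of that part. A short sketch (the fractions in the example are made-up numbers):

```python
# Amdahl's Law: speeding up a fraction f of execution time by a
# factor s gives an overall speedup of 1 / ((1 - f) + f / s).

def amdahl_speedup(f: float, s: float) -> float:
    return 1.0 / ((1.0 - f) + f / s)

# Speeding up 80% of a program by 10x helps less than you might hope:
print(amdahl_speedup(0.8, 10))   # ~3.57x overall
# Even an infinite enhancement of that 80% caps the gain at 1/(1-f):
print(1 / (1 - 0.8))             # 5.0
```

This is why the unenhanced fraction dominates: once s is large, the (1 − f) term sets a hard ceiling on the achievable speedup.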