UW CSE 378 - Study Notes

Performance Metrics

Why study performance metrics?
• determine the benefit/lack of benefit of designs
• computer design is too complex to intuit performance & performance bottlenecks
• have to be careful about what you mean to measure & how you measure it

What you should get out of this discussion
• good metrics for measuring computer performance
• what they should be used for
• what metrics you shouldn't use & how metrics are misused

Performance of Computer Systems

Many different factors to take into account when determining performance:
• Technology
  • circuit speed (clock, MHz)
  • processor technology (how many transistors on a chip)
• Organization
  • type of processor (ILP)
  • configuration of the memory hierarchy
  • type of I/O devices
  • number of processors in the system
• Software
  • quality of the compilers
  • organization & quality of OS, databases, etc.

"Principles" of Experimentation

Meaningful metrics
• execution time & component metrics that explain it
Reproducibility
• machine configuration, compiler & optimization level, OS, input
Real programs
• no toys, kernels, or synthetic programs
• SPEC is the norm (integer, floating point, graphics, web server)
• TPC-B, TPC-C & TPC-D for database transactions
Simulation
• long executions, warm start to mimic steady-state behavior
• usually applications only; some OS simulation
• simulator "validation" & internal checks for accuracy

Metrics that Measure Performance

Raw speed: peak performance (never attained)

Execution time: time to execute one program from beginning to end
• the "performance bottom line"
• wall clock time, response time
• Unix time function: 13.7u 23.6s 18:27 3%

Throughput: total amount of work completed in a given time
• transactions (database) or packets (web servers) per second
• an indication of how well hardware resources are being used
• a good metric for chip designers or managers of computer systems
(Often improving execution time will improve throughput & vice versa.)

Component metrics: subsystem performance, e.g., memory behavior
• help explain how execution time was obtained
• pinpoint performance bottlenecks

Execution Time

Performance_A = 1 / ExecutionTime_A

Processor A is faster than processor B, i.e.,
  ExecutionTime_A < ExecutionTime_B
  Performance_A > Performance_B

Relative performance:
  Performance_A / Performance_B = n = ExecutionTime_B / ExecutionTime_A
• performance of A is n times greater than that of B
• execution time of B is n times longer than that of A

CPU Execution Time

The time the CPU spends executing an application
• no memory effects
• no I/O
• no effects of multiprogramming

CPUExecutionTime = CPUClockCycles * ClockCycleTime

Cycle time (clock period) can be expressed as a time or as a rate
• clock cycle time = 1 / clock cycle rate
• CPUExecutionTime = CPUClockCycles / ClockCycleRate
• clock cycle rate of 1 MHz = cycle time of 1 µs
• clock cycle rate of 1 GHz = cycle time of 1 ns

CPI

CPUClockCycles = NumberOfInstructions * CPI

CPI is the average number of clock cycles per instruction
• a throughput metric
• a component metric, not a measure of performance
• used for processor organization studies, given a fixed compiler & ISA

Can have different CPIs for classes of instructions, e.g., floating point instructions take longer than integer instructions:

  CPUClockCycles = Σ_{i=1..n} (CPI_i * C_i)

where CPI_i = the CPI for a particular class of instructions
and C_i = the number of instructions of the i-th class that have been executed

Improving part of the architecture can improve a CPI_i
• we can talk about the contribution to CPI of a class of instructions
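
A small Python sketch of these two formulas. The instruction classes, per-class CPIs, instruction counts, and clock rate below are made-up illustrative numbers, not figures from the notes:

    # Hypothetical instruction mix: (class name, CPI_i, count C_i) -- illustrative values only
    instr_mix = [
        ("integer ALU",    1, 400_000),
        ("load/store",     2, 300_000),
        ("branch",         2, 100_000),
        ("floating point", 4, 200_000),
    ]

    clock_rate_hz = 1_000_000_000             # assume a 1 GHz clock -> 1 ns cycle time
    clock_cycle_time = 1 / clock_rate_hz

    # CPUClockCycles = sum of CPI_i * C_i over all instruction classes
    cpu_clock_cycles = sum(cpi * count for _, cpi, count in instr_mix)

    # CPUExecutionTime = CPUClockCycles * ClockCycleTime
    cpu_execution_time = cpu_clock_cycles * clock_cycle_time

    instruction_count = sum(count for _, _, count in instr_mix)
    average_cpi = cpu_clock_cycles / instruction_count

    print(f"cycles = {cpu_clock_cycles:,}")                  # 2,000,000
    print(f"average CPI = {average_cpi:.2f}")                # 2.00
    print(f"CPU time = {cpu_execution_time * 1e3:.3f} ms")   # 2.000 ms

With these made-up numbers, the floating point class alone contributes 0.8 cycles of the 2.00 average CPI (800,000 of the 2,000,000 cycles over 1,000,000 instructions), which is the per-class "contribution to CPI" idea above.
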
CPU Execution Time (continued)

CPUExecutionTime = NumberOfInstructions * CPI * ClockCycleTime

To measure:
• execution time: depends on all 3 factors
  • time the program
• number of instructions: determined by the ISA
  • programmable hardware counters
  • profiling: count the number of times each basic block is executed
  • instruction sampling
• CPI: determined by the ISA & the implementation
  • simulator: interpret (in software) every instruction & calculate the number of cycles it takes to execute
• clock cycle time: determined by the implementation & process technology

The factors are interdependent:
• RISC: increases instructions/program, but decreases CPI & clock cycle time because the instructions are simple
• CISC: decreases instructions/program, but increases CPI & clock cycle time because many instructions are more complex

Metrics Not to Use

MIPS (millions of instructions per second)
  MIPS = instruction count / (execution time * 10^6) = clock rate / (CPI * 10^6)
  - instruction set-dependent (true even for similar architectures)
  - implementation-dependent
  - compiler technology-dependent
  - program-dependent
  + intuitive: the higher, the better

MFLOPS (millions of floating point operations per second)
  MFLOPS = floating point operations / (execution time * 10^6)
  + FP operations are independent of the FP instruction implementation
  - different machines implement different FP operations
  - different FP operations take different amounts of time
  - only measures FP code

Static metrics (code size)

Means

Measuring the performance of a workload:
• arithmetic mean: used for averaging execution times
    (1/n) * Σ_{i=1..n} time_i
• harmonic mean: used for averaging rates ("the average of", as opposed to "the average statistic of")
    p / Σ_{i=1..p} (1/rate_i)
• weighted mean: used when the programs are executed with different frequencies, for example
    Σ_{i=1..n} (weight_i * time_i), where weight_i is the relative execution frequency of program i

Means (example)

Execution time (secs):
                FP Ops   Computer A   Computer B   Computer C
  program 1       100          1            10           20
  program 2       100       1000           100           20
  total                     1001           110           40
  arith mean                 500.5          55           20

Rate (FLOPS):
                FP Ops   Computer A   Computer B   Computer C
  program 1       100        100            10            5
  program 2       100          0.1           1            5
  harm mean                    0.2           1.8          5
  arith mean                  50.1           5.5          5

Computer C is ~25 times faster than Computer A when measuring execution time.
Still true when measuring the rate (FLOPS) with the harmonic mean.

Speedup

Speedup = ExecutionTime_beforeImprovement / ExecutionTime_afterImprovement

Amdahl's Law:
  The performance improvement from speeding up a part of a computer system is limited by the proportion of time the enhancement is used.
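
A minimal Python sketch of the speedup calculation. The closed-form expression in amdahl_speedup() is the standard textbook statement of Amdahl's Law rather than a formula taken from these notes, and the example numbers (40% of the time enhanced, 10x faster) are made up:

    def amdahl_speedup(fraction_enhanced, speedup_enhanced):
        # Overall speedup when only `fraction_enhanced` of the original execution
        # time benefits from a factor-of-`speedup_enhanced` improvement
        # (standard Amdahl's Law form, not quoted from the notes).
        return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

    # Speedup = ExecutionTime_beforeImprovement / ExecutionTime_afterImprovement
    time_before, time_after = 100.0, 80.0
    print(time_before / time_after)               # 1.25

    # If the enhanced part covers only 40% of the original time, the overall
    # speedup is capped at 1 / 0.6 ~= 1.67 no matter how large the enhancement:
    print(round(amdahl_speedup(0.4, 10), 3))      # 1.562
    print(round(amdahl_speedup(0.4, 1e9), 3))     # 1.667
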

