G22.2233 L03 Performance. 1 Banikazemi, NYU, 2007

CS G22.2233 Computer Systems Design, Spring 2007
Lecture 03: Understanding Performance
Mohammad Banikazemi
[Slides from Prof. Mary Jane Irwin, PSU. Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, UCB]

In the News

“Intel shows off Penryn chips” (CNET News, Jan. 26th 2007)
- New 45-nanometer generation (smaller Core 2 Duo with enhancements)
- New extensions to the instruction set
    - SSE4: fourth generation of Streaming SIMD (single instruction, multiple data) Extensions for multimedia applications and technical computing
- Planning for a more powerful chip when the 45-nanometer technology matures
- AMD, IBM, and others to follow

Performance Metrics

- Purchasing perspective: given a collection of machines, which has the
    - best performance?
    - least cost?
    - best cost/performance?
- Design perspective: faced with design options, which has the
    - best performance improvement?
    - least cost?
    - best cost/performance?
- Both require
    - a basis for comparison
    - a metric for evaluation
- Our goal is to understand what factors in the architecture contribute to overall system performance and the relative importance (and cost) of these factors

Defining (Speed) Performance

- Normally interested in reducing response time (aka execution time): the time between the start and the completion of a task
    - Important to individual users
    - Thus, to maximize performance, need to minimize execution time
- Throughput: the total amount of work done in a given time
    - Important to data center managers
    - Decreasing response time almost always improves throughput

performance_X = 1 / execution_time_X

If X is n times faster than Y, then

performance_X / performance_Y = execution_time_Y / execution_time_X = n
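
The ratio above can be checked numerically. This is a minimal sketch that assumes hypothetical execution times of 10 s on machine X and 15 s on machine Y. These numbers and the function names are illustrative and do not come from the slides.

```python
def performance(execution_time_s: float) -> float:
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def speedup(time_x_s: float, time_y_s: float) -> float:
    """Return n such that X is n times faster than Y."""
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical timings, chosen only for illustration.
time_x = 10.0  # seconds on machine X
time_y = 15.0  # seconds on machine Y

n = speedup(time_x, time_y)
print(f"X is {n:.2f} times faster than Y")
```

Because performance is a reciprocal, the performance ratio reduces to execution_time_Y / execution_time_X, so the slower machine's time goes in the numerator.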

Performance Factors

- Want to distinguish elapsed time and the time spent on our task
- CPU execution time (CPU time): time the CPU spends working on a task
    - Does not include time waiting for I/O or running other programs

CPU execution time for a program = # CPU clock cycles for a program x clock cycle time

or

CPU execution time for a program = # CPU clock cycles for a program / clock rate

- Can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program

Review: Machine Clock Rate

- Clock rate (MHz, GHz) is inverse of clock cycle time (clock period)

CC = 1 / CR

    10 nsec clock cycle  => 100 MHz clock rate
     5 nsec clock cycle  => 200 MHz clock rate
     2 nsec clock cycle  => 500 MHz clock rate
     1 nsec clock cycle  =>   1 GHz clock rate
   500 psec clock cycle  =>   2 GHz clock rate
   250 psec clock cycle  =>   4 GHz clock rate
   200 psec clock cycle  =>   5 GHz clock rate

[Figure: clock waveform with one clock period marked]

Clock Cycles per Instruction

- Not all instructions take the same amount of time to execute
    - One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction
- Clock cycles per instruction (CPI): the average number of clock cycles each instruction takes to execute
    - A way to compare two different implementations of the same ISA

# CPU clock cycles for a program = # Instructions for a program x Average clock cycles per instruction

    CPI for this instruction class
    Instruction class   A   B   C
    CPI                 1   2   3
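
The two forms of the CPU time formula and the clock-rate conversion can be sketched as follows. The instruction count (2 billion) and average CPI (2.0) are hypothetical values picked for illustration. The 10 nsec and 500 psec cycle times are rows from the clock-rate table above.

```python
def clock_rate_hz(cycle_time_s: float) -> float:
    """Clock rate is the inverse of the clock cycle time (CR = 1 / CC)."""
    return 1.0 / cycle_time_s


def cpu_time_from_cycle_time(clock_cycles: float, cycle_time_s: float) -> float:
    """CPU time = clock cycles x clock cycle time."""
    return clock_cycles * cycle_time_s


def cpu_time_from_clock_rate(clock_cycles: float, rate_hz: float) -> float:
    """CPU time = clock cycles / clock rate."""
    return clock_cycles / rate_hz


# Two rows of the clock-rate table.
print(f"10 nsec cycle  => {clock_rate_hz(10e-9) / 1e6:.0f} MHz")
print(f"500 psec cycle => {clock_rate_hz(500e-12) / 1e9:.0f} GHz")

# Hypothetical program: the instruction count and average CPI are assumptions.
instructions = 2_000_000_000
average_cpi = 2.0
cycles = instructions * average_cpi  # clock cycles = instructions x average CPI
cycle_time = 500e-12                 # 500 psec clock cycle

t1 = cpu_time_from_cycle_time(cycles, cycle_time)
t2 = cpu_time_from_clock_rate(cycles, clock_rate_hz(cycle_time))
print(f"CPU time: {t1:.3f} s (cycle-time form), {t2:.3f} s (clock-rate form)")
```

Both forms give the same CPU time up to floating-point rounding, since dividing by the clock rate is the same as multiplying by the cycle time.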

Effective CPI

- Computing the overall effective CPI is done by looking at the different types of instructions and their individual cycle counts and averaging

Overall effective CPI = Σ_{i=1}^{n} (CPI_i x IC_i)

    - Where IC_i is the count (percentage) of the number of instructions of class i executed
    - CPI_i is the (average) number of clock cycles per instruction for that instruction class
    - n is the number of instruction classes
- The overall effective CPI varies by instruction mix: a measure of the dynamic frequency of instructions across one or many programs

THE Performance Equation

- Our basic performance equation is then

CPU time = Instruction_count x CPI x clock_cycle

or

CPU time = (Instruction_count x CPI) / clock_rate

- These equations separate the three key factors that affect performance
    - Can measure the CPU execution time by running the program
    - The clock rate is usually given
    - Can measure overall instruction count by using profilers/simulators without knowing all of the implementation details
    - CPI varies by instruction type and ISA implementation, for which we must know the implementation details

Determinants of CPU Performance

CPU time = Instruction_count x CPI x clock_cycle

                             Instruction_count   CPI   clock_cycle
    Algorithm                        X            X
    Programming language             X            X
    Compiler                         X            X
    ISA                              X            X        X
    Processor organization                        X        X
    Technology                                             X
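
The effective CPI sum and the performance equation can be sketched together. This is an illustration under stated assumptions: three instruction classes A, B and C with CPIs of 1, 2 and 3, and per-class instruction counts and a 1 GHz clock rate that are hypothetical. None of the function or variable names come from the slides.

```python
# class name -> (instructions executed, CPI for that class); counts are hypothetical.
PROGRAM = {
    "A": (2000, 1.0),
    "B": (1000, 2.0),
    "C": (2000, 3.0),
}


def instruction_count(program: dict) -> int:
    """Total number of instructions executed across all classes."""
    return sum(count for count, _ in program.values())


def effective_cpi(program: dict) -> float:
    """Weighted average CPI: sum over classes of CPI_i x (fraction of class i)."""
    total = instruction_count(program)
    return sum(cpi * (count / total) for count, cpi in program.values())


def cpu_time_s(program: dict, clock_rate_hz: float) -> float:
    """CPU time = Instruction_count x CPI / clock_rate."""
    return instruction_count(program) * effective_cpi(program) / clock_rate_hz


CLOCK_RATE_HZ = 1e9  # assumed 1 GHz clock, i.e. a 1 nsec clock cycle

print(f"Instruction count: {instruction_count(PROGRAM)}")
print(f"Effective CPI:     {effective_cpi(PROGRAM):.2f}")
print(f"CPU time:          {cpu_time_s(PROGRAM, CLOCK_RATE_HZ):.2e} s")
```

Here IC_i is treated as a fraction of the total instruction count, which matches the "percentage" reading of the definition. With raw counts instead, the same sum gives total clock cycles rather than CPI.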

A Simple Example

    Op        Freq   CPI_i   Freq x CPI_i
    ALU       50%    1       .5
    Load      20%    5       1.0
    Store     10%    3       .3
    Branch    20%    2       .4
                             Σ = 2.2

- How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
    Freq x CPI_i: .5, .4, .3, .4 (Σ = 1.6)
    CPU time new = 1.6 x IC x CC, so 2.2/1.6 means 37.5% faster
- How does this compare with using branch prediction to shave a cycle off the branch time?
    Freq x CPI_i: .5, 1.0, .3, .2 (Σ = 2.0)
    CPU time new = 2.0 x IC x CC, so 2.2/2.0 means 10% faster
- What if two ALU instructions could be executed at once?
    Freq x CPI_i: .25, 1.0, .3, .4 (Σ = 1.95)
    CPU time new = 1.95 x IC x CC, so 2.2/1.95 means 12.8% faster

Comparing and Summarizing Performance

- Guiding principle in reporting performance measurements is reproducibility: list everything another experimenter would need to duplicate the experiment (version of the operating system, compiler settings, input set used, specific computer
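
The three what-if questions in "A Simple Example" can be reproduced with a short sketch. The frequencies and CPIs are taken from the table in that example. The helper names are hypothetical, and modelling "two ALU instructions at once" as an ALU CPI of 0.5 is an assumption inferred from the .25 entry in the example.

```python
# op -> (dynamic frequency, CPI), as given in the example's table.
BASE_MIX = {
    "ALU": (0.50, 1.0),
    "Load": (0.20, 5.0),
    "Store": (0.10, 3.0),
    "Branch": (0.20, 2.0),
}


def effective_cpi(mix: dict) -> float:
    """Sum of Freq x CPI_i over all instruction classes."""
    return sum(freq * cpi for freq, cpi in mix.values())


def with_cpi(mix: dict, op: str, new_cpi: float) -> dict:
    """Return a copy of the mix with one class's CPI replaced."""
    changed = dict(mix)
    freq, _ = changed[op]
    changed[op] = (freq, new_cpi)
    return changed


def percent_faster(old_cpi: float, new_cpi: float) -> float:
    """IC and CC are unchanged, so the CPU time ratio equals the CPI ratio."""
    return (old_cpi / new_cpi - 1.0) * 100.0


base = effective_cpi(BASE_MIX)
scenarios = {
    "load takes 2 cycles": with_cpi(BASE_MIX, "Load", 2.0),
    "branch takes 1 cycle": with_cpi(BASE_MIX, "Branch", 1.0),
    "two ALU ops at once": with_cpi(BASE_MIX, "ALU", 0.5),  # assumed model
}

print(f"base CPI {base:.2f}")
for name, mix in scenarios.items():
    new = effective_cpi(mix)
    print(f"{name}: CPI {new:.2f}, {percent_faster(base, new):.1f}% faster")
```

Only CPI changes between scenarios, which is why the instruction count and clock cycle time cancel out of each comparison.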


NYU CSCI-GA 2233 - Understanding Performance
