
Performance
CS 365 Lecture 10
Prof. Yih Huang

Computer Performance: TIME, TIME, TIME
• Response Time (latency)
  – How long does it take for my job to run?
• Throughput
  – How many jobs can a system support at once?
  – What is the average execution rate?
• If we upgrade a machine with a new processor, what do we increase?
• If we add a new machine to the lab, what do we increase?

Execution Time
• Elapsed Time
  – counts everything (disk and memory accesses, I/O, etc.)
• CPU time
  – doesn't count I/O or time spent running other programs
  – can be broken up into system time and user time

Program Performance
$$\text{CPU time} = \frac{\text{seconds}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}$$
• Instruction Count (instructions/program): the # of instructions executed to run the program
• CPI (cycles/instruction): average # of clock cycles per instruction
• Clock period (seconds/cycle): length of one clock cycle in seconds

Discussion
• When a processor vendor advertises GHz, what is the factor of focus?
• Instruction count depends on the particular program and input data.
• CPI is also program dependent.
  – Different programs have different instruction mixes.

CPI
$$\mathrm{CPI} = \sum_{j=1}^{n} \mathrm{CPI}_j \times F_j, \qquad \text{where } F_j = \frac{I_j}{\text{Instruction Count}}$$
• CPI_j: # of cycles to execute instruction j, determined by the CPU architecture
• F_j: frequency of instruction j in the program, determined by the given program
• I_j: number of times instruction j is executed

CPI Example, using MIPS Lite
• 50% of the instructions of Program A are type R, 30% branches, 10% loads, 10% stores. Compute the CPI of Program A.
• 30% of the instructions of Program B are type R, 10% branches, 40% loads, 20% stores. Compute the CPI of Program B.

CPI Example
• Suppose we have two implementations of the same instruction set architecture (ISA). For some program with IC = x,
  – Machine A has a clock cycle time of 10 ns and a CPI of 2.0
  – Machine B has a clock cycle time of 20 ns and a CPI of 1.2
• Which machine is faster for this program, and by how much?

Instruction Count Example
• A compiler is trying to decide between two code sequences. There are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles respectively.
• The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.
• The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.
• CPI for sequence I:
• CPU cycles for sequence I:
• CPI for sequence II:
• CPU cycles for sequence II:

MIPS
• MIPS: Millions of instructions per second
• Another performance indicator used by many processor vendors.
• Intuitively, the more instructions a processor can execute per second, the better its performance. Or is it?

MIPS example
• Two different compilers are being tested for a 100 MHz machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles respectively.
• For a given program,
  – The first compiler uses 5 million Class A instructions, 1 million Class B, and 1 million Class C instructions.
  – The second uses 10 million Class A, 1 million Class B, and 1 million Class C.
• Which sequence will be faster according to MIPS?
• Which sequence will be faster according to execution time?

Benchmarks
• Performance is best determined by running a real application
  – Use programs typical of the expected workload
  – Or, typical of the expected class of applications, e.g., compilers/editors, scientific applications, graphics, etc.
• Small benchmarks
  – nice for architects and designers
  – easy to standardize
  – can be abused

SPEC
• SPEC (System Performance Evaluation Cooperative)
  – companies have agreed on a set of real programs and inputs
  – can still be abused
  – valuable indicator of performance (and compiler technology)

SPEC '95
Benchmark   Description
go          Artificial intelligence; plays the game of Go
m88ksim     Motorola 88k chip simulator; runs test program
gcc         The GNU C compiler generating SPARC code
compress    Compresses and decompresses file in memory
li          Lisp interpreter
ijpeg       Graphic compression and decompression
perl        Manipulates strings and prime numbers in the special-purpose programming language Perl
vortex      A database program
tomcatv     A mesh generation program
swim        Shallow water model with 513 x 513 grid
su2cor      Quantum physics; Monte Carlo simulation
hydro2d     Astrophysics; hydrodynamic Navier-Stokes equations
mgrid       Multigrid solver in 3-D potential field
applu       Parabolic/elliptic partial differential equations
turb3d      Simulates isotropic, homogeneous turbulence in a cube
apsi        Solves problems regarding temperature, wind velocity, and distribution of pollutant
fpppp       Quantum chemistry
wave5       Plasma physics; electromagnetic particle simulation

SPECint
• Does doubling the clock rate double the performance?
[Figure: SPECint versus clock rate (MHz) for the Pentium and Pentium Pro]

SPECfp
[Figure: SPECfp versus clock rate (MHz) for the Pentium and Pentium Pro]

Amdahl's Law
• Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)
• Example: Machines A and B use the same implementation design. The only difference between them is clock rate.
• Running on Machine A, program X spends 20% of its execution time accessing memory and 80% doing computations. Machine B's clock rate is twice that of A.
  – How much faster will X be on B? How about a clock rate that is 4 times faster? 8 times faster? 16 times faster?

Discussion
• There is no absolute way to say one processor is faster than another. Performance numbers are always specific to particular programs.
• For a given architecture, performance can be increased by
  – Increases in clock rate
  – New datapaths that lower CPI
  – Smarter compiler technologies

Increasing Clock Rate
• Achieved by the (constantly advancing) VLSI technologies
• Can also be made possible by reorganizing the datapath so that each cycle performs fewer tasks.
  – Recall the single-cycle and multi-cycle datapaths of MIPS Lite.
• May have adverse effects on CPI

Reorganizing the Datapath
• The aim is to
  – Decrease CPI
  – Facilitate a faster clock rate
• To lower CPI, we add more hardware so that more parallel tasks can be done in one cycle.
• To raise the clock rate (that is, to shorten the clock period), we want fewer sequential tasks per cycle.

VLSI Factors
• Given the same VLSI technology, there are limits on
  – how many gates one can have in a chip
  – how fast the clock can be
• Moreover, the two factors are intertwined.
  – Less
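The CPU time equation and the two worked slides (CPI Example, Instruction Count Example) can be checked with a few lines of Python. This is only a sketch: the function names `cpu_time` and `sequence_stats` are made up for illustration, and the instruction count of one million stands in for the unspecified IC = x (the ratio between the two machines does not depend on it).

```python
def cpu_time(instruction_count, cpi, clock_period_s):
    """CPU time = instruction count x CPI x clock period, in seconds."""
    return instruction_count * cpi * clock_period_s


def sequence_stats(counts, cycles_per_class):
    """Return (total CPU cycles, CPI) for a code sequence.

    counts maps an instruction class to how many times it appears;
    cycles_per_class maps the same class to its cycle cost.
    """
    total_instructions = sum(counts.values())
    total_cycles = sum(n * cycles_per_class[c] for c, n in counts.items())
    return total_cycles, total_cycles / total_instructions


# CPI Example: same ISA, so both machines execute the same instruction count.
ic = 1_000_000  # assumed placeholder for x; it cancels in the ratio
time_a = cpu_time(ic, cpi=2.0, clock_period_s=10e-9)
time_b = cpu_time(ic, cpi=1.2, clock_period_s=20e-9)
ratio = time_b / time_a  # > 1 means Machine A is faster
print(f"time B / time A = {ratio:.2f}")

# Instruction Count Example: classes A, B, C cost 1, 2, 3 cycles.
cycles_per_class = {"A": 1, "B": 2, "C": 3}
seq1 = sequence_stats({"A": 2, "B": 1, "C": 2}, cycles_per_class)
seq2 = sequence_stats({"A": 4, "B": 1, "C": 1}, cycles_per_class)
print("sequence I  (cycles, CPI):", seq1)
print("sequence II (cycles, CPI):", seq2)
```

Under these numbers Machine A comes out 1.2 times faster, and sequence II needs fewer cycles (9 versus 10) even though it has more instructions.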

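For the MIPS example, the two questions on the slide can be answered by computing both metrics side by side. The sketch below uses the figures from the slide (100 MHz clock; classes A, B, C costing one, two, and three cycles); the helper names `execution_time` and `mips` are illustrative only.

```python
CLOCK_RATE_HZ = 100e6                 # 100 MHz machine from the slide
CYCLES = {"A": 1, "B": 2, "C": 3}     # cycles per instruction class


def execution_time(counts):
    """Execution time in seconds = total cycles / clock rate."""
    total_cycles = sum(n * CYCLES[c] for c, n in counts.items())
    return total_cycles / CLOCK_RATE_HZ


def mips(counts):
    """Millions of instructions per second for the given instruction counts."""
    return sum(counts.values()) / execution_time(counts) / 1e6


compiler1 = {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000}
compiler2 = {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000}

for name, counts in (("compiler 1", compiler1), ("compiler 2", compiler2)):
    print(f"{name}: {mips(counts):.1f} MIPS, {execution_time(counts):.2f} s")
```

The second compiler's code scores the higher MIPS rating (80 versus 70) yet takes longer to run (0.15 s versus 0.10 s), which is the point of the "Or is it?" on the MIPS slide.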

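The Amdahl's Law exercise for program X can be worked through the same way. This sketch assumes, as the example seems to intend, that the 20% spent on memory accesses is unaffected by the clock rate and only the 80% spent computing speeds up. Times are normalized so that the run on Machine A takes 1.0, and the function name `time_after_improvement` is illustrative.

```python
def time_after_improvement(unaffected, affected, improvement):
    """Amdahl's Law: new time = unaffected + affected / improvement."""
    return unaffected + affected / improvement


MEMORY_FRACTION = 0.2    # assumed not to scale with clock rate
COMPUTE_FRACTION = 0.8   # scales with clock rate

speedups = {}
for factor in (2, 4, 8, 16):
    new_time = time_after_improvement(MEMORY_FRACTION, COMPUTE_FRACTION, factor)
    speedups[factor] = 1.0 / new_time
    print(f"{factor:>2}x clock rate: overall speedup {speedups[factor]:.2f}")
```

Under that assumption the overall speedup is about 1.67 for a doubled clock and only 4.00 at 16 times the clock rate; it can never exceed 1 / 0.2 = 5, however fast the clock becomes.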

MASON CS 365 - Performance
