DOC PREVIEW
U of I CS 232 - Section 6: Performance

This preview shows page 1 out of 4 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS232 Section 6: PerformanceIn this section, we will discuss various aspects of performance. Anyone buying a computer willfirst ask: Which machine has the best performance? Which comparable machine has the lowest price?Which one has the best price/performance ratio? But there is much more to measuring perfomance thancomparing MHz specs.This section includes the following topics:1. Latency vs. throughput2. Amdahl’s Law3. Performance metrics of single-cycle and pipelined datapath1 Two notions of per formanceConsider two planes: Boeing 747 and Concorde. Their characteristics are summarized in the table below.Which plane has better performance?Plane DC to Paris Speed Passengers Throughput1747 6.5 hours 610 mph 470 286,700Concorde 3 hours 1350 mph 132 178,2001. measured in pmph, passenger-miles per hour.The perceived pe rformance depends on what metric is selected. We will consider two metrics, timeand number of tasks completed in a unit of time.How long does a task take? The length of a task is known as execution time, response time, orlatency.How many tasks are completed i n a unit of time? The number of tasks per unit of time iscalled throughput or bandwidth.We will refer to them as latency and throughput respectively.Consider a single-cycle machine with 2ns cycle time. Its latency is 2ns, because each instructioncompletes in 2ns. Its throughput is 1/2ns, because one instruction completes every 2ns. But therelationship between latency and throughput isn’t as clear-cut when a machine can perform multipletasks concurrently.Which performance metric is more important can also depend on a perspective. For example, in aclient-server system, a client is concerned with the time required to accomplish a task (i.e. latency), butthe performance of the server is measured by the number of requests it can serve in a unit of time (i.e.throughput).Throughput is measured in units of things per unit of time, e.g. hamburgers per hour, passenger-miles per hour. Bigger throughput means better performance.Latency is measured in units of time (seconds, ns). Performance is inversely proportional to thelatency (or execution time) of a task.P erformance(x) =1execution time(x)(1)So, smaller latency means better performance.How to compare performance of two machines? To compare performance of two machines Xand Y, run the same program (i.e. benchmark) on both of them. This means that you can assume thatthey have the same throughput and only the latency (execution time) needs to be compared. In suchcases a simple formula suffices:N =P erformance(X)P erformance(Y )=execution time(Y )execution time(X)(2)With this terminology we can easily compare the performance of two planes. Concorde has betterlatency and Boeing has better throughput.1CS232 Section 6: Performance2 Amdahl’s LawMeasuring performance is only a means, not an end. One use of performance measurements is determiningwhat part of the system to optimize to maximize performance increase. Amdahl’s Law measures howeffective an optimization can be.Amdahl’s Law states that optimizations are limited in their effectiveness. The performance enhance-ment possible with a given improvement is limited by the amount that the improved feature is used.Execution time af ter improvement =T ime af f ected by improvementAmount of improvement+ T ime unaf f ected by improvement(3)For example, doubling the speed of floating point operations sounds like a good idea. But if only 10%of the program execution time t involves floating point code, then the overall performance improvementis only 5%.Execution time after improvement =0.10t2+ 0.90t = 0.95t (4)3 Practice problems1. Recall the execution time equation:CP U timeX,P= Instructions executedP∗ CP IX,P∗ Clock cycle timeX(5)Consider a basic machine with the following characteristics:OP Type Freq (fi) Cycles CP IiLoad 20% 5Store 10% 3Branch 20% 2ALU 50% 3Calculate CPI for each instruction type.How much faster would the machine be if:(a) we added a cache to reduce average load time to 3 cycles?(b) we added a branch predictor to reduce branch time by 1 cycle?(c) we could do two ALU operations in parallel?2. Use the basic machine table from question 1, but assume that the frequency column indicates thepercentage of execution time spent in the c orresponding instruction type. Use Amdahl’s Law toanswer the following: if you could decrease the cycle time of one of the instruction types by 1 cycle,which instruction type would you optimize? How would the new execution time compare to theoriginal one?2CS232 Section 6: Performance3. Intel’s Itanium (IA-64) ISA is designed to facilitate executing multiple instructions per cycle. Ifan Itanium processor achieves an average CPI of .3 (3 instructions per cycle), how much faster isit than a Pentium4 (which uses the x86 ISA) with an average CPI of 1?(a) Itanium is three times faster(b) Itanium is one third as fast(c) Not enough information4. Assume the following delays for the main functional units of the single-cycle datapath shown below:Functional Unit Time DelayMemory 5nsALU 4nsRegister File 3nsGiven the following instructions: lw, sw and add, calculate:(a) Minimum time to perform each instruction(b) Time required on a single-cycle datapathWrite your answers in the table below. State any assumptions.Instruction instruction time instruction in Single-Cycle datapathlwswaddSingle-cycle datapath3CS232 Section 6: Performance5. Consider the following set of instructions:add $sp, $sp, -4sub $v0, $a0, $a1lw $t0, 4($sp)or $s0, $s1, $s2lw $t1, 8($sp)Assuming the instructions are executed on a single-cycle machine with 10ns cycle time, compute:(a) cycle time (hint: this is not a trick question)(b) instruction latency(c) instruction throughput6. Assume that the code above is executed on a 5-stage pipelined machine discussed in lecture. Youmight first draw the pipeline diagram in the space below.If a pipeline stage takes 2ns, calculate:(a) cycle time(b) instruction latency(c) instruction


View Full Document

U of I CS 232 - Section 6: Performance

Documents in this Course
Goal

Goal

2 pages

Exam 1

Exam 1

5 pages

Exam 1

Exam 1

6 pages

Exam 2

Exam 2

6 pages

Exam 1

Exam 1

5 pages

Load more
Download Section 6: Performance
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Section 6: Performance and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Section 6: Performance 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?