DOC PREVIEW
CMU CS 15740 - Performance & Technology

This preview shows page 1-2-3-4 out of 13 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Page 1Performance & TechnologyTodd C. MowryCS 740Sept 11, 2007Topics:• Performance measures• Relating performance measures• Memory Technology–SRAM, DRAM• Disk Technology• Recent Processor TrendsCS 740 F’072Performance expressed as a timeAbsolute time measures• difference between start and finish of an operation• synonyms: running time, elapsed time, response time, latency, completion time, execution time• most straightforward performance measureRelative (normalized) time measures• running time normalized to some reference time • (e.g. time/reference time)Guiding principle: Choose performance measures that track running time.CS 740 F’073Performance expressed as a rateRates are performance measures expressed in units of work per unit time.Examples:• millions of instructions / sec (MIPS)• millions of floating point instructions / sec (MFLOPS)• millions of bytes / sec (MBytes/sec)• millions of bits / sec (Mbits/sec)• images / sec• samples / sec• transactions / sec (TPS)CS 740 F’074Performance expressed as a rate(cont)Key idea: Report rates that track execution time.Example: Suppose we are measuring a program that convolves a stream of images from a video camera.Bad performance measure: MFLOPS• number of floating point operations depends on the particular convolution algorithm: n^2 matix-vector product vs nlogn fast Fourier transform. An FFT with a bad MFLOPS rate may run faster than a matrix-vector product with a good MFLOPS rate.Good performance measure: images/sec• a program that runs faster will convolve more images per second.Page 2CS 740 F’075Performance expressed as a rate(cont)Fallacy: Peak rates track running time.Example: the i860 is advertised as having a peak rate of 80 MFLOPS (40 MHz with 2 flops per cycle). However, the measured performance of some compiled linear algebra kernels (icc -O2) tells a different story:Kernel 1d fft sasum saxpy sdot sgemm sgemv spvmaMFLOPS 8.5 3.2 6.1 10.3 6.2 15.0 8.1%peak 11% 4% 7% 13% 8% 19% 10%CS 740 F’076Relating time to system measuresSuppose that for some program we have: • T seconds running time (the ultimate performance measure)• C clock ticks, I instructions, P seconds/tick (performance measures of interest to the system designer)T secs = C ticks x P secs/tick= (I inst/I inst) x C ticks x P secs/tickT secs = I inst x (C ticks/I inst) x P secs/tickrunningtimeinstructioncountavg clockticks perinstruction(CPI)clock periodCS 740 F’077Pipeline latency and throughputvideo processing system(N input images)In,...,I3, I2, I1(N output images)On,...,O3, O2, O1Latency (L): time to process an individual image.Throughput (R): images processed per unit timeOne image can be processed by the system at any point in timeCS 740 F’078Video system performanceL = 3 secs/image.R = 1/L = 1/3 images/sec.T = L + (N-1)1/R= 3Ntime1Stage 123456711122231 out2 outPage 3CS 740 F’079Pipelining the video systemstage 1(buffer)video pipeline(L1,R1)(L3,R3)(L2,R2)stage 3(display)stage 2(CPU)(N input images)In,...,I3, I2, I1(N output images)On,...,O3, O2, O1One image can be in each stage at any point in time. Li= latency of stage iRi= throughput of stage iL = L1+ L2+ L3R = min(R1, R2, R3)CS 740 F’0710Pipelined video system performancetime1Suppose:L1= L2= L3= 1Then:L = 3 secs/image.R = 1 image/sec.T = L + (N-1)1/R= N + 2Stage 1 Stage 2 Stage 32345671234567123456123451 out2 out3 out4 outCS 740 F’0711Relating time to latency & throughputIn general:• T = L + (N-1)/RThe impact of latency and throughput on running time depends on N:• (N = 1) => (T = L)• (N >> 1) => (T = N/R)To maximize throughput, we should try to maximize the minimum throughput over all stages (i.e., we strive for all stages to have equal throughput).CS 740 F’0712Amdahl’s lawYou plan to visit a friend in Normandy France and must decide whether it is worth it to take the Concorde SST ($3,100) or a 747 ($1,021) from NY to Paris, assuming it will take 4 hours Pgh to NY and 4 hours Paris to Normandy.time NY->Paris total trip time speedup over 747747 8.5 hours 16.5 hours 1SST 3.75 hours 11.75 hours 1.4Taking the SST (which is 2.2 times faster) speeds up the overall trip by only a factor of 1.4!Page 4CS 740 F’0713Amdahl’s law (cont)T1T2Old program (unenhanced)T1= time that can NOTbe enhanced.T2= time that can beenhanced.T2’ = time after theenhancement. Old time: T = T1+ T2T1’= T1T2’ <= T2New program (enhanced)New time: T’ = T1’+ T2’Speedup: Soverall= T / T’CS 740 F’0714Amdahl’s law (cont)Two key parameters: Fenhanced= T2/ T (fraction of original time that can be improved)Senhanced= T2/ T2’ (speedup of enhanced part)T’ = T1’+ T2’= T1+ T2’ = T(1-Fenhanced) + T2’= T(1-Fenhanced) + (T2/Senhanced) [by def of Senhanced]= T(1-Fenhanced) + T(Fenhanced/Senhanced) [by def of Fenhanced]= T((1-Fenhanced) + Fenhanced/Senhanced)Amdahl’s Law:Soverall= T / T’ = 1/((1-Fenhanced) + Fenhanced/Senhanced)Key idea: Amdahl’s law quantifies the general notion of diminishing returns. It applies to any activity, not just computer programs.CS 740 F’0715Amdahl’s law (cont)Trip example: Suppose that for the New York to Paris leg, we now consider the possibility of taking a rocket ship (15 minutes) or a handy rip in the fabric of space-time (0 minutes):time NY->Paris total trip time speedup over 747747 8.5 hours 16.5 hours 1SST 3.75 hours 11.75 hours 1.4rocket 0.25 hours 8.25 hours 2.0rip 0.0 hours 8 hours 2.1CS 740 F’0716Amdahl’s law (cont)Useful corollary to Amdahl’s law:• 1 <= Soverall<= 1 / (1 - Fenhanced)FenhancedMax SoverallFenhancedMax Soverall0.0 1 0.9375 160.5 2 0.96875 320.75 4 0.984375 640.875 8 0.9921875 128Moral: It is hard to speed up a program.Moral++ : It is easy to make premature optimizations.Page 5CS 740 F’0717Computer SystemDiskDiskMemory-I/O busMemory-I/O busProcessorProcessorCacheCacheMemoryMemoryI/OcontrollerI/OcontrollerI/OcontrollerI/OcontrollerI/OcontrollerI/OcontrollerDisplayDisplayNetworkNetworkRegCS 740 F’0718Levels in Memory HierarchyCPUCPUregsregsCacheMemoryMemorydiskdisksize:speed:$/Mbyte:block size:200 B3 ns8 BRegister Cache Memory Disk Memory32 KB / 4MB6 ns$100/MB32 B128 MB60 ns$1.50/MB8 KB20 GB8 ms$0.05/MBlarger, slower, cheaper8 B 32 B 8 KBcache virtual memoryCS 740 F’0719Scaling to 0.1µm• Semiconductor Industry Association, 1992 Technology Workshop– Projected future technology based on past trends19921995 1998 2001 2004 2007Feature


View Full Document

CMU CS 15740 - Performance & Technology

Documents in this Course
leecture

leecture

17 pages

Lecture

Lecture

9 pages

Lecture

Lecture

36 pages

Lecture

Lecture

9 pages

Lecture

Lecture

13 pages

lecture

lecture

25 pages

lect17

lect17

7 pages

Lecture

Lecture

65 pages

Lecture

Lecture

28 pages

lect07

lect07

24 pages

lect07

lect07

12 pages

lect03

lect03

3 pages

lecture

lecture

11 pages

lecture

lecture

20 pages

lecture

lecture

11 pages

Lecture

Lecture

9 pages

Lecture

Lecture

10 pages

Lecture

Lecture

22 pages

Lecture

Lecture

28 pages

Lecture

Lecture

18 pages

lecture

lecture

63 pages

lecture

lecture

13 pages

Lecture

Lecture

36 pages

Lecture

Lecture

18 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

lecture

lecture

34 pages

lecture

lecture

47 pages

lecture

lecture

7 pages

Lecture

Lecture

18 pages

Lecture

Lecture

7 pages

Lecture

Lecture

21 pages

Lecture

Lecture

10 pages

Lecture

Lecture

39 pages

Lecture

Lecture

11 pages

lect04

lect04

40 pages

Load more
Download Performance & Technology
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Performance & Technology and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Performance & Technology 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?