DOC PREVIEW
CMU CS 15213 - Cache Performance

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Page 1Cache PerformanceOctober 3, 2007Cache PerformanceOctober 3, 2007TopicsTopics Impact of caches on performance The memory mountainclass11.ppt15-213“The course that gives CMU its Zip!”–2–15-213, F’07Intel Pentium III Cache HierarchyIntel Pentium III Cache HierarchyProcessor ChipProcessor ChipL1 Data1 cycle latency16 KB4-way assocWrite-through32B linesL1 Instruction16 KB, 4-way32B linesRegs.L2 Unified128KB--2 MB4-way assocWrite-backWrite allocate32B linesL2 Unified128KB--2 MB4-way assocWrite-backWrite allocate32B linesMainMemoryUp to 4GBMainMemoryUp to 4GB–3–15-213, F’07Cache Performance MetricsCache Performance MetricsMiss RateMiss Rate Fraction of memory references not found in cache(misses / references) Typical numbers:z 3-10% for L1z can be quite small (e.g., < 1%) for L2, depending on size, etc.Hit TimeHit Time Time to deliver a line in the cache to the processor (includes time to determine whether the line is in the cache) Typical numbers:z 1-2 clock cycle for L1z 5-20 clock cycles for L2Miss PenaltyMiss Penalty Additional time required because of a missz Typically 50-200 cycles for main memory (Trend: increasing!)Aside for architects:-Increasing cache size?-Increasing block size?-Increasing associativity?–4–15-213, F’07Writing Cache Friendly CodeWriting Cache Friendly Code••Repeated references to variables are goodRepeated references to variables are good((temporal localitytemporal locality))••StrideStride--1 reference patterns are good (1 reference patterns are good (spatial localityspatial locality))••Examples:Examples:cold cache, 4-byte words, 4-word cache blocksint sum_array_rows(int a[M][N]){int i, j, sum = 0;for (i = 0; i < M; i++)for (j = 0; j < N; j++)sum += a[i][j];return sum;}int sum_array_cols(int a[M][N]){int i, j, sum = 0;for (j = 0; j < N; j++)for (i = 0; i < M; i++)sum += a[i][j];return sum;}Miss rate = Miss rate = 1/4 = 25%100%Page 2–5–15-213, F’07The Memory MountainThe Memory MountainRead throughput (read bandwidth)Read throughput (read bandwidth) Number of bytes read from memory per second (MB/s)Memory mountainMemory mountain Measured read throughput as a function of spatial and temporal locality. Compact way to characterize memory system performance. –6–15-213, F’07Memory Mountain Test FunctionMemory Mountain Test Function/* The test function */void test(int elems, int stride) {int i, result = 0; volatile int sink; for (i = 0; i < elems; i += stride)result += data[i];sink = result; /* So compiler doesn't optimize away the loop */}/* Run test(elems, stride) and return read throughput (MB/s) */double run(int size, int stride, double Mhz){double cycles;int elems = size / sizeof(int); test(elems, stride); /* warm up the cache */cycles = fcyc2(test, elems, stride, 0); /* call test(elems,stride) */return (size / stride) / (cycles / Mhz); /* convert cycles to MB/s */}–7–15-213, F’07Memory Mountain Main RoutineMemory Mountain Main Routine/* mountain.c - Generate the memory mountain. */#define MINBYTES (1 << 10) /* Working set size ranges from 1 KB */#define MAXBYTES (1 << 23) /* ... up to 8 MB */#define MAXSTRIDE 16 /* Strides range from 1 to 16 */#define MAXELEMS MAXBYTES/sizeof(int) int data[MAXELEMS]; /* The array we'll be traversing */int main(){int size; /* Working set size (in bytes) */int stride; /* Stride (in array elements) */double Mhz; /* Clock frequency */init_data(data, MAXELEMS); /* Initialize each element in data to 1 */Mhz = mhz(0); /* Estimate the clock frequency */for (size = MAXBYTES; size >= MINBYTES; size >>= 1) {for (stride = 1; stride <= MAXSTRIDE; stride++) printf("%.1f\t", run(size, stride, Mhz));printf("\n");}exit(0);}–8–15-213, F’07The Memory MountainThe Memory Mountains1s3s5s7s9s11s13s158m2m512k128k32k8k2k020040060080010001200L1L2memxeSlopes ofSpatialLocalityPentium III550 MHz16 KB on-chip L1 d-cache16 KB on-chip L1 i-cache512 KB off-chip unifiedL2 cacheRidges ofTemporalLocalityWorking set size(bytes)Stride (words)Throughput (MB/sec)Page 3–9–15-213, F’07X86-64 Memory MountainX86-64 Memory Mountains1s5s9s13s17s21s25s29128m4m128k4k0100020003000400050006000Read Throughput (MB/s)Stride (words)Working Set Size (bytes)Slopes ofSpatial LocalityRidges ofTemporal LocalityMemL2L1Pentium Nocona Xeon x86-643.2 GHz12 Kuop on-chip L1 trace cache16 KB on-chip L1 d-cache1 MB off-chip unified L2 cache –10–15-213, F’07Opteron Memory MountainOpteron Memory Mountains1s5s9s13s17s21s25s29128m4m128k4k050010001500200025003000Read throughput (MB/s)Stride (words)Working set (bytes)AMD Opteron2 GHZL1L2Mem–11–15-213, F’07Ridges of Temporal LocalityRidges of Temporal LocalitySlice through the memory mountain with stride=1Slice through the memory mountain with stride=1 illuminates read throughputs of different caches and memory0200400600800100012008m4m2m1024k512k256k128k64k32k16k8k4k2k1kworking set size (bytes)read througput (MB/s)L1 cacheregionL2 cacheregionmain memoryregion–12–15-213, F’07A Slope of Spatial LocalityA Slope of Spatial LocalitySlice through memory mountain with size=256KBSlice through memory mountain with size=256KB shows cache block size.0100200300400500600700800s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 s14 s15 s16stride (words)read throughput (MB/s)one access per cache linePage 4–13–15-213, F’07Matrix Multiplication ExampleMatrix Multiplication ExampleMajor Cache Effects to ConsiderMajor Cache Effects to Consider Total cache sizez Exploit temporal locality and keep the working set small (e.g., use blocking) Block sizez Exploit spatial localityDescription:Description: Multiply N x N matrices O(N3) total operations Accessesz N reads per source elementz N values summed per destination» but may be able to hold in register/* ijk */for (i=0; i<n; i++) {for (j=0; j<n; j++) {sum = 0.0;for (k=0; k<n; k++) sum += a[i][k] * b[k][j];c[i][j] = sum;}} /* ijk */for (i=0; i<n; i++) {for (j=0; j<n; j++) {sum = 0.0;for (k=0; k<n; k++) sum += a[i][k] * b[k][j];c[i][j] = sum;}} Variable sumheld in register–14–15-213, F’07Miss Rate Analysis for Matrix MultiplyMiss Rate Analysis for Matrix MultiplyAssume:Assume: Line size = 32B (big enough for four 64-bit words) Matrix dimension (N) is very largez Approximate 1/N as 0.0 Cache is not even big enough to hold multiple rowsAnalysis Method:Analysis Method: Look at access pattern of inner


View Full Document

CMU CS 15213 - Cache Performance

Documents in this Course
lecture

lecture

14 pages

lecture

lecture

46 pages

Caches

Caches

9 pages

lecture

lecture

39 pages

Lecture

Lecture

36 pages

Lecture

Lecture

45 pages

Lecture

Lecture

56 pages

lecture

lecture

11 pages

lecture

lecture

9 pages

Lecture

Lecture

36 pages

Lecture

Lecture

37 pages

Exam

Exam

16 pages

Lecture

Lecture

10 pages

Lecture

Lecture

43 pages

Lecture

Lecture

8 pages

Lecture

Lecture

8 pages

Lecture

Lecture

36 pages

Lecture

Lecture

43 pages

Lecture

Lecture

12 pages

Lecture

Lecture

37 pages

Lecture

Lecture

6 pages

Lecture

Lecture

40 pages

coding

coding

2 pages

Exam

Exam

17 pages

Exam

Exam

14 pages

Lecture

Lecture

29 pages

Lecture

Lecture

34 pages

Exam

Exam

11 pages

Lecture

Lecture

9 pages

Lecture

Lecture

37 pages

Lecture

Lecture

36 pages

lecture

lecture

46 pages

Lecture

Lecture

33 pages

Lecture

Lecture

57 pages

Lecture

Lecture

32 pages

Lecture

Lecture

46 pages

Lecture

Lecture

40 pages

Lecture

Lecture

11 pages

Lecture

Lecture

6 pages

Lecture

Lecture

43 pages

Lecture

Lecture

12 pages

Lecture

Lecture

18 pages

Exam

Exam

10 pages

Lecture

Lecture

45 pages

Lecture

Lecture

37 pages

Exam

Exam

24 pages

class09

class09

21 pages

class22

class22

37 pages

class20

class20

30 pages

class27

class27

33 pages

class25

class25

21 pages

class04

class04

31 pages

Lecture

Lecture

59 pages

class01a

class01a

14 pages

class12

class12

45 pages

class29

class29

33 pages

Lecture

Lecture

39 pages

Lecture

Lecture

6 pages

class03

class03

34 pages

lecture

lecture

42 pages

Lecture

Lecture

40 pages

Lecture

Lecture

47 pages

Exam

Exam

19 pages

R06-B

R06-B

25 pages

class17

class17

37 pages

class25

class25

31 pages

Lecture

Lecture

15 pages

final-f06

final-f06

17 pages

Lecture

Lecture

9 pages

lecture

lecture

9 pages

Exam

Exam

15 pages

Lecture

Lecture

22 pages

class11

class11

45 pages

lecture

lecture

50 pages

Linking

Linking

37 pages

Lecture

Lecture

64 pages

Integers

Integers

40 pages

Exam

Exam

11 pages

Lecture

Lecture

37 pages

Lecture

Lecture

44 pages

Lecture

Lecture

37 pages

Lecture

Lecture

9 pages

Lecture

Lecture

37 pages

Lecture

Lecture

45 pages

Final

Final

25 pages

lecture

lecture

9 pages

Lecture

Lecture

30 pages

Lecture

Lecture

16 pages

Final

Final

17 pages

Lecture

Lecture

8 pages

Exam

Exam

11 pages

Lecture

Lecture

47 pages

Lecture

Lecture

9 pages

lecture

lecture

39 pages

Exam

Exam

11 pages

lecture

lecture

41 pages

lecture

lecture

37 pages

Lecture

Lecture

59 pages

Lecture

Lecture

45 pages

Exam 1

Exam 1

18 pages

Lecture

Lecture

41 pages

Lecture

Lecture

32 pages

Lecture

Lecture

30 pages

Lecture

Lecture

9 pages

Lecture

Lecture

9 pages

Lecture

Lecture

15 pages

Lecture

Lecture

11 pages

Lecture

Lecture

9 pages

Lecture

Lecture

34 pages

Lecture

Lecture

40 pages

Lecture

Lecture

4 pages

Lecture

Lecture

46 pages

Lecture

Lecture

8 pages

Lecture

Lecture

65 pages

Lecture

Lecture

38 pages

Lecture

Lecture

35 pages

Lecture

Lecture

8 pages

Lecture

Lecture

34 pages

Lecture

Lecture

8 pages

Exam

Exam

13 pages

Lecture

Lecture

43 pages

Lecture

Lecture

9 pages

Lecture

Lecture

12 pages

Lecture

Lecture

9 pages

Lecture

Lecture

34 pages

Lecture

Lecture

43 pages

Lecture

Lecture

7 pages

Lecture

Lecture

45 pages

Lecture

Lecture

24 pages

Lecture

Lecture

47 pages

Lecture

Lecture

12 pages

Lecture

Lecture

20 pages

Lecture

Lecture

9 pages

Exam

Exam

11 pages

Lecture

Lecture

52 pages

Lecture

Lecture

20 pages

Exam

Exam

11 pages

Lecture

Lecture

35 pages

Lecture

Lecture

47 pages

Lecture

Lecture

18 pages

Lecture

Lecture

30 pages

Lecture

Lecture

59 pages

Lecture

Lecture

37 pages

Lecture

Lecture

22 pages

Lecture

Lecture

35 pages

Exam

Exam

23 pages

Lecture

Lecture

9 pages

Lecture

Lecture

22 pages

class12

class12

32 pages

Lecture

Lecture

8 pages

Lecture

Lecture

39 pages

Lecture

Lecture

44 pages

Lecture

Lecture

38 pages

Lecture

Lecture

69 pages

Lecture

Lecture

41 pages

Lecture

Lecture

12 pages

Lecture

Lecture

52 pages

Lecture

Lecture

59 pages

Lecture

Lecture

39 pages

Lecture

Lecture

83 pages

Lecture

Lecture

59 pages

class01b

class01b

17 pages

Exam

Exam

21 pages

class07

class07

47 pages

Lecture

Lecture

11 pages

Odyssey

Odyssey

18 pages

multicore

multicore

66 pages

Lecture

Lecture

6 pages

lecture

lecture

41 pages

lecture

lecture

55 pages

lecture

lecture

52 pages

lecture

lecture

33 pages

lecture

lecture

46 pages

lecture

lecture

55 pages

lecture

lecture

17 pages

lecture

lecture

49 pages

Exam

Exam

17 pages

lecture

lecture

56 pages

Exam 2

Exam 2

16 pages

Exam 2

Exam 2

16 pages

Notes

Notes

37 pages

Lecture

Lecture

40 pages

Lecture

Lecture

36 pages

Lecture

Lecture

43 pages

Lecture

Lecture

25 pages

Exam

Exam

13 pages

Lecture

Lecture

32 pages

Lecture

Lecture

12 pages

Lecture

Lecture

58 pages

Lecture

Lecture

29 pages

Lecture

Lecture

59 pages

Lecture

Lecture

41 pages

Lecture

Lecture

50 pages

Exam

Exam

17 pages

Lecture

Lecture

29 pages

Lecture

Lecture

44 pages

Lecture

Lecture

41 pages

Lecture

Lecture

52 pages

Lecture

Lecture

40 pages

Lecture

Lecture

33 pages

lecture

lecture

10 pages

Lecture

Lecture

27 pages

Lecture

Lecture

29 pages

Lecture

Lecture

39 pages

Lecture

Lecture

9 pages

Lecture

Lecture

29 pages

Lecture

Lecture

8 pages

Lecture

Lecture

43 pages

Lecture

Lecture

43 pages

Lecture

Lecture

75 pages

Lecture

Lecture

55 pages

Exam

Exam

12 pages

Lecture

Lecture

43 pages

Lecture

Lecture

35 pages

lecture

lecture

36 pages

Exam

Exam

33 pages

lecture

lecture

56 pages

lecture

lecture

64 pages

lecture

lecture

8 pages

Exam

Exam

14 pages

Lecture

Lecture

43 pages

Lecture

Lecture

36 pages

lecture

lecture

56 pages

lecture

lecture

75 pages

lecture

lecture

36 pages

Lecture

Lecture

50 pages

Lecture

Lecture

45 pages

Lecture

Lecture

13 pages

Exam

Exam

23 pages

Lecture

Lecture

10 pages

Lecture

Lecture

48 pages

Lecture

Lecture

83 pages

lecture

lecture

57 pages

Lecture

Lecture

33 pages

Lecture

Lecture

39 pages

Lecture

Lecture

33 pages

lecture

lecture

54 pages

Lecture

Lecture

30 pages

Exam

Exam

13 pages

Lecture

Lecture

36 pages

Lecture

Lecture

40 pages

Exam

Exam

17 pages

Lecture

Lecture

9 pages

Exam

Exam

15 pages

Lecture

Lecture

44 pages

Lecture

Lecture

34 pages

Lecture

Lecture

24 pages

Lecture

Lecture

29 pages

class12

class12

43 pages

lecture

lecture

43 pages

class22

class22

22 pages

R06-B

R06-B

25 pages

class01b

class01b

19 pages

lecture

lecture

29 pages

lab1

lab1

8 pages

Caches

Caches

36 pages

lecture

lecture

55 pages

Lecture,

Lecture,

37 pages

Integers

Integers

40 pages

Linking

Linking

38 pages

lecture

lecture

45 pages

Lecture

Lecture

61 pages

Linking

Linking

33 pages

lecture

lecture

40 pages

lecture

lecture

40 pages

Lecture

Lecture

32 pages

lecture

lecture

48 pages

lecture

lecture

44 pages

Exam

Exam

11 pages

Lecture

Lecture

31 pages

Lecture

Lecture

46 pages

Lecture

Lecture

40 pages

Lecture

Lecture

40 pages

Exam

Exam

12 pages

Lecture

Lecture

42 pages

Lecture

Lecture

36 pages

Lecture

Lecture

45 pages

Lecture

Lecture

41 pages

Lecture

Lecture

13 pages

Lecture

Lecture

35 pages

Lecture

Lecture

20 pages

Final

Final

19 pages

Lecture

Lecture

33 pages

Lecture

Lecture

50 pages

Lecture

Lecture

33 pages

Lecture

Lecture

27 pages

Lecture

Lecture

6 pages

Exam

Exam

15 pages

Lecture

Lecture

24 pages

Lecture

Lecture

23 pages

Lecture

Lecture

43 pages

Lecture

Lecture

32 pages

Lecture

Lecture

52 pages

Lecture

Lecture

37 pages

Lecture

Lecture

36 pages

Lecture

Lecture

34 pages

Lecture

Lecture

40 pages

Lecture

Lecture

15 pages

lecture

lecture

21 pages

Lecture

Lecture

58 pages

Lecture

Lecture

49 pages

Lecture

Lecture

36 pages

Lecture

Lecture

11 pages

Lecture

Lecture

12 pages

Lecture

Lecture

58 pages

Lecture

Lecture

33 pages

Exam

Exam

15 pages

Lecture

Lecture

35 pages

Lecture

Lecture

10 pages

Lecture

Lecture

25 pages

Lecture

Lecture

31 pages

Lecture

Lecture

24 pages

Lecture

Lecture

34 pages

Lecture

Lecture

50 pages

lecture

lecture

35 pages

Lecture

Lecture

11 pages

Lecture

Lecture

39 pages

Lecture

Lecture

45 pages

Lecture

Lecture

41 pages

exam1-f05

exam1-f05

11 pages

Lecture

Lecture

4 pages

Lecture

Lecture

17 pages

Exam

Exam

17 pages

malloc()

malloc()

12 pages

Lecture

Lecture

57 pages

Lecture

Lecture

30 pages

Lecture

Lecture

30 pages

Lecture

Lecture

47 pages

Lecture

Lecture

33 pages

Exam

Exam

12 pages

Lecture

Lecture

43 pages

Lectures

Lectures

33 pages

Lecture

Lecture

36 pages

lecture

lecture

33 pages

Exam

Exam

14 pages

Lecture

Lecture

43 pages

Lecture

Lecture

25 pages

Load more
Download Cache Performance
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Cache Performance and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Cache Performance 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?