Berkeley COMPSCI 152 - Memory Hierarchy-III

School: University of California, Berkeley
Course: COMPSCI 152 - Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering
Lecture 8 - Memory Hierarchy-III
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
February 11, 2010, CS152, Spring 2010

Slides:
• Last time in Lecture 7
• Multilevel Caches
• Presence of L2 influences L1 design
• Inclusion Policy
• Itanium-2 On-Chip Caches (Intel/HP, 2002)
• Power 7 On-Chip Caches [IBM 2009]
• Increasing Cache Bandwidth with Non-Blocking Caches
• Value of Hit Under Miss for SPEC (old data)
• CS152 Administrivia
• Prefetching
• Issues in Prefetching
• Hardware Instruction Prefetching
• Hardware Data Prefetching
• Software Prefetching
• Software Prefetching Issues
• Compiler Optimizations
• Loop Interchange
• Loop Fusion
• Matrix Multiply, Naïve Code
• Matrix Multiply with Cache Tiling
• Acknowledgements

Last time in Lecture 7
• 3 C’s of cache misses:
  – compulsory, capacity, conflict
• Average memory access time = hit time + miss rate * miss penalty
• To improve performance, reduce:
  – hit time
  – miss rate
  – and/or miss penalty
• Primary cache parameters:
  – Total cache capacity
  – Cache line size
  – Associativity

Multilevel Caches
Problem: A memory cannot be both large and fast.
Solution: Increasing sizes of cache at each level: CPU -> L1$ -> L2$ -> DRAM
• Local miss rate = misses in cache / accesses to cache
• Global miss rate = misses in cache / CPU memory accesses
• Misses per instruction = misses in cache / number of instructions

Presence of L2 influences L1 design
• Use a smaller L1 if there is also an L2
  – Trade increased L1 miss rate for reduced L1 hit time and reduced L1 miss penalty
  – Reduces average access energy
• Use a simpler write-through L1 with an on-chip L2
  – The write-back L2 cache absorbs write traffic, so writes don't go off-chip
  – At most one L1 miss request per L1 access (no dirty victim write-back), which simplifies pipeline control
  – Simplifies coherence issues
  – Simplifies error recovery in L1 (can use just parity bits in L1 and reload from L2 when a parity error is detected on an L1 read)

Inclusion Policy
• Inclusive multilevel cache:
  – Inner cache holds copies of data in outer cache
  – External coherence snoop access need only check outer cache
• Exclusive multilevel caches:
  – Inner cache may hold data not in outer cache
  – Swap lines between inner/outer caches on miss
  – Used in AMD Athlon with 64KB primary and 256KB secondary cache
Why choose one type or the other?

Itanium-2 On-Chip Caches (Intel/HP, 2002)
• Level 1: 16KB, 4-way set-associative, 64B line, quad-port (2 load + 2 store), single-cycle latency
• Level 2: 256KB, 4-way set-associative, 128B line, quad-port (4 load or 4 store), five-cycle latency
• Level 3: 3MB, 12-way set-associative, 128B line, single 32B port, twelve-cycle latency

Power 7 On-Chip Caches [IBM 2009]
• 32KB L1 I$/core and 32KB L1 D$/core, 3-cycle latency
• 256KB unified L2$/core, 8-cycle latency
• 32MB unified shared L3$ in embedded DRAM, 25-cycle latency to local slice

Increasing Cache Bandwidth with Non-Blocking Caches
• A non-blocking (lockup-free) cache allows the data cache to continue to supply cache hits during a miss
  – requires Full/Empty bits on registers or out-of-order execution
• “hit under miss” reduces the effective miss penalty by continuing to work during a miss instead of ignoring CPU requests
• “hit under multiple miss” or “miss under miss” may further lower the effective miss penalty by overlapping multiple misses
  – Significantly increases the complexity of the cache controller, as there can be multiple outstanding memory accesses, and a new miss can target a line that already has an outstanding miss (secondary miss)
  – Requires a pipelined or banked memory system (otherwise it cannot support multiple misses)
  – Pentium Pro allows 4 outstanding memory misses
  – (Cray X1E vector supercomputer allows 2,048 outstanding memory misses)

Value of Hit Under Miss for SPEC (old data)
[Figure: AMAT of SPEC92 integer and floating-point benchmarks for Base and “hit under n misses” (0->1, 1->2, 2->64)]
• Floating-point programs on average: AMAT = 0.68 -> 0.52 -> 0.34 -> 0.26
• Integer programs on average: AMAT = 0.24 -> 0.20 -> 0.19 -> 0.19
• 8 KB data cache, direct mapped, 32B block, 16-cycle miss, SPEC 92

CS152 Administrivia

Prefetching
• Speculate on future instruction and data accesses and fetch them into cache(s)
  – Instruction accesses are easier to predict than data accesses
• Varieties of prefetching
  – Hardware prefetching
  – Software prefetching
  – Mixed schemes
• What types of misses does prefetching affect?

Issues in Prefetching
• Usefulness – should produce hits
• Timeliness – not late and not too early
• Cache and bandwidth pollution
[Figure: CPU and register file (RF), L1 Instruction and L1 Data caches, unified L2 cache; label “Prefetched data”]

Hardware Instruction Prefetching
Instruction prefetch in Alpha AXP 21064:
  – Fetch two blocks on a miss: the requested block (i) and the next consecutive block (i+1)
  – The requested block is placed in the cache, and the next block in the instruction stream buffer
  – If an access misses in the cache but hits in the stream buffer, move the stream buffer block into the cache and prefetch the next block (i+2)
[Figure: CPU and RF, L1 Instruction cache, stream buffer, unified L2 cache; labels “Req block” and “Prefetched instruction block”]

Hardware Data Prefetching
• Prefetch-on-miss:
  – Prefetch b + 1 upon miss on b
• One Block Lookahead (OBL) scheme
  – Initiate prefetch for block b + 1 when block b is accessed
  – Why is this different from doubling block size?
  – Can extend to N-block lookahead
• Strided prefetch
  – If a sequence of accesses to blocks b, b+N, b+2N is observed, then prefetch b+3N, etc.
Example: IBM Power 5 [2003] supports eight independent streams of strided prefetch per processor, prefetching 12 lines ahead of the current access.

Software Prefetching
    for (i = 0; i < N; i++) {
        prefetch( &a[i + 1] );
        prefetch( &b[i + 1] );
        SUM = SUM + a[i] * b[i];
    }
What property do we require of the cache for prefetching to work?

Software Prefetching Issues
• Timing is the biggest issue, not predictability
  – If you prefetch very close to when the data is required, you might be too late
  – If you prefetch too early, you cause pollution
  – Estimate how long it will take for the data to come into L1, so we can set P
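The last sentence of the "Software Prefetching Issues" slide is cut off in this preview, so its exact rule is not recoverable here. One commonly used estimate is offered below as an assumption, not as the lecture's formula. It sets the prefetch distance d to the latency l of bringing data into L1 divided by the time s of one loop iteration, rounded up. The numbers are hypothetical.

```latex
d = \left\lceil \frac{l}{s} \right\rceil, \qquad
l = 200~\text{cycles},\; s = 12~\text{cycles/iteration}
\;\Rightarrow\; d = \left\lceil \tfrac{200}{12} \right\rceil
  = \left\lceil 16.\overline{6} \right\rceil = 17~\text{iterations}
```

A smaller d risks the late prefetches the slide warns about. A much larger one holds lines in the cache longer than needed and risks pollution.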
