Berkeley COMPSCI 152 - Memory Hierarchy-III


CS 152 Computer Architecture and Engineering
Lecture 8 - Memory Hierarchy-III
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152

Last time in Lecture 7
• 3 C's of cache misses: compulsory, capacity, conflict
• Average memory access time = hit time + miss rate * miss penalty
• To improve performance, reduce:
  – hit time
  – miss rate
  – and/or miss penalty
• Primary cache parameters:
  – Total cache capacity
  – Cache line size
  – Associativity

Multilevel Caches
• A memory cannot be large and fast
• Increasing sizes of cache at each level
[Figure: CPU -> L1$ -> L2$ -> DRAM]
• Local miss rate = misses in cache / accesses to cache
• Global miss rate = misses in cache / CPU memory accesses
• Misses per instruction = misses in cache / number of instructions

A Typical Memory Hierarchy c. 2008
[Figure: CPU with multiported register file (part of CPU); split instruction & data primary caches (on-chip SRAM); large unified secondary cache (on-chip SRAM); multiple interleaved memory banks (off-chip DRAM)]

Presence of L2 influences L1 design
• Use smaller L1 if there is also L2
  – Trade increased L1 miss rate for reduced L1 hit time and reduced L1 miss penalty
  – Reduces average access energy
• Use simpler write-through L1 with on-chip L2
  – Write-back L2 cache absorbs write traffic, doesn't go off-chip
  – At most one L1 miss request per L1 access (no dirty victim writeback) simplifies pipeline control
  – Simplifies coherence issues
  – Simplifies error recovery in L1 (can use just parity bits in L1 and reload from L2 when a parity error is detected on an L1 read)

Inclusion Policy
• Inclusive multilevel cache:
  – Inner cache holds copies of data in outer cache
  – External access need only check outer cache
  – Most common case
• Exclusive multilevel caches:
  – Inner cache may hold data not in outer cache
  – Swap lines between inner/outer caches on miss
  – Used in AMD Athlon with 64KB primary and 256KB secondary cache
• Why choose one type or the other?

Reducing penalty of associativity
• Associativity reduces conflict misses, but requires expensive (area, energy, delay) multi-way tag search
• Two optimizations to reduce cost of associativity:
  – Victim caches
  – Way prediction

Victim Caches (Jouppi 1990)
[Figure: direct-mapped L1 data cache backed by a small fully associative victim cache (4 blocks) in front of the unified L2 cache; lines evicted from L1 go to the victim cache, and victim-cache hits return data to the CPU]
• Victim cache is a small associative backup cache, added to a direct-mapped cache, which holds recently evicted lines
• First look up in direct-mapped cache
• If miss, look in victim cache
• If hit in victim cache, swap hit line with line now evicted from L1
• If miss in victim cache, L1 victim -> VC, VC victim -> ?
• Fast hit time of direct-mapped but with reduced conflict misses (HP 7200)
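As a concrete illustration of the lookup/swap protocol above, here is a minimal simulation sketch (not from the lecture): it assumes a 256-set direct-mapped L1, a 4-entry victim cache, 32-byte blocks, and a hypothetical access(addr) entry point, and it answers the "VC victim -> ?" question by sending victim-cache evictions back to L2.

```python
from collections import OrderedDict

BLOCK_BYTES = 32      # line size (assumed)
L1_SETS = 256         # direct-mapped L1 data cache: one line per set (assumed)
VC_LINES = 4          # small fully associative victim cache, Jouppi-style

l1 = [None] * L1_SETS   # per-set resident block address (or None)
vc = OrderedDict()      # victim cache: block address -> True, oldest first

def access(addr):
    """Return which structure served the access: 'L1', 'VC', or 'L2'."""
    block = addr // BLOCK_BYTES
    index = block % L1_SETS

    # 1. Look up the direct-mapped cache first.
    if l1[index] == block:
        return "L1"

    l1_victim = l1[index]            # line about to be displaced from L1

    # 2. On an L1 miss, look in the victim cache.
    if block in vc:
        # Hit in VC: swap the hit line with the line evicted from L1.
        del vc[block]
        if l1_victim is not None:
            vc[l1_victim] = True
        l1[index] = block
        return "VC"

    # 3. Miss in both: refill from L2; L1 victim -> VC, VC victim -> L2.
    if l1_victim is not None:
        if len(vc) == VC_LINES:
            vc.popitem(last=False)   # oldest VC line falls back to L2
        vc[l1_victim] = True
    l1[index] = block
    return "L2"
```

Running this on an address stream where two blocks alias to the same L1 set (for example, alternating accesses to addr and addr + L1_SETS * BLOCK_BYTES) serves the repeated accesses from the victim cache instead of L2, which is exactly the conflict-miss reduction the slide describes.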
Way-Predicting Instruction Cache (Alpha 21264-like)
[Figure: PC indexes the primary instruction cache; the way used on the next fetch is chosen between a sequential way (PC + 0x4) and a branch-target way, selected by the jump control / jump target]

Way-Predicting Caches (MIPS R10000 off-chip L2 cache)
• Use processor address to index into way prediction table
• Look in predicted way at given index, then:
  – HIT: return copy of data from cache
  – MISS: look in other way
    – HIT: slow hit (change entry in prediction table)
    – MISS: read block of data from next level of cache

Reduce Miss Penalty of Long Blocks: Early Restart and Critical Word First
• Don't wait for full block before restarting CPU
• Early restart: as soon as the requested word of the block arrives, send it to the CPU and let the CPU continue execution
  – Spatial locality -> CPU tends to want the next sequential word anyway, so the benefit of early restart alone is unclear
• Critical Word First: request the missed word first from memory and send it to the CPU as soon as it arrives; let the CPU continue execution while filling the rest of the words in the block
  – Long blocks more popular today -> Critical Word First widely used

Increasing Cache Bandwidth with Non-Blocking Caches
• Non-blocking cache or lockup-free cache allows the data cache to continue to supply cache hits during a miss
  – requires Full/Empty bits on registers or out-of-order execution
• "Hit under miss" reduces the effective miss penalty by working during the miss instead of ignoring CPU requests
• "Hit under multiple miss" or "miss under miss" may further lower the effective miss penalty by overlapping multiple misses
  – Significantly increases the complexity of the cache controller, as there can be multiple outstanding memory accesses, and a miss can hit a line that already has an outstanding miss (secondary miss)
  – Requires pipelined or banked memory system (otherwise cannot support multiple misses)
  – Pentium Pro allows 4 outstanding memory misses
  – (Cray X1E vector supercomputer allows 2,048 outstanding memory misses)

Value of Hit Under Miss for SPEC (old data)
• FP programs on average: AMAT = 0.68 -> 0.52 -> 0.34 -> 0.26
• Int programs on average: AMAT = 0.24 -> 0.20 -> 0.19 -> 0.19
• 8 KB data cache, direct-mapped, 32B block, 16-cycle miss, SPEC 92
[Chart: average memory access time per SPEC 92 benchmark (integer and floating point) under "hit under n misses" for 0->1, 1->2, and 2->64 outstanding misses, versus the blocking base case]

CS152 Administrivia
• Last three lectures (L6-L8) on memory hierarchy form material for Quiz 2 (Tuesday March 3)

Prefetching
• Speculate on future instruction and data accesses and fetch them into cache(s)
  – Instruction accesses easier to predict than data accesses
• Varieties of prefetching:
  – Hardware prefetching
  – Software prefetching
  – Mixed schemes
• What types of misses does prefetching affect?

Issues in Prefetching
• Usefulness – should produce hits
• Timeliness – not late and not too early
• Cache and bandwidth pollution
[Figure: prefetched data flowing into the L1 instruction/data caches and the unified L2 cache alongside the CPU and register file]

Hardware Instruction Prefetching
• Instruction prefetch in Alpha AXP 21064:
  – Fetch two blocks on a miss: the requested block (i) and the next consecutive block (i+1)
  – Requested block
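To make the next-line prefetching idea concrete, here is a minimal toy sketch under stated assumptions (it is not the 21064's actual mechanism): on an instruction-cache miss for block i it fetches block i and prefetches block i+1 into a one-entry stream buffer, and a stream-buffer hit promotes the block into the cache and prefetches the next one. The names icache_access and fetch_block are made up for the example.

```python
cache = set()          # instruction cache contents, by block number (toy model)
stream_buffer = None   # single prefetched block (assumed one-entry buffer)

def fetch_block(i):
    """Stand-in for a memory access; returns the block number fetched."""
    return i

def icache_access(block):
    """Return 'hit', 'stream-buffer hit', or 'miss' for an instruction block."""
    global stream_buffer
    if block in cache:
        return "hit"
    if block == stream_buffer:
        # Prefetched block was useful: move it into the cache, keep prefetching.
        cache.add(block)
        stream_buffer = fetch_block(block + 1)
        return "stream-buffer hit"
    # Miss: fetch the requested block (i) and prefetch the next one (i+1).
    cache.add(fetch_block(block))
    stream_buffer = fetch_block(block + 1)
    return "miss"

# Sequential instruction fetch: only the first access misses in this toy model.
print([icache_access(b) for b in range(4)])
# ['miss', 'stream-buffer hit', 'stream-buffer hit', 'stream-buffer hit']
```

For straight-line code this covers every sequential block after the first, which is why instruction prefetching mainly attacks compulsory misses, one answer to the question raised on the Prefetching slide.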

