Berkeley COMPSCI 152 - Lecture 8 - Memory Hierarchy-III


CS 152 Computer Architecture and Engineering
Lecture 8 - Memory Hierarchy-III
Krste Asanovic
Electrical Engineering and Computer Sciences, University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
2/21/2008, CS152 Spring 2008

Last time in Lecture 7
• 3 C's of cache misses: compulsory, capacity, conflict
• Average memory access time = hit time + miss rate × miss penalty
• To improve performance, reduce:
  – hit time
  – miss rate
  – and/or miss penalty
• Primary cache parameters:
  – Total cache capacity
  – Cache line size
  – Associativity

Multilevel Caches
• A memory cannot be large and fast
• Increasing sizes of cache at each level: CPU - L1$ - L2$ - DRAM
• Local miss rate = misses in cache / accesses to cache
• Global miss rate = misses in cache / CPU memory accesses
• Misses per instruction = misses in cache / number of instructions

A Typical Memory Hierarchy c. 2008
• Multiported register file (part of CPU)
• Split instruction and data primary caches (on-chip SRAM)
• Large unified secondary cache (on-chip SRAM)
• Multiple interleaved memory banks (off-chip DRAM)

Presence of L2 influences L1 design
• Use a smaller L1 if there is also an L2
  – Trade increased L1 miss rate for reduced L1 hit time and reduced L1 miss penalty
  – Reduces average access energy
• Use a simpler write-through L1 cache with an on-chip L2
  – Write-back L2 cache absorbs write traffic, which does not go off-chip
  – At most one L1 miss request per L1 access (no dirty victim writeback) simplifies pipeline control
  – Simplifies coherence issues
  – Simplifies error recovery in L1 (can use just parity bits in L1 and reload from L2 when a parity error is detected on an L1 read)

Inclusion Policy
• Inclusive multilevel cache:
  – Inner cache holds copies of data in the outer cache
  – External access need only check the outer cache
  – Most common case
• Exclusive multilevel caches:
  – Inner cache may hold data not in the outer cache
  – Swap lines between inner/outer caches on a miss
  – Used in the AMD Athlon with 64KB primary and 256KB secondary cache
• Why choose one type or the other?

Itanium-2 On-Chip Caches (Intel/HP, 2002)
• Level 1: 16KB, 4-way set-associative, 64B line, quad-port (2 load + 2 store), single-cycle latency
• Level 2: 256KB, 4-way set-associative, 128B line, quad-port (4 load or 4 store), five-cycle latency
• Level 3: 3MB, 12-way set-associative, 128B line, single 32B port, twelve-cycle latency

Reducing the penalty of associativity
• Associativity reduces conflict misses, but requires an expensive (area, energy, delay) multi-way tag search
• Two optimizations to reduce the cost of associativity:
  – Victim caches
  – Way prediction

Victim Caches (Jouppi 1990)
A victim cache is a small associative backup cache, added to a direct-mapped cache, which holds recently evicted lines.
[Figure: a fully associative 4-block victim cache sits beside the direct-mapped L1 data cache, in front of the unified L2; lines evicted from L1 go to the victim cache, and victim-cache hits return data to the CPU]
• First look up in the direct-mapped cache
• If miss, look in the victim cache
• If hit in the victim cache, swap the hit line with the line now evicted from L1
• If miss in the victim cache, L1 victim -> VC, VC victim -> ?
Gives the fast hit time of a direct-mapped cache but with reduced conflict misses (HP 7200).
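The victim-cache lookup flow on the slide above can be sketched in C. This is a minimal illustration, not code from the lecture: the 256-line direct-mapped L1, the FIFO replacement policy in the victim cache, and storing full block addresses instead of tags are all simplifying assumptions; only the 4-block fully associative victim cache size comes from the slide.

#include <stdbool.h>
#include <stdint.h>

#define L1_SETS   256  /* assumed direct-mapped L1 size (not from the slide) */
#define VC_BLOCKS 4    /* 4-block fully associative victim cache, as on the slide */

/* For simplicity each entry stores the full block address rather than a tag. */
typedef struct { bool valid; uint32_t block; } Line;

static Line     l1[L1_SETS];
static Line     vc[VC_BLOCKS];
static unsigned vc_next;  /* FIFO replacement pointer for the victim cache (assumed policy) */

/* Returns true if the access hits in L1 or the victim cache. */
bool cache_access(uint32_t block_addr)
{
    uint32_t index = block_addr % L1_SETS;

    /* 1. First look up in the direct-mapped cache. */
    if (l1[index].valid && l1[index].block == block_addr)
        return true;

    /* 2. On an L1 miss, search the small fully associative victim cache. */
    for (unsigned i = 0; i < VC_BLOCKS; i++) {
        if (vc[i].valid && vc[i].block == block_addr) {
            /* 3. Hit in VC: swap the hit line with the line now evicted from L1. */
            Line evicted = l1[index];
            l1[index] = vc[i];
            vc[i] = evicted;
            return true;
        }
    }

    /* 4. Miss in both: L1 victim goes to the VC (displacing a VC victim),
          and the newly fetched line is installed in L1. */
    vc[vc_next] = l1[index];
    vc_next = (vc_next + 1) % VC_BLOCKS;
    l1[index].valid = true;
    l1[index].block = block_addr;
    return false;
}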
Way Predicting Caches (MIPS R10000 off-chip L2 cache)
• Use the processor address to index into a way-prediction table
• Look in the predicted way at the given index, then:
  – HIT: return a copy of the data from the cache
  – MISS: look in the other way
    - SLOW HIT (change the entry in the prediction table), or
    - MISS: read the block of data from the next level of cache

Way Predicting Instruction Cache (Alpha 21264-like)
[Figure: the PC indexes the primary instruction cache; the fetched way is chosen between a sequential way and a branch-target way, steered by the jump target and jump control]

Reduce Miss Penalty of Long Blocks: Early Restart and Critical Word First
• Don't wait for the full block before restarting the CPU
• Early restart: as soon as the requested word of the block arrives, send it to the CPU and let the CPU continue execution
  – Spatial locality ⇒ the CPU tends to want the next sequential word anyway, so the benefit of early restart alone is not clear
• Critical word first: request the missed word first from memory and send it to the CPU as soon as it arrives; let the CPU continue execution while filling the rest of the words in the block
  – Long blocks are more popular today ⇒ critical word first is widely used

Increasing Cache Bandwidth with Non-Blocking Caches
• A non-blocking (lockup-free) cache allows the data cache to continue to supply cache hits during a miss
  – requires full/empty bits on registers or out-of-order execution
• "Hit under miss" reduces the effective miss penalty by working during the miss instead of ignoring CPU requests
• "Hit under multiple miss" or "miss under miss" may further lower the effective miss penalty by overlapping multiple misses
  – Significantly increases the complexity of the cache controller, as there can be multiple outstanding memory accesses, and a miss can arrive for a line that already has an outstanding miss (secondary miss)
  – Requires a pipelined or banked memory system (otherwise multiple misses cannot be supported)
  – Pentium Pro allows 4 outstanding memory misses
  – (The Cray X1E vector supercomputer allows 2,048 outstanding memory misses)

Value of Hit Under Miss for SPEC (old data)
• FP programs on average: AMAT = 0.68 -> 0.52 -> 0.34 -> 0.26
• Integer programs on average: AMAT = 0.24 -> 0.20 -> 0.19 -> 0.19
• 8 KB data cache, direct-mapped, 32B block, 16-cycle miss penalty, SPEC 92
[Chart: average memory access time per SPEC92 benchmark (eqntott, espresso, xlisp, compress, mdljsp2, ear, fpppp, tomcatv, swm256, doduc, su2cor, wave5, mdljdp2, hydro2d, alvinn, nasa7, spice2g6, ora) for "hit under n misses" with n = 0->1, 1->2, 2->64, versus the base]

CS152 Administrivia
• Textbooks should appear in the Cal book store soon (already there? Or in the next couple of days)
• Quiz 1 handed back next Tuesday in class
• First set of open-ended labs seemed fine. Opinions?
  – You are encouraged to try to find your own ideas, instead of following the suggestions

Prefetching
• Speculate on future instruction and data accesses and fetch them into cache(s)
  – Instruction accesses are easier to predict than data accesses
• Varieties of prefetching:
  – Hardware prefetching
  – Software prefetching
  – …
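The preview cuts off in the middle of this last slide, so the software-prefetching details are not shown. Purely as an illustration of the idea, and not material from the lecture: a programmer or compiler can insert explicit prefetch requests ahead of the loads that will need the data. The sketch below uses the GCC/Clang intrinsic __builtin_prefetch, and the look-ahead distance of 16 elements is an arbitrary assumption.

#include <stddef.h>

/* Sum an array, prefetching data a fixed distance ahead of the current load.
   PREFETCH_DIST is a tuning knob chosen here arbitrarily, not a lecture value. */
#define PREFETCH_DIST 16

double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            /* Ask the hardware to start fetching the line holding
               a[i + PREFETCH_DIST] so it is likely in cache when needed. */
            __builtin_prefetch(&a[i + PREFETCH_DIST]);
        sum += a[i];
    }
    return sum;
}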

