
CS 152 Computer Architecture and Engineering
Lecture 8 - Memory Hierarchy III
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
2/21/2008, CS152 Spring '08

Last time in Lecture 7
- 3 C's of cache misses: compulsory, capacity, conflict
- Average memory access time = hit time + miss rate × miss penalty
- To improve performance, reduce:
  - hit time
  - miss rate
  - and/or miss penalty
- Primary cache parameters:
  - Total cache capacity
  - Cache line size
  - Associativity

Multilevel Caches
- A memory cannot be large and fast
- Increasing sizes of cache at each level: CPU → L1 → L2 → DRAM
- Local miss rate = misses in cache / accesses to cache
- Global miss rate = misses in cache / CPU memory accesses
- Misses per instruction = misses in cache / number of instructions

A Typical Memory Hierarchy, c. 2008
[Figure: CPU with a multiported register file (RF, part of the CPU); split L1 instruction and L1 data primary caches (on-chip SRAM); a large unified L2 secondary cache (on-chip SRAM); multiple interleaved off-chip DRAM memory banks.]

Presence of L2 influences L1 design
- Use smaller L1 if there is also L2
  - Trade increased L1 miss rate for reduced L1 hit time and reduced L1 miss penalty
  - Reduces average access energy
- Use simpler write-through L1 cache with on-chip L2
  - Write-back L2 cache absorbs write traffic, doesn't go off-chip
  - At most one L1 miss request per L1 access (no dirty victim write-back) simplifies pipeline control
  - Simplifies coherence issues
  - Simplifies error recovery in L1 (can use just parity bits in L1 and reload from L2 when parity error detected on L1 read)

Inclusion Policy
- Inclusive multilevel cache:
  - Inner cache holds copies of data in outer cache
  - External access need only check outer cache
  - Most common case
- Exclusive multilevel caches:
  - Inner cache may hold data not in outer cache
  - Swap lines between inner/outer
    caches on miss
  - Used in AMD Athlon with 64KB primary and 256KB secondary cache
- Why choose one type or the other?

Itanium-2 On-Chip Caches (Intel/HP, 2002)
- Level 1: 16KB, 4-way s.a., 64B line, quad-port (2 load + 2 store), single-cycle latency
- Level 2: 256KB, 4-way s.a., 128B line, quad-port (4 load or 4 store), five-cycle latency
- Level 3: 3MB, 12-way s.a., 128B line, single 32B port, twelve-cycle latency

Reducing penalty of associativity
- Associativity reduces conflict misses, but requires expensive (area, energy, delay) multi-way tag search
- Two optimizations to reduce cost of associativity:
  - Victim caches
  - Way prediction

Victim Caches (Jouppi 1990)
[Figure: CPU/RF, direct-mapped L1 data cache, fully associative 4-block victim cache, unified L2 cache. Labels: evicted data from L1 → victim cache; hit data from VC (miss in L1) → L1; evicted data from VC → where?; HP 7200.]
- Victim cache is a small associative back-up cache, added to a direct-mapped cache, which holds recently evicted lines
- First look up in direct-mapped cache
- If miss, look in victim cache
- If hit in victim cache, swap hit line with line now evicted from L1
- If miss in victim cache, L1 victim → VC, VC victim → ?
- Fast hit time of direct-mapped but with reduced conflict misses

Way Predicting Caches (MIPS R10000 off-chip L2 cache)
- Use processor address to index into way prediction table
- Look in predicted way at given index, then:
  - HIT: return copy of data from cache
  - MISS: look in other way
    - HIT: SLOW HIT (change entry in prediction table)
    - MISS: read block of data from next level of cache

Way Predicting Instruction Cache (Alpha 21264-like)
[Figure labels: PC, Add 0x4, jump target, jump control, addr, way, Primary Instruction Cache, inst, Sequential Way, Branch Target Way.]

Reduce Miss Penalty of Long Blocks: Early Restart and Critical Word First
- Don't wait for full block before restarting CPU
- Early restart: as soon as the requested word of the block arrives, send it to the CPU and let the CPU
  continue execution
  - Spatial locality: tend to want next sequential word, so the benefit of just early restart is not clear
- Critical word first: request the missed word first from memory and send it to the CPU as soon as it arrives; let the CPU continue execution while filling the rest of the words in the block
  - Long blocks more popular today, so critical word first is widely used

Increasing Cache Bandwidth with Non-Blocking Caches
- Non-blocking cache (or lockup-free cache) allows the data cache to continue to supply cache hits during a miss
  - Requires Full/Empty bits on registers or out-of-order execution
- "Hit under miss" reduces the effective miss penalty by working during a miss instead of ignoring CPU requests
- "Hit under multiple miss" or "miss under miss" may further lower the effective miss penalty by overlapping multiple misses
  - Significantly increases the complexity of the cache controller, as there can be multiple outstanding memory accesses, and a miss can arrive for a line that already has an outstanding miss (secondary miss)
  - Requires a pipelined or banked memory system (otherwise cannot support multiple misses)
  - Pentium Pro allows 4 outstanding memory misses
  - (Cray X1E vector supercomputer allows 2,048 outstanding memory misses)

Value of Hit Under Miss for SPEC (old data)
[Figure: bar chart, y-axis average memory access time, for Base and for hit under i misses (0→1, 1→2, 2→64) on SPEC92 integer benchmarks (eqntott, espresso, xlisp, compress) and floating-point benchmarks (mdljsp2, ear, fpppp, tomcatv, swm256, doduc, su2cor, wave5, mdljdp2, hydro2d, alvinn, nasa7, spice2g6, ora).]
- FP programs on average: AMAT = 0.68 → 0.52 → 0.34 → 0.26
- Int programs on average: AMAT = 0.24 → 0.20 → 0.19 → 0.19
- 8 KB data cache, direct mapped, 32B block, 16-cycle miss, SPEC 92

CS152 Administrivia
- Textbooks should appear in Cal book store soon (already there? or in the next couple of days)
- Quiz 1 handed back next Tuesday in class
- First set of open-ended labs seemed fine. Opinions?
- Encourage you to try to find your own ideas
  instead of following suggestions

Prefetching
- Speculate on future instruction and data accesses and fetch them into cache(s)
  - Instruction accesses easier to predict than data accesses
- Varieties of prefetching:
  - Hardware prefetching
  - Software prefetching
  - Mixed schemes
- What types of misses does prefetching affect?

Issues in Prefetching
- Usefulness: should produce hits
- Timeliness: not late and not too early
- Cache and bandwidth pollution
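The Issues in Prefetching slide names usefulness, timeliness, and pollution without fixing a particular scheme. As one concrete illustration, here is a minimal Python sketch of a hypothetical direct-mapped cache using one-block-lookahead prefetch-on-miss (a demand miss to line X also fetches line X+1). This simple hardware scheme is an assumption chosen for illustration, not one taken from the lecture, and the class name, 64-byte line size, and counters are likewise illustrative. The sketch counts issued prefetches and those later hit on, so a sequential stream can be compared with a stride that defeats the prefetcher. Timeliness is not modeled: prefetched lines arrive instantly.

```python
from dataclasses import dataclass, field


@dataclass
class PrefetchingCache:
    """Hypothetical direct-mapped cache with one-block-lookahead prefetch-on-miss."""
    num_lines: int                                # must be >= 2 so lines X and X+1 never share a slot
    line_size: int = 64                           # bytes per line (assumed)
    tags: dict = field(default_factory=dict)      # slot index -> resident line address
    prefetched: set = field(default_factory=set)  # resident lines brought in by prefetch, not yet used
    hits: int = 0
    misses: int = 0
    issued_prefetches: int = 0
    useful_prefetches: int = 0

    def _install(self, line_addr: int) -> None:
        index = line_addr % self.num_lines
        victim = self.tags.get(index)
        if victim is not None:
            # A prefetched line evicted before any demand use was pure pollution.
            self.prefetched.discard(victim)
        self.tags[index] = line_addr

    def access(self, addr: int) -> bool:
        """Demand access to byte address addr; returns True on a hit."""
        line_addr = addr // self.line_size
        if self.tags.get(line_addr % self.num_lines) == line_addr:
            self.hits += 1
            if line_addr in self.prefetched:
                self.useful_prefetches += 1       # first demand hit on a prefetched line
                self.prefetched.discard(line_addr)
            return True
        self.misses += 1
        self._install(line_addr)                  # demand fill from the next level
        next_line = line_addr + 1
        if self.tags.get(next_line % self.num_lines) != next_line:
            self._install(next_line)              # prefetch the next sequential line
            self.prefetched.add(next_line)
            self.issued_prefetches += 1
        return False


# Sequential word-by-word walk over 64 lines: every other line arrives by prefetch.
seq = PrefetchingCache(num_lines=16)
for addr in range(0, 64 * 64, 4):
    seq.access(addr)
print(seq.misses, seq.issued_prefetches, seq.useful_prefetches)  # 32 32 32

# Stride of two lines: every prefetch fetches a line that is never used.
strided = PrefetchingCache(num_lines=16)
for addr in range(0, 64 * 128, 128):
    strided.access(addr)
print(strided.misses, strided.issued_prefetches, strided.useful_prefetches)  # 64 64 0
```

In the strided run every prefetch consumes bandwidth and displaces a line without producing a hit, which is the pollution concern listed on the slide.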

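The Multilevel Caches slide defines local miss rate, global miss rate, and misses per instruction, and the Lecture 7 recap gives AMAT = hit time + miss rate × miss penalty. Applying that formula at each level makes the L1 miss penalty equal to the L2 hit time plus the L2 local miss rate times the memory penalty, and makes the L2 global miss rate equal to the L1 miss rate times the L2 local miss rate. The Python sketch below assumes every CPU memory access looks up the L1 and every L1 miss becomes exactly one L2 access; the function names, counts, and latencies are hypothetical.

```python
def miss_rates(cpu_accesses, l1_misses, l2_misses, instructions):
    """Return the slide's miss metrics for a two-level hierarchy.

    Assumes every CPU memory access looks up the L1 and every L1 miss
    becomes exactly one L2 access.
    """
    l2_accesses = l1_misses
    return {
        "l1_local": l1_misses / cpu_accesses,        # for L1, local and global rates coincide
        "l2_local": l2_misses / l2_accesses,         # misses in L2 / accesses to L2
        "l2_global": l2_misses / cpu_accesses,       # misses in L2 / CPU memory accesses
        "l2_misses_per_instr": l2_misses / instructions,
    }


def amat_two_level(l1_hit_time, l1_miss_rate, l2_hit_time, l2_local_miss_rate, memory_penalty):
    """AMAT = hit time + miss rate * miss penalty, applied at each level."""
    l1_miss_penalty = l2_hit_time + l2_local_miss_rate * memory_penalty
    return l1_hit_time + l1_miss_rate * l1_miss_penalty


# Hypothetical counts: 1000 accesses, 40 L1 misses, 20 L2 misses, 800 instructions.
rates = miss_rates(cpu_accesses=1000, l1_misses=40, l2_misses=20, instructions=800)
print(rates)  # l1_local 0.04, l2_local 0.5, l2_global 0.02, l2_misses_per_instr 0.025
# Hypothetical latencies: 1-cycle L1 hit, 10-cycle L2 hit, 100-cycle memory penalty.
print(amat_two_level(1, rates["l1_local"], 10, rates["l2_local"], 100))  # 3.4
```

Note that the L2 local miss rate (0.5) looks poor in isolation, while the global rate (0.02) shows that only 2% of all CPU accesses reach memory.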

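The victim-cache lookup sequence from the Victim Caches (Jouppi 1990) slide can be sketched in Python as below. This is a hypothetical model, not the HP 7200 design: addresses are line addresses, the VC holds 4 blocks as in the figure, and evicting the oldest VC entry is an assumed policy, since the slide leaves the VC victim's fate as an open question. The trace alternates between two lines that map to the same direct-mapped slot, the conflict-miss case the VC is meant to absorb.

```python
from collections import OrderedDict


class VictimCachedL1:
    """Hypothetical direct-mapped L1 backed by a small fully associative victim cache.

    Addresses are line addresses (byte address // line size). When the VC is
    full, its oldest entry is dropped; that policy is an assumption.
    """

    def __init__(self, l1_lines: int, vc_blocks: int = 4):
        self.l1_lines = l1_lines
        self.l1 = [None] * l1_lines          # slot index -> resident line address
        self.vc = OrderedDict()              # line address -> None, oldest first
        self.vc_blocks = vc_blocks
        self.stats = {"l1_hit": 0, "vc_hit": 0, "miss": 0}

    def _insert_victim(self, line_addr):
        if line_addr is None:                # L1 slot was empty, nothing to save
            return
        if len(self.vc) >= self.vc_blocks:
            self.vc.popitem(last=False)      # VC victim leaves (its destination is not modeled)
        self.vc[line_addr] = None

    def access(self, line_addr: int) -> str:
        index = line_addr % self.l1_lines
        if self.l1[index] == line_addr:      # 1. look up the direct-mapped cache first
            result = "l1_hit"
        else:
            evicted = self.l1[index]
            self.l1[index] = line_addr
            if line_addr in self.vc:         # 2. L1 miss: look in the victim cache
                del self.vc[line_addr]       # 3. VC hit: swap hit line with the displaced L1 line
                self._insert_victim(evicted)
                result = "vc_hit"
            else:                            # 4. VC miss: L1 victim -> VC, line filled from next level
                self._insert_victim(evicted)
                result = "miss"
        self.stats[result] += 1
        return result


# Lines 0 and 8 conflict in an 8-line direct-mapped cache; the VC absorbs the ping-pong.
vcache = VictimCachedL1(l1_lines=8)
print([vcache.access(a) for a in [0, 8, 0, 8, 0, 8]])
# ['miss', 'miss', 'vc_hit', 'vc_hit', 'vc_hit', 'vc_hit']
```

Without the VC all six accesses would miss; with six lines competing for one slot, however, the 4-block VC overflows and conflict misses return.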

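The way-prediction flow from the Way Predicting Caches (MIPS R10000) slide maps onto a small 2-way set-associative model. In this hypothetical Python sketch the prediction table has one entry per set, a miss fills the least recently used way, and the predictor then points at the newly filled way; the slide does not specify these last two choices, so they are assumptions.

```python
class WayPredicted2Way:
    """Hypothetical 2-way set-associative cache with a per-set way-prediction table.

    Filling the LRU way on a miss and predicting the newly filled way
    afterwards are assumptions, not taken from the slide.
    """

    def __init__(self, num_sets: int):
        self.num_sets = num_sets
        self.tags = [[None, None] for _ in range(num_sets)]  # per set: tag in way 0, way 1
        self.predicted_way = [0] * num_sets                  # way-prediction table
        self.lru_way = [0] * num_sets                        # way to replace on a miss
        self.counts = {"fast_hit": 0, "slow_hit": 0, "miss": 0}

    def access(self, line_addr: int) -> str:
        s, tag = line_addr % self.num_sets, line_addr // self.num_sets
        p = self.predicted_way[s]
        if self.tags[s][p] == tag:            # look in predicted way first
            result, way = "fast_hit", p
        elif self.tags[s][1 - p] == tag:      # then look in the other way: slow hit
            result, way = "slow_hit", 1 - p
            self.predicted_way[s] = way       # change entry in prediction table
        else:                                 # read block from next level of cache
            result, way = "miss", self.lru_way[s]
            self.tags[s][way] = tag
            self.predicted_way[s] = way
        self.lru_way[s] = 1 - way             # the other way is now least recently used
        self.counts[result] += 1
        return result


# Lines 0 and 4 share set 0 of a 4-set cache and live in different ways.
wp = WayPredicted2Way(num_sets=4)
print([wp.access(a) for a in [0, 4, 0, 0, 4, 4]])
# ['miss', 'miss', 'slow_hit', 'fast_hit', 'slow_hit', 'fast_hit']
```

Each switch between the two lines costs one slow hit, after which the retrained predictor gives direct-mapped-speed hits until the next switch.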

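The Non-Blocking Caches slide notes that allowing multiple outstanding misses complicates the cache controller, including the secondary-miss case. One common way to track outstanding misses is a file of miss status holding registers (MSHRs); that term and this design are not from the slide. The hypothetical Python sketch below merges a secondary miss into the existing entry instead of issuing a second memory request, and stalls a new primary miss when every entry is busy. In this model, hits under miss are assumed to be served by the cache arrays and never consult the MSHRs. Setting num_entries=4 would mirror the 4 outstanding misses quoted for the Pentium Pro.

```python
class MSHRFile:
    """Hypothetical miss status holding registers for a non-blocking cache.

    Each entry tracks one outstanding line fill and the requests waiting on it.
    """

    def __init__(self, num_entries: int):
        self.num_entries = num_entries
        self.pending = {}                     # line address -> list of waiting request ids

    def on_miss(self, line_addr: int, req_id: str) -> str:
        if line_addr in self.pending:         # secondary miss: line already being fetched
            self.pending[line_addr].append(req_id)
            return "secondary"
        if len(self.pending) == self.num_entries:
            return "stall"                    # all entries busy: this miss must wait
        self.pending[line_addr] = [req_id]    # primary miss: send request to next level
        return "primary"

    def on_fill(self, line_addr: int) -> list:
        """Line returned from memory: free its entry and wake all waiting requests."""
        return self.pending.pop(line_addr)


mshrs = MSHRFile(num_entries=2)
print(mshrs.on_miss(0x40, "ld1"))  # primary
print(mshrs.on_miss(0x40, "ld2"))  # secondary (merged, no second memory request)
print(mshrs.on_miss(0x80, "ld3"))  # primary
print(mshrs.on_miss(0xC0, "ld4"))  # stall
print(mshrs.on_fill(0x40))         # ['ld1', 'ld2']
```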