CS61C Improving Cache Memory and Virtual Memory Introduction
  Lecture 18
  April 2, 1999
  Dave Patterson (http.cs.berkeley.edu/~patterson)
  www-inst.eecs.berkeley.edu/~cs61c/schedule.html
  cs 61C L18 Cache2 (Patterson, Spring 99, UCB)

Review 1/1
  Tag, index, offset to find matching data, support larger blocks, reduce misses
  Where in cache? Direct Mapped Cache
    Conflict Misses if memory addresses compete
  Fully Associative to let memory data be in any block: no Conflict Misses
  Set Associative: Compromise, simpler hardware than Fully Associative, fewer misses than Direct Mapped
  LRU: Use history to predict replacement

Outline
  Improving Miss Penalty
  Improving Writes
  Administrivia, More Survey results
  Virtual Memory
  Virtual Memory and Cache
  Translation Lookaside Buffer
  Conclusion

Improving Caches
  In general, minimize Average Access Time:
    Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate
  So far, have looked at improving Miss Rate:
    Larger block size
    Larger Cache
    Higher Associativity
  What about Miss Penalty?

Improving Miss Penalty
  When caches started becoming popular, Miss Penalty was about 10 processor clock cycles
  Today: 500 MHz Processor (2 nanoseconds per clock cycle) and 200 ns to go to DRAM = 100 processor clock cycles
  Solution: Place another cache between memory and the processor cache: Second Level (L2) Cache

Analyzing a multi-level cache hierarchy
  We consider the L2 hit and miss times to include the cost of not finding the data in the L1 cache
  Similarly, the L2 cache hit rate is only for accesses which actually make it to the L2 cache

So how do we calculate it out?
  Access time = L1 hit time x L1 hit rate + L1 miss penalty x L1 miss rate
  We simply calculate the L1 miss penalty as being the access time for the L2 cache
  Access time = L1 hit time x L1 hit rate + (L2 hit time x L2 hit rate + L2 miss penalty x L2 miss rate) x L1 miss rate

Do the numbers for L2 Cache
  Assumptions:
    L1 hit time = 1 cycle
    L1 hit rate = 90%
    L2 hit time (also L1 miss penalty) = 4 cycles
    L2 miss penalty = 100 cycles
    L2 hit rate = 90%
  Access time = L1 hit time x L1 hit rate + (L2 hit time x L2 hit rate + L2 miss penalty x (1 - L2 hit rate)) x L1 miss rate
              = 1 x 0.9 + (4 x 0.9 + 100 x 0.1) x (1 - 0.9)
              = 0.9 + (13.6) x 0.1
              = 2.26 clock cycles

What would it be without the L2 cache?
  Assume that the L1 miss penalty would be 100 clock cycles
    1 x 0.9 + 100 x 0.1 = 10.9 clock cycles
  So gain a benefit from having the second, larger cache before main memory
  Today's L1 cache size: 16 KB to 64 KB; L2 cache may be 512 KB to 4096 KB

What about Writes to Memory?
  Simplest Policy: The information is written to both the block in the cache and to the block in the lower-level memory
  Problem: Writes operate at speed of lower-level memory

Improving Cache Performance: Write Buffer
  [Figure: Processor, Cache, Write Buffer, DRAM]
  A Write Buffer is added between Cache and Memory
    Processor: writes data into cache and write buffer
    Controller: write buffer contents to memory
  Write buffer is just a First-In-First-Out queue
    Typical number of entries: 4
    Works fine if: Store frequency (w.r.t. time) << 1 / DRAM write cycle

Improving Cache Performance: Write Back
  Option 2: data is written only to cache block
  Modified cache block is written to main memory only when it is replaced
    Block is unmodified (clean) or modified (dirty)
  This scheme called "Write Back"; original scheme called "Write Through"
  Advantage of Write Back: Repeated writes to same location stay in cache
  Disadvantage of Write Back: Any block replacement can write to memory

What to do on a Write Miss?
  On a read miss, you must bring the block into the cache so that you can complete the read
  Option 1: Just like read: bring whole block into cache and then modify bytes needed ("Write Allocate")
    Write indicates nearby access in future
  Option 2: Only update lower-level memory, nothing in cache ("No Write Allocate")
    Perhaps just clearing memory, no reuse
  Preference for Write Back vs. Write Thru?

Avoiding Write Buffer Saturation
  [Figure: Processor, Cache, L2 Cache, DRAM, Write Buffer]
  Store frequency > 1 / DRAM write cycle
    Store buffer will overflow no matter how long you make queue
  Solution for write buffer saturation:
    Use a write back 1st level cache, or
    Install a 2nd level cache (write back)

Administrivia
  Project 5: Due 4/14: design and implement a cache (in software) and plug into instruction simulator
  8th homework: Due 4/7 7PM
    Exercises 7.7, 7.8, 7.20, 7.22, 7.32
  Readings 7.2, 7.3, 7.4

Suggestions
  Microphone volume
    Will try wired mike
  Separate Project, HW due dates
    Will look at separating project, HW if same week
  Why not more credit for projects?
    Reduce reward for not doing own work
  Why not more tests? (Note point totals F98)
    Testing vs. lecture time, time to make good test
  Why not more units?
    Workload matches units by end of semester

Questions
  Look at X86 since it's so prevalent?
    How many lectures dedicate to this? Can make much dent in 1-2 lectures?
  Why recommended 7PM with 8AM deadline?
    Human nature: Goal + allowed slip
  Why front load course?
    Many courses have more work in 2nd half; also too many students trying 61C Spring 99
  Why not finish in lab?
    What non-trivial task can everyone finish in 50 minutes?
    Reduced homeworks, expanded lab

Another View of the Memory Hierarchy
  [Figure: levels from upper to lower, with the unit transferred between neighbors: Regs, (Instr. Operands), Cache, (Blocks), Memory, (Pages), Disk, (Files), Tape. Labels: "Thus far" on the upper levels; "Larger" toward the "Lower Level"]

Virtual Memory
  If Principle of Locality allows caches to offer (usually) speed of cache memory with size of DRAM memory, then recursively why not use at next level to give speed of DRAM memory, size of Disk memory?
  Called "Virtual Memory"
    Also allows OS to share memory, protect programs from each other
    Today, more important for protection
    Historically, it predates caches

Comparing the 2 levels of hierarchy
  Cache Version                                      Virtual Memory vers.
  Block or Line                                      Page
  Miss                                               Page Fault
  Block Size: 32-64B                                 Page Size: 4K-8KB
  Placement: Direct Mapped, N-way Set Associative    Fully Associative
  Replacement: LRU or Random
  Write Thru or Back                                 Write Back
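
The access-time arithmetic from "Do the numbers for L2 Cache" and "What would it be without the L2 cache" can be checked with a short script. This is a minimal sketch, not part of the lecture: the function and variable names are hypothetical, and it only restates the slide formula (hit time x hit rate + miss penalty x miss rate), treating the L1 miss penalty as the access time of the L2 cache.

```python
def average_access_time(hit_time, hit_rate, miss_penalty):
    """Slide formula: hit time x hit rate + miss penalty x miss rate."""
    miss_rate = 1.0 - hit_rate
    return hit_time * hit_rate + miss_penalty * miss_rate


def two_level_access_time(l1_hit_time, l1_hit_rate,
                          l2_hit_time, l2_hit_rate, l2_miss_penalty):
    """The L1 miss penalty is taken to be the access time of the L2 cache."""
    l1_miss_penalty = average_access_time(l2_hit_time, l2_hit_rate,
                                          l2_miss_penalty)
    return average_access_time(l1_hit_time, l1_hit_rate, l1_miss_penalty)


# Assumptions from the slide: 1-cycle L1 hit, 4-cycle L2 hit,
# 100-cycle L2 miss penalty, 90% hit rate at both levels.
with_l2 = two_level_access_time(1, 0.9, 4, 0.9, 100)

# Without an L2 cache, every L1 miss pays the full 100 cycles.
without_l2 = average_access_time(1, 0.9, 100)

print(f"with L2:    {with_l2:.2f} clock cycles")     # 2.26
print(f"without L2: {without_l2:.2f} clock cycles")  # 10.90
```

Changing the hit rates or penalties in the two calls shows how sensitive the average is to the L1 miss rate, since every L2 term is multiplied by it.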

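
The write buffer saturation condition can also be illustrated with a toy cycle-by-cycle model. This is a sketch under simplifying assumptions that are not in the lecture: stores arrive at a fixed interval, DRAM retires one buffered write every fixed number of cycles, and a store that finds the buffer full is merely counted rather than stalling the processor. All names are hypothetical.

```python
def count_full_buffer_stores(store_interval, dram_write_cycles,
                             capacity=4, total_cycles=10_000):
    """Count stores that arrive while the FIFO write buffer is full."""
    occupancy = 0        # entries in the buffer, including the one being written
    drain_remaining = 0  # cycles left on the DRAM write in progress
    blocked = 0          # stores that found the buffer full

    for cycle in range(total_cycles):
        # Memory side: advance the DRAM write for the oldest entry.
        if occupancy > 0:
            if drain_remaining == 0:
                drain_remaining = dram_write_cycles
            drain_remaining -= 1
            if drain_remaining == 0:
                occupancy -= 1

        # Processor side: one store every store_interval cycles.
        if cycle % store_interval == 0:
            if occupancy < capacity:
                occupancy += 1
            else:
                blocked += 1

    return blocked


# Stores are rarer than DRAM writes can retire them: the buffer never fills.
print(count_full_buffer_stores(store_interval=200, dram_write_cycles=100))

# Stores arrive faster than DRAM can retire them: the buffer fills,
# and a much longer queue only delays the point where it does.
print(count_full_buffer_stores(50, 100, capacity=4))
print(count_full_buffer_stores(50, 100, capacity=64))
```

In this model the 64-entry buffer blocks fewer stores than the 4-entry one over the same run, but it still fills, which matches the point that queue length alone does not fix saturation.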

Berkeley COMPSCI 61C - Lecture 18
