
CS152 Computer Architecture and Engineering
Lecture 20: Caches

April 14, 2003
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
Lecture slides: http://inst.eecs.berkeley.edu/~cs152/


Recap: The Big Picture: Where are We Now?

The Five Classic Components of a Computer:
  - Processor (Control, Datapath)
  - Memory
  - Input
  - Output

Today's Topics:
  - Recap last lecture
  - Simple caching techniques
  - Many ways to improve cache performance
  - Virtual memory

4/14/04   UCB Spring 2004   CS152 / Kubiatowicz


The Art of Memory System Design

[Figure: workload or benchmark programs run on the Processor, which issues a reference stream <op,addr>, <op,addr>, <op,addr>, <op,addr>, ... to the Memory (cache and MEM), where op is one of i-fetch, read, write.]

Optimize the memory system organization to minimize the average memory access time for typical workloads.


Recap: Cache Performance

Execution Time = Instruction Count x Cycle Time x (ideal CPI + Memory Stalls/Inst + Other Stalls/Inst)

Memory Stalls/Inst = Instruction Miss Rate x Instruction Miss Penalty
                   + Loads/Inst x Load Miss Rate x Load Miss Penalty
                   + Stores/Inst x Store Miss Rate x Store Miss Penalty

Average Memory Access Time (AMAT)
    = Hit Time_L1 + (Miss Rate_L1 x Miss Penalty_L1)
    = (Hit Rate_L1 x Hit Time_L1) + (Miss Rate_L1 x Miss Time_L1)


Example: 1 KB Direct Mapped Cache with 32 B Blocks

For a 2^N byte cache:
  - The uppermost (32 - N) bits are always the Cache Tag.
  - The lowest M bits are the Byte Select (Block Size = 2^M).
  - On a cache miss, pull in the complete "Cache Block" (or "Cache Line").

Address fields (32-bit block address):
  bits 31..10   Cache Tag     Example: 0x50 (stored as part of the cache state)
  bits  9..5    Cache Index   Example: 0x01
  bits  4..0    Byte Select   Example: 0x00

Cache array (each entry holds a Valid Bit, a Cache Tag, and 32 bytes of Cache Data):
  index 0     Byte 31   ...  Byte 1,  Byte 0
  index 1     Byte 63   ...  Byte 33, Byte 32
  index 2, 3, ...
  index 31    Byte 1023 ...  Byte 992


Set Associative Cache

N-way set associative: N entries for each Cache Index
  - N direct mapped caches operate in parallel

Example: Two-way set associative cache
  - Cache Index selects a "set" from the cache
  - The two tags in the set are compared to the input in parallel
  - Data is selected based on the tag result

[Figure: two-way set-associative cache. The Cache Index selects one entry (Valid, Cache Tag, Cache Data) in each way; each Cache Tag is compared with the address tag (Adr Tag); the two compare outputs are ORed to produce Hit and drive Sel0/Sel1 of the mux that selects the Cache Block.]


Disadvantage of Set Associative Cache

N-way Set Associative Cache versus Direct Mapped Cache:
  - N comparators vs. 1
  - Extra MUX delay for the data
  - Data comes AFTER the Hit/Miss decision and set selection

In a direct mapped cache, the Cache Block is available BEFORE Hit/Miss:
  - Possible to assume a hit and continue. Recover later if miss.

[Figure: the same two-way set-associative datapath as on the previous slide.]


Example: Fully Associative

Fully Associative Cache
  - Forget about the Cache Index
  - Compare the Cache Tags of all cache entries in parallel
  - Example: with 32 B blocks, we need N 27-bit comparators

By definition: Conflict Miss = 0 for a fully associative cache.

Address fields:
  bits 31..5    Cache Tag (27 bits long)
  bits  4..0    Byte Select   Example: 0x01

[Figure: each entry holds a Valid Bit, a Cache Tag, and 32 bytes of Cache Data (Byte 31 ... Byte 1, Byte 0; Byte 63 ... Byte 33, Byte 32; ...).]


A Summary on Sources of Cache Misses

Compulsory (cold start or process migration, first reference): first access to a block
  - "Cold" fact of life: not a whole lot you can do about it
  - Note: if you are going to run billions of instructions, Compulsory Misses are insignificant

Capacity:
  - Cache cannot contain all blocks accessed by the program
  - Solution: increase cache size

Conflict (collision):
  - Multiple memory locations mapped to the same cache location
  - Solution 1: increase cache size
  - Solution 2: increase associativity

Coherence (Invalidation): other process (e.g., I/O) updates memory


Design options at constant cost

                    Direct Mapped    N-way Set Associative    Fully Associative
  Cache Size        Big              Medium                   Small
  Compulsory Miss   Same             Same                     Same
  Conflict Miss     High             Medium                   Zero
  Capacity Miss     Low              Medium                   High
  Coherence Miss    Same             Same                     Same

Note: if you are going to run billions of instructions, Compulsory Misses are insignificant (except for streaming media types of programs).


Four Questions for Caches and Memory Hierarchy

  Q1: Where can a block be placed in the upper level? (Block placement)
  Q2: How is a block found if it is in the upper level? (Block identification)
  Q3: Which block should be replaced on a miss? (Block replacement)
  Q4: What happens on a write? (Write strategy)


Q1: Where can a block be placed in the upper level?

Block 12 placed in an 8-block cache: fully associative, direct mapped, 2-way set associative
  - S.A. Mapping = Block Number Modulo Number of Sets

  Fully associative:        block 12 can go anywhere
  Direct mapped:            block 12 can go only into block 4 (12 mod 8)
  2-way set associative:    block 12 can go anywhere in set 0 (12 mod 4)

[Figure: the 8-block cache (block no. 0 to 7; sets 0 to 3 in the 2-way case) drawn once per organization, above a memory of block frame addresses 0 to 31.]


Q2: How is a block found if it is in the upper level?

Address fields:
  Block Address = Tag | Index     (Index = Set Select)
  Block offset                    (Data Select)

  - Direct indexing (using index and block offset), tag compares, or a combination
  - Increasing associativity shrinks the index and expands the tag


Q3: Which block should be replaced on a miss?

  - Easy for Direct Mapped
  - Set Associative or Fully Associative:
      - Random
      - LRU (Least Recently Used)

Miss rates by associativity and replacement policy:

              2-way               4-way               8-way
  Size        LRU      Random     LRU      Random     LRU      Random
  16 KB       5.2%     5.7%       4.7%     5.3%       4.4%     5.0%
  64 KB       1.9%     2.0%       1.5%     1.7%       1.4%     1.5%
  256 KB      1.15%    1.17%      1.13%    1.13%      1.12%    1.12%


Q4: What happens on a write?

Write through: the information is written to both the block in the cache and the block in the lower-level memory.

Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
  - Is the block clean or dirty?

Pros and cons of each:
  - WT:
      - PRO: read misses cannot result in writes
      - CON: processor held up on writes unless writes are buffered
  - WB:
      - PRO: repeated writes are not sent to DRAM; processor not held up on writes
      - CON: more complex; a read miss may require writeback of dirty data

WT is always combined with write buffers so that the processor does not wait for lower-level memory.


New Question: How does a store to the cache work?

  - Must update cache data, but only if cached!
  - Otherwise may overwrite unrelated cache data!
  - Two cycle
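
The address split in the direct-mapped example (1 KB cache, 32 B blocks) can be sketched in a few lines of Python. This is an illustration only: the function name `split_address` and its parameters are assumptions made here, not part of the lecture. The sample address 0x14020 is simply the slide's example fields (tag 0x50, index 0x01, byte select 0x00) packed back into one word.

```python
def split_address(addr, cache_bytes=1024, block_bytes=32):
    """Split an address into (tag, index, byte_select) for a direct-mapped cache.

    Hypothetical helper for illustration. Both sizes must be powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1   # M: log2(block size), 5 here
    cache_bits = cache_bytes.bit_length() - 1    # N: log2(cache size), 10 here
    index_bits = cache_bits - offset_bits        # 5 bits select one of 32 blocks

    byte_select = addr & (block_bytes - 1)                   # lowest M bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # next N - M bits
    tag = addr >> cache_bits                                 # uppermost 32 - N bits
    return tag, index, byte_select


# The slide's example fields: tag 0x50, index 0x01, byte select 0x00.
example_addr = (0x50 << 10) | (0x01 << 5) | 0x00   # 0x14020
tag, index, byte_select = split_address(example_addr)
print(hex(tag), hex(index), hex(byte_select))
```

Setting `cache_bytes` equal to `block_bytes` leaves zero index bits, so for 32 B blocks the tag becomes the upper 27 bits of the address, which is the field layout of the fully associative example.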
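
To make the Q1 mapping rule (block number modulo number of sets) concrete, here is a small Python sketch that lists the cache frames a memory block may occupy. The function name `candidate_frames` and the assumption that set s occupies frames s*ways through s*ways + ways - 1 are choices made for this illustration.

```python
def candidate_frames(block_number, num_frames, ways):
    """Return the cache frames where a memory block may be placed.

    Hypothetical helper: ways=1 models a direct-mapped cache and
    ways=num_frames models a fully associative one.
    """
    num_sets = num_frames // ways
    set_index = block_number % num_sets          # block number mod number of sets
    first = set_index * ways                     # assumed layout: sets are contiguous
    return list(range(first, first + ways))


# Block 12 in an 8-block cache, as in the lecture's example.
print(candidate_frames(12, 8, ways=1))   # direct mapped
print(candidate_frames(12, 8, ways=2))   # 2-way set associative
print(candidate_frames(12, 8, ways=8))   # fully associative
```

With these inputs the direct-mapped case yields only frame 4 (12 mod 8), the 2-way case yields the two frames of set 0 (12 mod 4), and the fully associative case yields all eight frames.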
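
The two AMAT forms in the cache performance recap can be checked against each other numerically. The sketch below is a minimal illustration: the function names are invented here, and the numbers (1-cycle hit, 5% miss rate, 50-cycle miss penalty) are made-up inputs, not measurements from the lecture. It assumes Miss Time = Hit Time + Miss Penalty, which is what makes the two forms agree.

```python
import math


def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = Hit Time + Miss Rate x Miss Penalty (first form in the recap)."""
    return hit_time + miss_rate * miss_penalty


def amat_weighted(hit_time, miss_rate, miss_penalty):
    """AMAT = Hit Rate x Hit Time + Miss Rate x Miss Time (second form).

    Assumes Miss Time = Hit Time + Miss Penalty.
    """
    hit_rate = 1.0 - miss_rate
    miss_time = hit_time + miss_penalty
    return hit_rate * hit_time + miss_rate * miss_time


# Made-up example inputs, in cycles.
a = amat(hit_time=1, miss_rate=0.05, miss_penalty=50)
b = amat_weighted(hit_time=1, miss_rate=0.05, miss_penalty=50)
print(a, b, math.isclose(a, b))
```

Both forms give about 3.5 cycles for these inputs, so even a 5% miss rate more than triples the average access time when a miss costs 50 cycles.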



Berkeley COMPSCI 152 - Lecture 20 Caches
