CS152 Computer Architecture and Engineering
Lecture 20: Caches
April 14, 2003
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
Lecture slides: http://inst.eecs.berkeley.edu/~cs152/
UCB Spring 2004, CS152 Kubiatowicz, Lec20, 4/14/04

Recap: The Big Picture: Where are We Now?
The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output.
Today's Topics:
- Recap last lecture
- Simple caching techniques
- Many ways to improve cache performance
- Virtual memory

Recap: Cache Performance
$$\text{Execution Time} = \text{Instruction Count} \times \text{Cycle Time} \times \left(\text{ideal CPI} + \frac{\text{Memory Stalls}}{\text{Inst}} + \frac{\text{Other Stalls}}{\text{Inst}}\right)$$
$$\frac{\text{Memory Stalls}}{\text{Inst}} = \text{Instruction Miss Rate} \times \text{Instruction Miss Penalty} + \frac{\text{Loads}}{\text{Inst}} \times \text{Load Miss Rate} \times \text{Load Miss Penalty} + \frac{\text{Stores}}{\text{Inst}} \times \text{Store Miss Rate} \times \text{Store Miss Penalty}$$

The Art of Memory System Design
Workload or benchmark programs run on the Processor, which sends a reference stream <op,addr>, <op,addr>, <op,addr>, ... (op: i-fetch, read, write) to Memory (MEM).
Goal: optimize the memory system organization to minimize the average memory access time for typical workloads.

Average Memory Access Time (AMAT)
$$\text{AMAT} = \text{Hit Time}_{L1} + \text{Miss Rate}_{L1} \times \text{Miss Penalty}_{L1} = \text{Hit Rate}_{L1} \times \text{Hit Time}_{L1} + \text{Miss Rate}_{L1} \times \text{Miss Time}_{L1}$$
where $\text{Miss Time}_{L1} = \text{Hit Time}_{L1} + \text{Miss Penalty}_{L1}$ and $\text{Hit Rate}_{L1} = 1 - \text{Miss Rate}_{L1}$.

Example: 1 KB Direct Mapped Cache with 32 B Blocks
For a $2^N$ byte cache:
- The uppermost $(32 - N)$ bits are always the Cache Tag.
- The lowest $M$ bits are the Byte Select (Block Size $= 2^M$).
One cache miss pulls in a complete "Cache Block" (or "Cache Line").
In this example ($N = 10$, $M = 5$) the address splits into Cache Tag = bits 31..10 (Ex: 0x50), Cache Index = bits 9..5 (Ex: 0x01), and Byte Select = bits 4..0 (Ex: 0x00). The Valid Bit and Cache Tag are stored as part of the cache state next to the Cache Data: entry 0 holds Bytes 0..31, entry 1 holds Bytes 32..63 (tag 0x50 in the example), ..., entry 31 holds Bytes 992..1023.

Set Associative Cache
N-way set associative: N entries for each Cache Index; N direct mapped caches operate in parallel.
Example: Two-way set associative cache
- The Cache Index selects a "set" from the cache.
- The two tags in the set are compared to the input in parallel.
- Data is selected based on the tag result.
(Figure: the Cache Index selects one entry in each way's Valid, Cache Tag, and Cache Data arrays; two comparators check the address tag (Adr Tag), their outputs drive the Sel1/Sel0 inputs of the data mux and are ORed to form Hit.)
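A short sketch can make the AMAT formula and the address breakdown above concrete. The Python below is a minimal illustration, assuming byte addresses and power-of-two cache and block sizes; the function names (`amat`, `amat_hit_miss_form`, `split_address`) and the timing values in the usage lines are hypothetical, not from the lecture.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = Hit Time + Miss Rate x Miss Penalty (times in cycles)."""
    return hit_time + miss_rate * miss_penalty


def amat_hit_miss_form(hit_time, miss_rate, miss_penalty):
    """Equivalent form: Hit Rate x Hit Time + Miss Rate x Miss Time,
    where Miss Time = Hit Time + Miss Penalty."""
    miss_time = hit_time + miss_penalty
    return (1 - miss_rate) * hit_time + miss_rate * miss_time


def split_address(addr, cache_bytes, block_bytes, ways=1):
    """Split a byte address into (tag, index, byte_select).

    Assumes power-of-two sizes; ways=1 is direct mapped and
    ways == cache_bytes // block_bytes is fully associative.
    """
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1   # M, with block size 2**M
    index_bits = num_sets.bit_length() - 1       # log2(number of sets)
    byte_select = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, byte_select


if __name__ == "__main__":
    # 1 KB direct mapped, 32 B blocks: tag 0x50, index 0x01, byte select 0x00
    addr = (0x50 << 10) | (0x01 << 5) | 0x00
    print([hex(f) for f in split_address(addr, 1024, 32)])  # ['0x50', '0x1', '0x0']
    # Hypothetical timings: 1-cycle hit, 12.5% miss rate, 40-cycle penalty
    print(amat(1, 0.125, 40), amat_hit_miss_form(1, 0.125, 40))  # 6.0 6.0
```

With the same 1 KB cache and `ways=2`, the number of sets halves, so the index loses one bit and the tag gains one (0xA0 instead of 0x50 for the example address); with `ways=32` there are no index bits and the tag is 27 bits wide.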
Disadvantage of Set Associative Cache
N-way Set Associative Cache versus Direct Mapped Cache:
- N comparators vs. 1
- Extra MUX delay for the data
- Data comes AFTER the Hit/Miss decision and set selection
In a direct mapped cache, the Cache Block is available BEFORE Hit/Miss:
- Possible to assume a hit and continue; recover later if it was a miss.

Example: Fully Associative
Fully Associative Cache:
- Forget about the Cache Index
- Compare the Cache Tags of all cache entries in parallel
- Example: with 32 B blocks, we need N 27-bit comparators
By definition: Conflict Miss = 0 for a fully associative cache.
(Figure: the address splits into Cache Tag, bits 31..5 (27 bits long), and Byte Select, bits 4..0 (Ex: 0x01); each entry holds a Valid Bit, a Cache Tag, and a 32-byte Cache Data block, and every entry has its own tag comparator.)

A Summary on Sources of Cache Misses
- Compulsory (cold start or process migration, first reference): first access to a block.
  "Cold" fact of life: not a whole lot you can do about it.
  Note: if you are going to run "billions" of instructions, Compulsory Misses are insignificant (except for streaming media types of programs).
- Capacity: the cache cannot contain all blocks accessed by the program.
  Solution: increase cache size.
- Conflict (collision): multiple memory locations mapped to the same cache location.
  Solution 1: increase cache size.
  Solution 2: increase associativity.
- Coherence (Invalidation): another process (e.g., I/O) updates memory.

Design options at constant cost:

                  Direct Mapped   N-way Set Associative   Fully Associative
Cache Size        Big             Medium                  Small
Compulsory Miss   Same            Same                    Same
Conflict Miss     High            Medium                  Zero
Capacity Miss     Low             Medium                  High
Coherence Miss    Same            Same                    Same
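The compulsory/capacity/conflict split above can be measured on a reference trace. The Python sketch below is a hypothetical illustration, not part of the lecture: it works on block numbers, counts compulsory misses as first references, counts capacity misses as the extra misses of a fully associative LRU cache of the same size, and attributes the remaining misses of a direct mapped cache to conflicts. That is the usual way the 3C breakdown is computed, but the function names and traces here are assumptions.

```python
from collections import OrderedDict


def direct_mapped_misses(blocks, num_blocks):
    """Misses of a direct mapped cache with num_blocks one-block lines."""
    lines = [None] * num_blocks      # block number held by each line (None = invalid)
    misses = 0
    for b in blocks:
        i = b % num_blocks           # cache index
        if lines[i] != b:            # comparing whole block numbers stands in for the tag check
            misses += 1
            lines[i] = b
    return misses


def fully_associative_lru_misses(blocks, num_blocks):
    """Misses of a fully associative cache of the same size with LRU replacement."""
    lru = OrderedDict()              # ordered least -> most recently used
    misses = 0
    for b in blocks:
        if b in lru:
            lru.move_to_end(b)
        else:
            misses += 1
            if len(lru) == num_blocks:
                lru.popitem(last=False)   # evict the least recently used block
            lru[b] = None
    return misses


def classify_misses(blocks, num_blocks):
    """Split direct mapped misses into compulsory, capacity, and conflict."""
    compulsory = len(set(blocks))                    # first access to each block
    fa = fully_associative_lru_misses(blocks, num_blocks)
    dm = direct_mapped_misses(blocks, num_blocks)
    return {"compulsory": compulsory,
            "capacity": fa - compulsory,
            "conflict": dm - fa}


if __name__ == "__main__":
    # Blocks 0 and 4 collide in a 4-block direct mapped cache (both map to index 0).
    print(classify_misses([0, 4, 0, 4], 4))
    # {'compulsory': 2, 'capacity': 0, 'conflict': 2}
```

Coherence misses are not modeled, since the trace comes from a single requester. Because LRU is not an optimal policy, the conflict count can come out negative for some traces (for example, a cyclic trace slightly larger than the cache).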
Four Questions for Caches and Memory Hierarchy
- Q1: Where can a block be placed in the upper level? (Block placement)
- Q2: How is a block found if it is in the upper level? (Block identification)
- Q3: Which block should be replaced on a miss? (Block replacement)
- Q4: What happens on a write? (Write strategy)

Q1: Where can a block be placed in the upper level?
Block 12 placed in an 8-block cache: fully associative, direct mapped, 2-way set associative.
S.A. Mapping = Block Number Modulo Number of Sets
- Fully associative: block 12 can go anywhere.
- Direct mapped: block 12 can go only into block 4 (12 mod 8).
- 2-way set associative: block 12 can go anywhere in set 0 (12 mod 4).
(Figure: the three 8-block caches, Block no. 0-7, grouped as Sets 0-3 for the set associative case, above memory block-frame addresses 0-31.)

Q2: How is a block found if it is in the upper level?
Address = [ Tag | Index | Block offset ], where Tag and Index together form the Block Address; the Index performs Set Select and the Block offset performs Data Select.
- Direct indexing (using index and block offset), tag compares, or a combination.
- Increasing associativity shrinks the index and expands the tag.

Q3: Which block should be replaced on a miss?
- Easy for Direct Mapped.
- Set Associative or Fully Associative: Random or LRU (Least Recently Used).

Miss rates (in %), LRU vs. Random:

Associativity   2-way            4-way            8-way
Size            LRU     Random   LRU     Random   LRU     Random
16 KB           5.2     5.7      4.7     5.3      4.4     5.0
64 KB           1.9     2.0      1.5     1.7      1.4     1.5
256 KB          1.15    1.17     1.13    1.13     1.12    1.12

Q4: What happens on a write?
- Write through: the information is written to both the block in the cache and the block in the lower-level memory.
- Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced (is the block clean or dirty?).
Pros and cons of each:
- WT: PRO: read misses cannot result in writes. CON: the processor is held up on writes unless writes are buffered.
- WB: PRO: repeated writes are not sent to DRAM; the processor is not held up on writes. CON: more complex; a read miss may require writeback of dirty data.
WT is always combined with write buffers so that the processor doesn't wait for lower-level memory.
…
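The four questions can be tied together in a toy simulator. The Python sketch below is a hypothetical model, not the lecture's: the class `ToyCache`, its parameters, and the choice to allocate on write misses under both policies are assumptions. It tracks block numbers only and uses block mod number-of-sets for placement (Q1), tag compares within the set for identification (Q2), LRU for replacement (Q3), and either write back or write through for writes (Q4).

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int
    dirty: bool = False


class ToyCache:
    """Toy cache over block numbers (no data stored).

    Q1 placement: a block maps to set (block % num_sets).
    Q2 identification: compare tags of the lines in that set.
    Q3 replacement: evict the least recently used line of a full set.
    Q4 write strategy: write_back=True marks lines dirty; False writes through.
    Assumption: write misses allocate the block under both policies.
    """

    def __init__(self, num_sets, ways, write_back=True):
        self.num_sets = num_sets
        self.ways = ways
        self.write_back = write_back
        self.sets = [[] for _ in range(num_sets)]  # each list: LRU first, MRU last
        self.misses = 0
        self.mem_writes = 0                        # writes sent to lower-level memory

    def set_index(self, block):
        return block % self.num_sets               # Q1: block number mod number of sets

    def access(self, block, is_write=False):
        """Return True on a hit, False on a miss."""
        s = self.sets[self.set_index(block)]
        tag = block // self.num_sets
        for i, line in enumerate(s):               # Q2: tag compare within the set
            if line.tag == tag:
                s.append(s.pop(i))                 # move to most recently used
                self._on_write(line, is_write)
                return True
        self.misses += 1
        if len(s) == self.ways:                    # Q3: set full, evict the LRU line
            victim = s.pop(0)
            if victim.dirty:
                self.mem_writes += 1               # write back the dirty victim
        line = Line(tag)
        s.append(line)
        self._on_write(line, is_write)
        return False

    def _on_write(self, line, is_write):
        if not is_write:
            return
        if self.write_back:
            line.dirty = True                      # defer the memory write
        else:
            self.mem_writes += 1                   # write through (e.g., via a write buffer)


if __name__ == "__main__":
    print(ToyCache(8, 1).set_index(12))   # 4 (direct mapped: 12 mod 8)
    print(ToyCache(4, 2).set_index(12))   # 0 (2-way set associative: 12 mod 4)
    for wb in (True, False):
        c = ToyCache(8, 1, write_back=wb)
        for _ in range(3):
            c.access(0, is_write=True)    # repeated writes to block 0
        c.access(8)                       # maps to the same line, evicting block 0
        print("WB" if wb else "WT", c.misses, c.mem_writes)  # WB 2 1, then WT 2 3
```

Dirty lines still resident at the end of a run have not been written back, so `mem_writes` undercounts write-back traffic on short traces. Random replacement could be tried instead of LRU by popping a victim at `random.randrange(self.ways)`.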