15-213: The course that gives CMU its Zip!

Cache Memories
September 30, 2008

Topics
- Generic cache memory organization
- Direct-mapped caches
- Set-associative caches
- Impact of caches on performance

lecture-10.ppt

Announcements
- Exam grading done
  - Everyone should have gotten email with their score (out of 72); the mean was 50, the high was 70
  - A sample solution should be up on the website soon
- Getting your exam back
  - Some got theirs in recitation; we are working on a plan for everyone else (worst case: recitation on Monday)
- If you think we made a mistake in grading, please read the syllabus for details about the process for handling it

15-213, S'08

General Cache Mechanics (from lecture-9.ppt)
- Data is copied between levels in block-sized transfer units
- The smaller, faster, more expensive memory (the cache) holds a subset of the blocks
- The larger, slower, cheaper memory is partitioned into blocks
[Figure: a small cache holding a few blocks of a main memory partitioned into blocks 0-15; a block-sized transfer of block 10 is shown between the two levels]

Cache Performance Metrics
- Miss Rate
  - Fraction of memory references not found in cache (misses / accesses) = 1 - hit rate
  - Typical numbers (in percentages): 3-10% for L1; can be quite small (e.g., < 1%) for L2, depending on size, etc.
- Hit Time
  - Time to deliver a line in the cache to the processor (includes time to determine whether the line is in the cache)
  - Typical numbers: 1-2 clock cycles for L1; 5-20 clock cycles for L2
- Miss Penalty
  - Additional time required because of a miss
  - Typically 50-200 cycles for main memory (trend: increasing!)

Let's Think About Those Numbers
- Huge difference between a hit and a miss: 100X, if we have just L1 and main memory
- Would you believe 99% hits is twice as good as 97%?
  - Consider: cache hit time of 1 cycle, miss penalty of 100 cycles
  - Average access time:
    97% hits: 1 cycle + 0.03 * 100 cycles = 4 cycles
    99% hits: 1 cycle + 0.01 * 100 cycles = 2 cycles
- This is why "miss rate" is used instead of "hit rate"

Many Types of Caches
- Examples
  - Hardware: L1 and L2 CPU caches, TLBs, ...
  - Software: virtual memory, FS buffers, web browser caches, ...
- Many common design issues
  - Each cached item has a tag (an ID) plus its contents
  - Need a mechanism to efficiently determine whether a given item is cached (combinations of indices and constraints on valid locations)
  - On a miss, usually need to pick something to replace with the new item (called a "replacement policy")
  - On writes, need to either propagate the change or mark the item as dirty (write-through vs. write-back)
- Different solutions for different caches
- Let's talk about CPU caches as a concrete example

Hardware Cache Memories
- Cache memories are small, fast SRAM-based memories managed automatically in hardware
  - They hold frequently accessed blocks of main memory
- The CPU looks first for data in L1, then in main memory
- Typical system structure:
[Figure: CPU chip containing the register file, L1 cache, and ALU; a bus interface connects the chip over the system bus to main memory]

Inserting an L1 Cache Between the CPU and Main Memory
- The tiny, very fast CPU register file has room for four 4-byte words; the transfer unit between the register file and the cache is a 4-byte word
- The small, fast L1 cache has room for two 4-word blocks (lines 0 and 1); the transfer unit between the cache and main memory is a 4-word block (16 bytes)
- The big, slow main memory has room for many 4-word blocks (e.g., blocks 10, 21, 30, ...)

General Organization of a Cache
- A cache is an array of sets: S = 2^s sets
- Each set contains one or more lines: E lines per set
- Each line holds a block of data (B = 2^b bytes per cache block), a valid bit, and t tag bits
- Cache size: C = B x E x S data bytes

Addressing Caches
- The m address bits of A are divided into three fields:
  [ tag (t bits) | set index (s bits) | block offset (b bits) ]
- The word at address A is in the cache if the tag bits in one of the valid lines in set <set index> match <tag>
- The word's contents begin at offset <block offset> bytes from the beginning of the block
- Lookup:
  1. Locate the set based on <set index>
  2. Locate the line in the set based on <tag>
  3. Check that the line is valid
  4. Locate the data in the line based on <block offset>

Example: Direct-Mapped Cache
- Simplest kind of cache; easy to build (only 1 tag compare required per access)
- Characterized by exactly one line per set (E = 1)
- Cache size: C = B x S data bytes
…