Cache MemoriesFeb. 25, 2003Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performanceclass13.ppt15-213“The course that gives CMU its Zip!”–2–15-213, S’03Cache MemoriesCache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memoryCPU looks first for data in L1, then in L2, then in main memory.Typical bus structure:mainmemoryI/Obridgebus interfaceL2 cacheALUregister fileCPU chipcache bus system busmemory busL1cache–3–15-213, S’03Inserting an L1 Cache Between the CPU and Main Memorya b c dblock 10p q r sblock 21......w x y zblock 30...The big slow mainmemory has room for many 4-word blocks.The small fast L1 cache hasroom for two 4-word blocks.The tiny, very fast CPU register file has room for four 4-byte words.The transfer unit between the cacheand main memoryis a 4-word block (16 bytes).The transfer unit between the CPU register file and the cache is a 4-byte block.line 0line 1–4–15-213, S’03General Org of a Cache Memory••• B–110••• B–110validvalidtagtagset 0:B = 2bbytesper cache blockE lines per setS = 2ssetst tag bitsper lineCache size: C = B x E x S data bytes•••••• B–110••• B–110validvalidtagtagset 1:•••••• B–110••• B–110validvalidtagtagset S-1:••••••Cache is an arrayof sets.Each set containsone or more lines.Each line holds ablock of data.1 valid bit per line–5–15-213, S’03Addressing Cachest bits s bitsb bits<tag> <set index> <block offset>0m-1Address A:•••B–110•••B–110vvtagtagset 0:••••••B–110•••B–110vvtagtagset 1:••••••B–110•••B–110vvtagtagset S-1:••••••The word at address A is in the cache ifthe tag bits in one of the <valid> lines in set <set index> match <tag>.The word contents begin at offset <block offset> bytes from the beginning of the block.–6–15-213, S’03Addressing Cachest bits s bitsb bits<tag> <set index> <block offset>0m-1Address A:•••B–110•••B–110vvtagtagset 0:••••••B–110•••B–110vvtagtagset 1:••••••B–110•••B–110vvtagtagset S-1:••••••1. Locate the set based on <set index>2. Locate the line in the set based on <tag>3. Check that the line is valid4. Locate the data in the line based on<block offset>–7–15-213, S’03Direct-Mapped CacheSimplest kind of cacheCharacterized by exactly one line per set.validvalidvalidtagtagtag•••set 0:set 1:set S-1:E=1 lines per setcache blockcache blockcache blockCache size: C = B x S data bytes–8–15-213, S’03Accessing Direct-Mapped CachesSet selection Use the set index bits to determine the set of interest.t bits s bits0 0 0 0 10m-1b bitstag set index block offsetselected setvalidvalidvalidtagtagtag•••set 0:set 1:set S-1:cache blockcache blockcache block–9–15-213, S’03Accessing Direct-Mapped CachesLine matching and word selection Line matching: Find a valid line in the selected set with a matching tag Word selection: Then extract the wordt bits s bits100i01100m-1b bitstag set index block offsetselected set (i):1 0110 w3w0w1w23012 7456=1?(1) The valid bit must be set= ?(2) The tag bits in the cache line must match the tag bits in the addressIf (1) and (2), then cache hit–10–15-213, S’03Accessing Direct-Mapped CachesLine matching and word selection Line matching: Find a valid line in the selected set with a matching tag Word selection: Then extract the wordt bits s bits100i01100m-1b bitstag set index block offsetselected set (i):1 0110 w3w0w1w23012 7456(3) If cache hit,block offset selects starting byte. –11–15-213, S’03Direct-Mapped Cache SimulationM=16 byte addresses, B=2 bytes/block, S=4 sets, E=1 entry/setAddress trace (reads):0 [00002],1 [00012],7 [01112],8 [10002],0 [00002]xt=1 s=2 b=1xx x0 ? ?vtag datamiss1 0 M[0-1]hitmiss1 0 M[6-7]miss1 1 M[8-9]miss1 0 M[0-1]–12–15-213, S’03Set Associative CachesCharacterized by more than one line per setE=2lines per setvalid tagset 0:set 1:set S-1:•••cache blockvalid tag cache blockvalid tag cache blockvalid tag cache blockvalid tag cache blockvalid tag cache blockE-way associative cache–13–15-213, S’03Accessing Set Associative CachesSet selection identical to direct-mapped cachevalidvalidtagtagset 0:validvalidtagtagset 1:validvalidtagtagset S-1:•••cache blockcache blockcache blockcache blockcache blockcache blockt bits s bits0 0 0 0 10m-1b bitstag set index block offsetselected set–14–15-213, S’03Accessing Set Associative CachesLine matching and word selection must compare the tag in each valid line in the selected set.1 0110 w3w0w1w21 1001selected set (i):3012 7456t bits s bits100i01100m-1b bitstag set index block offset=1?(1) The valid bit must be set= ?(2) The tag bits in one of the cache lines must match the tag bits in the addressIf (1) and (2), then cache hit–15–15-213, S’03Accessing Set Associative CachesLine matching and word selection Word selection is the same as in a direct mapped cache1 0110 w3w0w1w21 1001selected set (i):3012 7456t bits s bits100i01100m-1b bitstag set index block offset(3) If cache hit,block offset selects starting byte. –16–15-213, S’032-Way Associative Cache SimulationM=16 byte addresses, B=2 bytes/block, S=2 sets, E=2 entry/setAddress trace (reads):0 [00002],1 [00012],7 [01112],8 [10002],0 [00002]xxt=2 s=1 b=1x x0 ? ?vtag data000miss1 00 M[0-1]hitmiss1 01 M[6-7]miss1 10 M[8-9]hit–17–15-213, S’03Why Use Middle Bits as Index?High-Order Bit Indexing Adjacent memory lines would map to same cache entry Poor use of spatial localityMiddle-Order Bit Indexing Consecutive memory lines map to different cache lines Can hold S*B*E-byte region of address space in cache at one time4-line CacheHigh-OrderBit IndexingMiddle-OrderBit Indexing0001101100000001001000110100010101100111100010011010101111001101111011110000000100100011010001010110011110001001101010111100110111101111–18–15-213, S’03Multi-Level CachesOptions: separate data and instruction caches, or a unified cachesize:speed:$/Mbyte:line size:200 B3 ns8 B8-64 KB3 ns32 B128 MB DRAM60 ns$1.50/MB8 KB30 GB8 ms$0.05/MBlarger, slower, cheaperMemoryMemoryRegsUnifiedL2CacheUnifiedL2CacheProcessor1-4MB SRAM6 ns$100/MB32 BdiskdiskL1d-cacheL1i-cache–19–15-213, S’03What about writes?Multiple copies
View Full Document