Cache Memories
Oct. 10, 2002

Topics
- Generic cache memory organization
- Direct mapped caches
- Set associative caches
- Impact of caches on performance

class14.ppt
15-213 “The course that gives CMU its Zip!”
15-213, F’02


Cache Memories

Cache memories are small, fast SRAM-based memories managed automatically in hardware.
- Hold frequently accessed blocks of main memory

CPU looks first for data in L1, then in L2, then in main memory.

Typical bus structure:
[Figure labels: CPU chip, register file, ALU, L1 cache, bus interface, cache bus, L2 cache, system bus, I/O bridge, memory bus, main memory]


Inserting an L1 Cache Between the CPU and Main Memory

- The tiny, very fast CPU register file has room for four 4-byte words.
- The transfer unit between the CPU register file and the cache is a 4-byte block.
- The small fast L1 cache has room for two 4-word blocks (line 0, line 1).
- The transfer unit between the cache and main memory is a 4-word block (16 bytes).
- The big slow main memory has room for many 4-word blocks.

[Figure: main memory blocks shown as block 10 = a b c d, block 21 = p q r s, block 30 = w x y z]


General Org of a Cache Memory

Cache is an array of sets. Each set contains one or more lines. Each line holds a block of data.

- S = 2^s sets
- E lines per set
- B = 2^b bytes per cache block
- t tag bits per line
- 1 valid bit per line

Cache size: C = B x E x S data bytes

[Figure: sets 0, 1, ..., S-1; each line in a set has a valid bit, a tag, and block bytes 0, 1, ..., B-1]


Addressing Caches

Address A (bit m-1 down to bit 0):

    <tag>      <set index>   <block offset>
    t bits     s bits        b bits

The word at address A is in the cache if the tag bits in one of the <valid> lines in set <set index> match <tag>.

The word contents begin at offset <block offset> bytes from the beginning of the block.

[Figure: sets 0, 1, ..., S-1; each line has a valid bit (v), a tag, and block bytes 0, 1, ..., B-1]


Direct-Mapped Cache

Simplest kind of cache.
Characterized by exactly one line per set.

[Figure: sets 0, 1, ..., S-1, each with one line (valid, tag, cache block); E=1 lines per set]


Accessing Direct-Mapped Caches

Set selection
- Use the set index bits to determine the set of interest.

[Figure: address split into tag (t bits), set index (s bits), and block offset (b bits); set index bits 0 0 0 0 1 pick the selected set among sets 0, 1, ..., S-1]


Accessing Direct-Mapped Caches

Line matching and word selection
- Line matching: Find a valid line in the selected set with a matching tag
- Word selection: Then extract the word

(1) The valid bit must be set.
(2) The tag bits in the cache line must match the tag bits in the address.
(3) If (1) and (2), then cache hit, and block offset selects starting byte.

[Figure: address with tag 0110, set index i, block offset 100; selected set (i) holds one line with valid = 1, tag = 0110, and block bytes 0 to 7 containing w0 w1 w2 w3]


Direct-Mapped Cache Simulation

M=16 byte addresses, B=2 bytes/block, S=4 sets, E=1 entry/set
Address bits: t=1, s=2, b=1 (x | xx | x)

Address trace (reads):
0 [0000_2], 1 [0001_2], 13 [1101_2], 8 [1000_2], 0 [0000_2]

Cache contents after each miss (v tag data):

    step  access        result  set 0             set 2
    (1)   0 [0000_2]    miss    1 0 m[1] m[0]
    (3)   13 [1101_2]   miss    1 0 m[1] m[0]     1 1 m[13] m[12]
    (4)   8 [1000_2]    miss    1 1 m[9] m[8]     1 1 m[13] m[12]
    (5)   0 [0000_2]    miss    1 0 m[1] m[0]     1 1 m[13] m[12]


Why Use Middle Bits as Index?

High-Order Bit Indexing
- Adjacent memory lines would map to same cache entry
- Poor use of spatial locality

Middle-Order Bit Indexing
- Consecutive memory lines map to different cache lines
- Can hold C-byte region of address space in cache at one time

[Figure: a 4-line cache (lines 00, 01, 10, 11) and memory lines 0000 through 1111, shown once under high-order bit indexing and once under middle-order bit indexing]


Set Associative Caches

Characterized by more than one line per set

[Figure: sets 0, 1, ..., S-1, each with E=2 lines per set; each line has valid, tag, cache block]


Accessing Set Associative Caches

Set selection
- identical to direct-mapped cache

[Figure: address split into tag (t bits), set index (s bits), and block offset (b bits); set index bits 0 0 0 0 1 pick the selected set; each set has two lines (valid, tag, cache block)]


Accessing Set Associative Caches

Line matching and word selection
- must compare the tag in each valid line in the selected set.

(1) The valid bit must be set.
(2) The tag bits in one of the cache lines must match the tag bits in the address.
(3) If (1) and (2), then cache hit, and block offset selects starting byte.

[Figure: address with tag 0110, set index i, block offset 100; selected set (i) holds two valid lines, one with tag 1001 and one with tag 0110 whose block bytes 0 to 7 contain w0 w1 w2 w3]


Multi-Level Caches

Options: separate data and instruction caches, or a unified cache

                Regs     L1 d-cache /    Unified       Memory         disk
                         L1 i-cache      L2 Cache
    size:       200 B    8-64 KB         1-4MB SRAM    128 MB DRAM    30 GB
    speed:      3 ns     3 ns            6 ns          60 ns          8 ms
    $/Mbyte:                             $100/MB       $1.50/MB       $0.05/MB
    line size:  8 B      32 B            32 B          8 KB

Left to right (Processor to disk): larger, slower, cheaper


Intel Pentium Cache Hierarchy

[Figure labels as extracted:]
- Processor Chip
- Regs.
- L1 Data: 1 cycle latency, 16 KB, 4-way assoc, Write-through, 32B lines
- L1 Instruction: 16 KB, 4-way, 32B lines
- L2 Unified: 128KB--2 MB, 4-way assoc, Write-back, Write allocate, 32B lines
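
The address split and the Direct-Mapped Cache Simulation slide above can be replayed in a few lines of code. The following is only a minimal sketch in Python, not part of the original slides: the function names `split_address` and `simulate_direct_mapped` are hypothetical, only valid bits and tags are modeled (no data), and the parameters s=2, b=1 are the ones given on the simulation slide.

```python
def split_address(addr, s, b):
    """Split an address into (tag, set index, block offset).

    s = number of set index bits, b = number of block offset bits.
    """
    offset = addr & ((1 << b) - 1)             # low b bits
    set_index = (addr >> b) & ((1 << s) - 1)   # middle s bits
    tag = addr >> (s + b)                      # remaining high bits
    return tag, set_index, offset


def simulate_direct_mapped(trace, s, b):
    """Replay read addresses through a direct-mapped cache (E = 1).

    Returns a list of "hit"/"miss" strings, one per access.
    Assumption: only (valid, tag) is tracked per line; data is not modeled.
    """
    lines = [(False, None)] * (1 << s)   # one line per set, all invalid
    results = []
    for addr in trace:
        tag, set_index, _ = split_address(addr, s, b)
        valid, stored_tag = lines[set_index]
        if valid and stored_tag == tag:
            results.append("hit")
        else:
            results.append("miss")
            lines[set_index] = (True, tag)   # load block, replacing the old one
    return results


if __name__ == "__main__":
    trace = [0, 1, 13, 8, 0]   # address trace from the slide
    outcomes = simulate_direct_mapped(trace, s=2, b=1)
    for addr, outcome in zip(trace, outcomes):
        print(f"{addr:2d} [{addr:04b}] {split_address(addr, 2, 1)} {outcome}")
```

Under these assumptions the read of address 1 hits, since it lies in the block loaded by the first access, and the final read of address 0 misses because address 8 maps to the same set and replaced that block in between.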
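
Line matching in a set associative cache, as described on the slides above, amounts to checking conditions (1) and (2) against every line of the selected set. The sketch below is illustrative only: the `Line` class and `lookup` function are hypothetical names, the example set uses the two tags shown in the set associative figure (1001 and 0110), and the order of the two lines within the set is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool
    tag: int


def lookup(cache_set, tag):
    """Return the index of the matching line in the set, or None on a miss.

    A line matches when (1) its valid bit is set and (2) its tag equals
    the tag bits of the address.
    """
    for i, line in enumerate(cache_set):
        if line.valid and line.tag == tag:
            return i
    return None


# Selected set with E = 2 lines; line order here is assumed, not from the slide.
selected_set = [Line(True, 0b1001), Line(True, 0b0110)]

if __name__ == "__main__":
    print(lookup(selected_set, 0b0110))   # matching tag in a valid line
    print(lookup(selected_set, 0b1111))   # no line carries this tag
```

With a one-line set (E = 1) the same loop reduces to the direct-mapped check, which is why set selection is identical in both organizations and only the line matching step differs.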