Cache Memories
October 8, 2007
15-213, F'07 (class12.ppt)

Slide titles:
  - Cache Memories
  - Inserting an L1 Cache Between the CPU and Main Memory
  - General Organization of a Cache
  - Addressing Caches
  - Direct-Mapped Cache
  - Accessing Direct-Mapped Caches
  - Direct-Mapped Cache Simulation
  - Set Associative Caches
  - Accessing Set Associative Caches
  - 2-Way Associative Cache Simulation
  - Why Use Middle Bits as Index?
  - Maintaining a Set-Associative Cache
  - Multi-Level Caches
  - What about writes?
  - Intel Pentium III Cache Hierarchy
  - Cache Performance Metrics
  - Writing Cache Friendly Code
  - The Memory Mountain
  - Memory Mountain Test Function
  - Memory Mountain Main Routine
  - The Memory Mountain
  - X86-64 Memory Mountain
  - Opteron Memory Mountain
  - Ridges of Temporal Locality
  - A Slope of Spatial Locality
  - Matrix Multiplication Example
  - Miss Rate Analysis for Matrix Multiply
  - Layout of C Arrays in Memory (review)
  - Matrix Multiplication (ijk)
  - Matrix Multiplication (jik)
  - Matrix Multiplication (kij)
  - Matrix Multiplication (ikj)
  - Matrix Multiplication (jki)
  - Matrix Multiplication (kji)
  - Summary of Matrix Multiplication
  - Pentium Matrix Multiply Performance
  - Improving Temporal Locality by Blocking
  - Blocked Matrix Multiply (bijk)
  - Blocked Matrix Multiply Analysis
  - Pentium Blocked Matrix Multiply Performance
  - Concluding Observations

Topics
  - Generic cache memory organization
  - Direct mapped caches
  - Set associative caches
  - Impact of caches on performance
  - The memory mountain

Cache Memories

Cache memories are small, fast SRAM-based memories managed automatically in hardware.
  - Hold frequently accessed blocks of main memory

The CPU looks first for data in L1, then in L2, then in main memory.

Typical system structure:

[Figure: typical system structure. Labels: CPU chip, register file, ALU, L1 cache, L2 tags, bus interface, SRAM port, L2 data, system bus, I/O bridge, memory bus, main memory.]

Inserting an L1 Cache Between the CPU and Main Memory

  - The big slow main memory has room for many 4-word blocks.
  - The small fast L1 cache has room for two 4-word blocks (line 0 and line 1).
  - The tiny, very fast CPU register file has room for four 4-byte words.
  - The transfer unit between the cache and main memory is a 4-word block (16 bytes).
  - The transfer unit between the CPU register file and the cache is a 4-byte block.

[Figure: main memory blocks, for example block 10 holding a b c d, block 21 holding p q r s, and block 30 holding w x y z.]

General Organization of a Cache

  - A cache is an array of sets.
  - Each set contains one or more lines.
  - Each line holds a block of data.

Parameters:
  - S = 2^s sets
  - E lines per set
  - B = 2^b bytes per cache block
  - t tag bits per line
  - 1 valid bit per line

Cache size: C = B x E x S data bytes

[Figure: sets 0 through S-1; each line has a valid bit, a tag, and a block of bytes 0 through B-1.]

Addressing Caches

Address A (bits m-1 down to 0):

  <tag>     <set index>   <block offset>
  t bits    s bits        b bits

The word at address A is in the cache if the tag bits in one of the valid lines in set <set index> match <tag>. The word contents begin at offset <block offset> bytes from the beginning of the block.

To look up address A:
  1. Locate the set based on <set index>
  2. Locate the line in the set based on <tag>
  3. Check that the line is valid
  4. Locate the data in the line based on <block offset>

Direct-Mapped Cache

  - Simplest kind of cache, easy to build (only 1 tag compare required per access)
  - Characterized by exactly one line per set (E = 1)

Cache size: C = B x S data bytes

[Figure: sets 0 through S-1, each holding a single line with a valid bit, a tag, and a cache block.]

Accessing Direct-Mapped Caches

Set selection
  - Use the set index bits to determine the set of interest.

[Figure: an address whose set index bits are 00001 selects set 1.]

Line matching and word selection
  - Line matching: find a valid line in the selected set with a matching tag.
  - Word selection: then extract the word.

  (1) The valid bit must be set.
  (2) The tag bits in the cache line must match the tag bits in the address.
  If (1) and (2), then cache hit.
  (3) If cache hit, the block offset selects the starting byte.

[Figure: address with tag 0110, set index i, and block offset 100₂; the selected set i holds a line with valid bit 1, tag 0110, and an 8-byte block in which bytes 4 through 7 hold w0 through w3.]

Direct-Mapped Cache Simulation

M = 16 byte addresses, B = 2 bytes/block, S = 4 sets, E = 1 entry/set

Address fields: t = 1 tag bit, s = 2 set index bits, b = 1 block offset bit

Address trace (reads): 0 [0000₂], 1 [0001₂], 7 [0111₂], 8 [1000₂], 0 [0000₂]
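The tag, set index, and block offset fields used in the lookup steps can be extracted with shifts and masks. The following is a minimal sketch, not part of the original slides: the function name `split_address` is hypothetical, and the field widths (s = 2, b = 1) are the ones given for the simulation.

```python
def split_address(addr, s_bits, b_bits):
    """Split an address into (tag, set index, block offset).

    Assumed layout, as on the slides: <tag> <set index> <block offset>,
    with the block offset in the low-order b bits.
    """
    offset = addr & ((1 << b_bits) - 1)
    set_index = (addr >> b_bits) & ((1 << s_bits) - 1)
    tag = addr >> (s_bits + b_bits)
    return tag, set_index, offset


# Read trace from the simulation: 4-bit addresses, s = 2, b = 1.
for addr in [0, 1, 7, 8, 0]:
    tag, set_index, offset = split_address(addr, s_bits=2, b_bits=1)
    print(f"addr {addr:2d} [{addr:04b}] -> tag={tag} set={set_index} offset={offset}")
```

Addresses 0 and 8 map to the same set with different tags, so in a direct-mapped cache each one evicts the other.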
Result of each access (v = valid bit; every line starts with v = 0 and unknown tag and data):

  Access       Result   Set   v   tag   data
  0 [0000₂]    miss     0     1   0     M[0-1]
  1 [0001₂]    hit      0     (unchanged)
  7 [0111₂]    miss     3     1   0     M[6-7]
  8 [1000₂]    miss     0     1   1     M[8-9]
  0 [0000₂]    miss     0     1   0     M[0-1]

Set Associative Caches

  - Characterized by more than one line per set
  - An E-way associative cache has E lines per set (E = 2 in the figure)

[Figure: sets 0 through S-1, each holding two lines; every line has a valid bit, a tag, and a cache block.]

Accessing Set
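The lookup for both direct-mapped and set associative caches can be sketched as one small simulator. This is an illustrative sketch rather than anything from the slides: the function name `simulate` is hypothetical, the LRU replacement policy is an assumption (no policy is stated at this point in the material), and the 2-way configuration in the second call is a made-up example of the same total capacity.

```python
from collections import OrderedDict


def simulate(trace, s_bits, b_bits, lines_per_set):
    """Return 'hit' or 'miss' for each read address in trace.

    Each set is an OrderedDict keyed by tag, ordered from least to most
    recently used. LRU replacement is an assumption of this sketch.
    """
    sets = [OrderedDict() for _ in range(1 << s_bits)]
    results = []
    for addr in trace:
        set_index = (addr >> b_bits) & ((1 << s_bits) - 1)
        tag = addr >> (s_bits + b_bits)
        lines = sets[set_index]
        if tag in lines:                   # valid line with a matching tag
            lines.move_to_end(tag)         # mark as most recently used
            results.append("hit")
        else:
            if len(lines) == lines_per_set:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True              # load the block; presence means valid
            results.append("miss")
    return results


trace = [0, 1, 7, 8, 0]
# Direct-mapped configuration from the simulation: S = 4, E = 1, B = 2.
print(simulate(trace, s_bits=2, b_bits=1, lines_per_set=1))
# Hypothetical 2-way configuration of the same capacity: S = 2, E = 2, B = 2.
print(simulate(trace, s_bits=1, b_bits=1, lines_per_set=2))
```

With one line per set, the final read of address 0 misses because address 8 evicted its block. With two lines per set, both blocks fit in the same set and the final read hits.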