15-740Computer SystemLevels in a typical memory hierarchyAlpha 21164 Chip PhotoAlpha 21164 Chip CachesAccessing data in a memory hierarchyLocality of referenceKey questions for cachesAddress spacesPartitioning address spacesSlide 11Slide 12Slide 13Slide 14Slide 15Slide 16Basic cache organizationDirect mapped cache (E = 1)Direct Mapped Cache SimulationDirect Mapped Cache Implementation (DECStation 3100)E-way Set-Associative Cache2-Way Set Associative SimulationTwo-Way Set Associative Cache ImplementationFully associative cache (E = C/B)Fully associative cache simulationReplacement AlgorithmsImplementing RAND and FIFOWrite StrategiesWrite BufferingAlpha AXP 21064 direct mapped data cacheWrite mergingMulti-level cachesAlpha 21164 HierarchyBandwidth MatchingHigh Bandwidth Memory SystemsCache Performance MetricsImpact of Cache and Block SizeImpact of AssociativityImpact of Replacement StrategyImpact of Write StrategyAllocation StrategiesAllocation Strategies (Cont.)Qualitative Cache Performance ModelInteractions Between Program & CacheMatmult Performance (Sparc20)Layout of Arrays in MemoryMiss Rate AnalysisMatrix multiplication (ijk)Matrix multiplication (jik)Matrix multiplication (kij)Matrix multiplication (ikj)Matrix multiplication (jki)Matrix multiplication (kji)Summary of Matrix MultiplicationMatmult performance (DEC5000)Slide 56Matmult Performance (Alpha 21164)Block Matrix MultiplicationBlocked Matrix Multiply (bijk)Blocked Matrix Multiply AnalysisBlocked matmult perf (DEC5000)Blocked matmult perf (Sparc20)Blocked matmult perf (Alpha 21164)Topics•Memory Hierarchy•Cache Design•Optimizing Program PerformanceCachesOct. 8, 199815-740CS 740 F’98– 2 –Computer SystemdiskDiskdiskDiskMemory-I/O busMemory-I/O busProcessorProcessorCacheCacheMemoryMemoryI/OcontrollerI/OcontrollerI/OcontrollerI/OcontrollerI/OcontrollerI/OcontrollerDisplayDisplayNetworkNetworkinterruptsCS 740 F’98– 3 –Levels in a typical memory hierarchyCPUCPUregsregsCacheMemoryMemorydiskdisksize:speed:$/Mbyte:block size:200 B5 ns4 Bregister referencecachereferencememoryreferencedisk memoryreference32KB -- 4MB6 ns$256/MB16 B128 MB100 ns$2/MB4 KB10GB10 ms$0.10/MBlarger, slower, cheaper4 B 32 B 4 KBcache virtual memoryCS 740 F’98– 4 –Alpha 21164 Chip Photo•Microprocessor Report 9/12/94Caches:•L1 data•L1 instruction•L2 unified•TLB•Branch historyCS 740 F’98– 5 –Alpha 21164 Chip Caches•Microprocessor Report 9/12/94Caches:•L1 data•L1 instruction•L2 unified•TLB•Branch historyRight HalfL2Right HalfL2L1Instr.L1DataL2TagsL2 (3?)ControlCS 740 F’98– 6 –aabAccess word w in block a (hit)aabAccess word v in block b (miss)wbababvAccessing data in a memory hierarchy•Between any two levels, memory divided into blocks.•Data moves between levels on demand, in block-sized chunks.•Upper-level blocks a subset of lower-level blocks.HighLevelLowLevelCS 740 F’98– 7 –Locality of referencePrinciple of Locality•Programs tend to reuse data and instructions near those they have used recently.•Temporal locality: recently referenced items are likely to be referenced in the near future.•Spatial locality: items with nearby addresses tend to be referenced close together in time.sum = 0;for (i = 0; i < n; i++)sum += a[i];*v = sum;Locality in Example•Data–Reference array elements in succession (spatial)•Instruction–Reference instructions in sequence (spatial)–Cycle through loop repeatedly (temporal)CS 740 F’98– 8 –Key questions for cachesQ1: Where should a block be placed in the cache? (block placement)Q2: How is a block found in the cache? (block identification)Q3: Which block should be replaced on a miss? (block replacement)Q4: What happens on a write? (write strategy)CS 740 F’98– 9 –Address spacesAn n-bit address defines an address space of 2n items: 0,...,2n-1.0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111Address space for n=5CS 740 F’98– 10 –Partitioning address spacesKey idea: partitioning the address bits partitions the address space. In general, an address partitioned into sets of t (tag), s (set index), and b (block offset) bits, e.g., t s bbelongs to one of 2s equivalence classes (sets), where each setconsists of 2t blocks of addresses, and each block consists of 2b addresses.The s bits uniquely identify an equivalence class.The t bits uniquely identify each block in the equivalence class.The b bits define the offset of an address within a block (block offset).address = tag set index offsetCS 740 F’98– 11 –Partitioning address spaces00000000010001000011001000010100110001110100001001010100101101100011010111001111100001000110010100111010010101101101011111000110011101011011111001110111110111111t=1 s=3 b=1block 1offset 0011 02s = 8 sets of blocks2t = 2 blocks/set2b = 2 addresses/block.set 011CS 740 F’98– 12 –Partitioning address spaces00000000010001000011001000010100110001110100001001010100101101100011010111001111100001000110010100111010010101101101011111000110011101011011111001110111110111110t=1 s=3 b=1block 0offset 1110 12s = 8 sets of blocks2t = 2 blocks/set2b = 2 addresses/block.set 110CS 740 F’98– 13 –Partitioning address spaces000000000100010000110010000101001100011101000010010101001011011000110101110011111000010001100101001110100101011011010111110001100111010110111110011101111101111110t=2 s=2 b=1block 10offset 011 02s = 4 sets of blocks2t = 4 blocks/set2b = 2 addresses/block.set 11CS 740 F’98– 14 –Partitioning address spaces000000000100010000110010000101001100011101000010010101001011011000110101110011111000010001100101001110100101011011010111110001100111010110111110011101111101111101t=2 s=2 b=1block 01offset 110 12s = 4 sets of blocks2t = 4 blocks/set2b = 2 addresses/block.set 10CS 740 F’98– 15 –Partitioning address spaces0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111101t=3 s=1 b=1block 101offset 01 02s = 2 sets of blocks2t = 8 blocks/set2b = 2 addresses/block.set 1CS 740 F’98– 16 –0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111Partitioning address spaces1011t=4 s=0 b=1set øblock 1011offset 002s = 1 set of blocks2t = 16 blocks/set2b = 2 addresses/block.CS
View Full Document