15-213: "The course that gives CMU its Zip!"

Caches
October 12, 2000
class14.ppt (CS 213 F'00)

Topics
•Memory Hierarchy
–Locality of Reference
•SRAM Caches
–Direct Mapped
–Associative

Computer System
[Figure: processor with on-chip cache attached to a memory-I/O bus, connected to memory and to I/O controllers for disks, display, and network; I/O devices signal the processor via interrupts]

Levels in Memory Hierarchy

            size           speed   $/MB       line size
Register:   200 B          3 ns               8 B
Cache:      32 KB / 4 MB   4 ns    $100/MB    32 B
Memory:     128 MB         60 ns   $1.50/MB   8 KB
Disk:       30 GB          8 ms    $0.05/MB

•Each level going down is larger, slower, and cheaper
•Transfer units between levels: 8 B (register-cache), 32 B (cache-memory), 8 KB (memory-disk)
•The cache-memory boundary is managed by the cache; the memory-disk boundary by virtual memory

Alpha 21164 Chip Photo
•Source: Microprocessor Report, 9/12/94
•Caches: L1 data, L1 instruction, L2 unified, TLB, branch history

Alpha 21164 Chip Caches
•Caches: L1 data, L1 instruction, L2 unified, TLB, branch history
[Chip photo labels: Right Half L2, L1 Instr., L1 Data, L2 Tags, L3 Control]
Locality of Reference
Principle of Locality:
•Programs tend to reuse data and instructions near those they have used recently.
•Temporal locality: recently referenced items are likely to be referenced in the near future.
•Spatial locality: items with nearby addresses tend to be referenced close together in time.

sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
*v = sum;

Locality in Example:
•Data
–Reference array elements in succession (spatial)
•Instructions
–Reference instructions in sequence (spatial)
–Cycle through loop repeatedly (temporal)

Caching: The Basic Idea
Main Memory
•Stores words (A–Z in example)
Cache
•Stores a subset of the words (4 in example)
•Organized in lines
–Multiple words per line
–To exploit spatial locality
Access
•A word must be in the cache for the processor to access it
[Figure: big, slow memory holding words A through Z; small, fast cache holding A, B, G, H; processor reads through the cache]

Basic Idea (Cont.)
Maintaining Cache:
•Each time the processor performs a load or store, bring the line containing the word into the cache
–May need to evict an existing line
•Subsequent loads or stores to any word in the line are performed within the cache
Example (cache holds 2 lines, each with 2 words):
•Initial: cache holds lines A+B and G+H
•Read C: "cache miss"; load line C+D into cache
•Read D: "cache hit"; word already in cache
•Read Z: "cache miss"; load line Y+Z into cache, evicting the oldest entry

Accessing Data in Memory Hierarchy
•Between any two levels, memory is divided into lines (aka "blocks")
•Data moves between levels on demand, in line-sized chunks
•Invisible to the application programmer
–Hardware is responsible for cache operation
•Upper-level lines are a subset of lower-level lines
[Figure: accessing word w in line a is a hit; accessing word v in line b is a miss that copies line b from the low level into the high level]

Design Issues for Caches
Key Questions:
•Where should a line be placed in the cache? (line placement)
•How is a line found in the cache? (line identification)
•Which line should be replaced on a miss? (line replacement)
•What happens on a write?
(write strategy)

Constraints:
•Design must be very simple
–Hardware realization
–All decision making within nanosecond time scales
•Want to optimize performance for "typical" programs
–Do extensive benchmarking and simulations
–Many subtle engineering tradeoffs

Direct-Mapped Caches
Simplest Design
•Each memory line has a unique cache location
Parameters
•Line (or block) size B = 2^b
–Number of bytes in each line
–Typically 2X–8X the word size
•Number of sets S = 2^s
–Number of lines the cache can hold
•Total cache size = B*S = 2^(b+s) bytes
Physical Address
•Address used to reference main memory
•m bits to reference M = 2^m total bytes
•Partitioned into fields:
–Offset: lower b bits indicate which byte within the line
–Set index: next s bits indicate how to locate the line within the cache
–Tag: remaining t = m - s - b high-order bits identify the line when it is in the cache

m-bit physical address: [ tag (t bits) | set index (s bits) | offset (b bits) ]

Indexing into Direct-Mapped Cache
•Use the set index bits to select one of the S cache sets
•Each set holds one tag, a valid bit, and B bytes of data (bytes 0 through B-1)

Direct-Mapped Cache Tag Matching
Identifying the Line
•The tag must match the high-order bits of the address
•The line must have Valid = 1
•The lower bits of the address select the byte or word within the cache line

Direct Mapped Cache Simulation
M = 16 byte addresses, B = 2 bytes/line, S = 4 sets, E = 1 entry/set
Address fields: t = 1 tag bit, s = 2 set-index bits, b = 1 offset bit
Address trace (reads): 0 [0000], 1 [0001], 13 [1101], 8 [1000], 0 [0000]
(1) 0 [0000]: miss; set 0 loads tag 0, data m[1] m[0] (the following read of 1 [0001] then hits in the same line)
(2) 13 [1101]: miss; set 2 loads tag 1, data m[13] m[12]
(3) 8 [1000]: miss; set 0 loads tag 1, data m[9] m[8], evicting the line holding m[1] m[0]
(4) 0 [0000]: miss; set 0 reloads tag 0, data m[1] m[0]

Why Use Middle Bits as Index?
High-Order Bit Indexing
•Adjacent memory lines would map to the same cache entry
•Poor use of spatial locality
Middle-Order Bit Indexing
•Consecutive memory lines map to different cache lines
•Can hold a
C-byte region of the address space in the cache at one time
[Figure: a 4-line cache, contrasting high-order bit indexing with middle-order bit indexing]