Improving Cache Performance

Outline:
- Review: The Memory Hierarchy
- Review: Principle of Locality
- Measuring Cache Performance
- Review: The "Memory Wall"
- Impacts of Cache Performance
- Reducing Cache Miss Rates #1
- Cache
- Direct Mapped Cache
- Hits vs. Misses
- Hardware Issues
- Performance
- Set Associative Caches
- Decreasing miss ratio with associativity
- Set Associative Cache Example
- Four-Way Set Associative Cache
- Costs of Set Associative Caches
- Benefits of Set Associative Caches
- Set Associative Caches (in summary)
- Block Replacement Policies
- Example
- Decreasing miss penalty with multilevel caches
- Reducing Cache Miss Rates #2
- Multilevel Cache Design Considerations
- Key Cache Design Parameters
- Two Machines' Cache Parameters
- 4 Questions for the Memory Hierarchy
- Q1&Q2: Where can a block be placed/found?
- Q3: Which block should be replaced on a miss?
- Q4: What happens on a write?
- Summary: The Cache Design Space

Review: The Memory Hierarchy
- Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.
- The levels, in order of increasing distance (and access time) from the processor: processor, L1$, L2$, main memory, secondary memory. The relative size of the memory grows at each level.
- The transfer unit also grows with distance: 4-8 bytes (a word) between processor and L1$, 8-32 bytes (a block) between L1$ and L2$, 1 to 4 blocks between L2$ and main memory, and 1,024+ bytes (a disk sector = page) between main memory and secondary memory.
- The hierarchy is inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in main memory, which is a subset of what is in secondary memory.

Review: Principle of Locality
- Temporal locality: keep the most recently accessed data items closer to the processor.
- Spatial locality: move blocks consisting of contiguous words to the upper levels.
- Hit Time << Miss Penalty
- Hit: the data appears in some block in the upper level (Blk X).
  - Hit rate: the fraction of accesses found in the upper level.
  - Hit time: RAM access time + time to determine hit/miss.
- Miss: the data must be retrieved from a block in the lower level (Blk Y).
  - Miss rate = 1 - hit rate.
  - Miss penalty: time to replace a block in the upper level with the corresponding block from the lower level + time to deliver that block's word to the processor.
  - Miss types: compulsory, conflict, capacity.
[Diagram: upper-level memory holding Blk X and lower-level memory holding Blk Y, with data flowing to/from the processor.]

Measuring Cache Performance
- Assuming cache hit costs are included as part of the normal CPU execution cycle:
  CPU time = IC × CPI × CC = IC × (CPI_ideal + Memory-stall cycles) × CC
  where CPI_ideal + Memory-stall cycles = CPI_stall.
- Memory-stall cycles come from cache misses (a sum of read-stalls and write-stalls):
  Read-stall cycles = reads/program × read miss rate × read miss penalty
  Write-stall cycles = (writes/program × write miss rate × write miss penalty) + write buffer stalls
- For write-through caches, we can simplify this to
  Memory-stall cycles = miss rate × miss penalty

Review: The "Memory Wall"
- The logic vs. DRAM speed gap continues to grow.
[Chart: clocks per instruction (core) vs. clocks per DRAM access (memory), log scale from 0.01 to 1000, for VAX/1980, PPro/1996, and 2010+.]

Impacts of Cache Performance
- The relative cache penalty increases as processor performance improves (faster clock rate and/or lower CPI). Memory speed is unlikely to improve as fast as processor cycle time, and when calculating CPI_stall the cache miss penalty is measured in the processor clock cycles needed to handle a miss.
- The lower the CPI_ideal, the more pronounced the impact of stalls.
- Example: a processor with a CPI_ideal of 2, a 100-cycle miss penalty, 36% load/store instructions, and 2% I$ and 4% D$ miss rates:
  Memory-stall cycles = 2% × 100 + 36% × 4% × 100 = 3.44
  so CPI_stall = 2 + 3.44 = 5.44
- What if the CPI_ideal is reduced to 1? 0.5? 0.25?
- What if the processor clock rate is doubled (doubling the miss penalty)?
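The stall-cycle arithmetic in the example above is easy to check with a short script (the function and parameter names here are illustrative, not from the slides):

```python
def cpi_stall(cpi_ideal, miss_penalty, ifetch_miss_rate,
              ld_st_frac, data_miss_rate):
    """CPI including memory stalls: every instruction is fetched through
    the I$, and the load/store fraction of instructions also accesses the D$."""
    stalls = (ifetch_miss_rate * miss_penalty
              + ld_st_frac * data_miss_rate * miss_penalty)
    return cpi_ideal + stalls

# Slide example: CPI_ideal = 2, 100-cycle penalty, 36% load/store,
# 2% I$ and 4% D$ miss rates -> 2 + 3.44 = 5.44
print(round(cpi_stall(2, 100, 0.02, 0.36, 0.04), 2))   # 5.44

# The "what if" questions: as CPI_ideal drops, the same 3.44 stall
# cycles become a larger fraction of the total CPI.
for ideal in (1, 0.5, 0.25):
    total = cpi_stall(ideal, 100, 0.02, 0.36, 0.04)
    print(ideal, round(total, 2), f"{3.44 / total:.0%} stalled")
```

The doubled-clock question falls out of the same function: passing miss_penalty=200 doubles the stall component from 3.44 to 6.88 cycles per instruction.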
Reducing Cache Miss Rates #1
1. Allow more flexible block placement.
- In a direct mapped cache, a memory block maps to exactly one cache block.
- At the other extreme, we could allow a memory block to be mapped to any cache block: a fully associative cache.
- A compromise is to divide the cache into sets, each of which consists of n "ways" (n-way set associative). A memory block maps to a unique set, specified by the index field, and can be placed in any way of that set (so there are n choices):
  set = (block address) modulo (# sets in the cache)

Cache
- Two issues:
  - How do we know if a data item is in the cache?
  - If it is, how do we find it?
- Our first example: the block size is one word of data, "direct mapped." For each item of data at the lower level, there is exactly one location in the cache where it might be; that is, lots of items at the lower level share locations in the upper level.

Direct Mapped Cache
- Mapping: address modulo the number of blocks in the cache.
[Diagram: an eight-block cache (indices 000-111) with memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, and 11101 each mapping to the cache block given by their low-order three bits.]

Direct Mapped Cache (for MIPS)
[Diagram: a 1024-entry direct mapped cache with one-word blocks. The 32-bit address splits into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a byte offset (bits 1-0); each entry holds a valid bit, a tag, and the data, and a hit is signaled when the entry is valid and the stored tag matches.]
- What kind of locality are we taking advantage of?

Direct Mapped Cache (taking advantage of spatial locality)
[Diagram: a 4K-entry direct mapped cache with four-word (128-bit) blocks. The address splits into a 16-bit tag (bits 31-16), a 12-bit index (bits 15-4), a 2-bit block offset (bits 3-2), and a byte offset (bits 1-0); a multiplexor selects the requested 32-bit word from the block.]

Hits vs. Misses
- Read hits: this is what we want!
- Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, restart.
- Write hits:
  - replace the data in both cache and memory (write-through), or
  - write the data only into the cache, and write it back to memory later (write-back).
- Write misses: read the entire block into the cache, then write the word.

Hardware Issues
- Make reading multiple words easier by using banks of memory.
- It can get a lot more complicated...
[Diagram: (a) a one-word-wide memory organization; (b) a wide memory organization with a multiplexor between the cache and the CPU; (c) an interleaved organization with four memory banks sharing one bus.]

Performance
- Increasing the block size tends to decrease the miss rate.
- Use split caches because there is more spatial locality in code.
[Chart: miss rate vs. block size for cache sizes of 1 KB, 8 KB, 16... (the text preview is cut off here).]
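The direct mapped lookup described in these slides, splitting the address into tag, index, and byte offset and comparing the stored tag at that single index, can be modeled in a few lines. This sketch uses the 1024-entry, one-word-per-block geometry from the MIPS slide; the class and field names are mine, and data storage is omitted:

```python
# Direct mapped cache model: 1024 one-word blocks, 32-bit byte addresses,
# so an address splits into tag (bits 31-12), index (bits 11-2), offset (bits 1-0).
NUM_BLOCKS = 1024

def split_address(addr):
    offset = addr & 0x3                 # byte within the word
    index = (addr >> 2) % NUM_BLOCKS    # which cache block (modulo mapping)
    tag = addr >> 12                    # identifies which memory block is cached
    return tag, index, offset

class DirectMappedCache:
    def __init__(self):
        # Each entry holds a valid bit and a tag; data is omitted in this sketch.
        self.valid = [False] * NUM_BLOCKS
        self.tags = [0] * NUM_BLOCKS

    def access(self, addr):
        tag, index, _ = split_address(addr)
        if self.valid[index] and self.tags[index] == tag:
            return "hit"
        # Miss: fetch the block and evict whatever mapped to this index.
        self.valid[index] = True
        self.tags[index] = tag
        return "miss"

cache = DirectMappedCache()
# Two addresses 4 KB apart share an index, so they evict each other:
a, b = 0x0000, 0x1000
print([cache.access(x) for x in (a, b, a, b)])
# -> ['miss', 'miss', 'miss', 'miss']
```

The last line shows the conflict misses that motivate "Reducing Cache Miss Rates #1": a 2-way set associative cache of the same size could hold both blocks at once and would hit on the second pair of accesses.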

