CS152 Computer Architecture and Engineering
Lecture 22: Advanced Caching
April 23, 2003
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
lecture slides: http://inst.eecs.berkeley.edu/~cs152/
4/23/03 ©UCB Spring 2003 CS152 / Kubiatowicz

Slide titles:
Recap: Set Associative Cache
Recap: Cache Performance
Recap: A Summary on Sources of Cache Misses
The Big Picture: Where are We Now?
How Do you Design a Memory System?
Impact on Cycle Time
Improving Cache Performance: 3 general options
Improving Cache Performance
3Cs Absolute Miss Rate (SPEC92)
2:1 Cache Rule
3Cs Relative Miss Rate
1. Reduce Misses via Larger Block Size
2. Reduce Misses via Higher Associativity
Example: Avg. Memory Access Time vs. Miss Rate
3. Reducing Misses via a “Victim Cache”
4. Reducing Misses by Hardware Prefetching
5. Reducing Misses by Software Prefetching Data
6. Reducing Misses by Compiler Optimizations
Administrivia
Improving Cache Performance (Continued)
0. Reducing Penalty: Faster DRAM / Interface
1. Reducing Penalty: Read Priority over Write on Miss
RAW Hazards from Write Buffer!
2. Reduce Penalty: Early Restart and Critical Word First
3. Reduce Penalty: Non-blocking Caches
What happens on a Cache miss?
Value of Hit Under Miss for SPEC
4. Reduce Penalty: Second-Level Cache
Reducing Misses: which apply to L2 Cache?
L2 cache block size & A.M.A.T.
Slide 32
Example: Harvard Architecture
Summary: Cache techniques

Recap: Set Associative Cache
° N-way set associative: N entries for each Cache Index
  • N direct mapped caches operate in parallel
° Example: Two-way set associative cache
  • Cache Index selects a “set” from the cache
  • The two tags in the set are compared to the input in parallel
  • Data is selected based on the tag result
[Figure: two-way set associative cache. The Cache Index selects one entry (Valid, Cache Tag, Cache Data) from each way; two comparators (Compare) check the stored tags against the Adr Tag; their outputs drive the mux selects (Sel1, Sel0) that pick the Cache Block and are ORed to produce Hit.]

Recap: Cache Performance
Execution_Time = Instruction_Count x Cycle_Time x (ideal CPI + Memory_Stalls/Inst + Other_Stalls/Inst)
Memory_Stalls/Inst =
    Instruction Miss Rate x Instruction Miss Penalty +
    Loads/Inst x Load Miss Rate x Load Miss Penalty +
    Stores/Inst x Store Miss Rate x Store Miss Penalty
Average Memory Access Time (AMAT) = Hit Time_L1 + (Miss Rate_L1 x Miss Penalty_L1)
                                  = (Hit Rate_L1 x Hit Time_L1) + (Miss Rate_L1 x Miss Time_L1)

Recap: A Summary on Sources of Cache Misses
° Compulsory (cold start or process migration, first reference): first access to a block
  • “Cold” fact of life: not a whole lot you can do about it
  • Note: If you are going to run “billions” of instructions, Compulsory Misses are insignificant
° Conflict (collision):
  • Multiple memory locations mapped to the same cache location
  • Solution 1: increase cache size
  • Solution 2: increase associativity
° Capacity:
  • Cache cannot contain all blocks accessed by the program
  • Solution: increase cache size
° Coherence (Invalidation): other process (e.g., I/O) updates memory

The Big Picture: Where are We Now?
° The Five Classic Components of a Computer
[Figure: Processor (Control, Datapath), Memory, Input, Output]
° Today’s Topics:
  • Recap last lecture
  • Virtual Memory
  • Protection
  • TLB
  • Buses

How Do you Design a Memory System?
° Set of Operations that must be supported
  • read: data <= Mem[Physical Address]
  • write: Mem[Physical Address] <= Data
° Determine the internal register transfers
° Design the Datapath
° Design the Cache Controller
[Figure: the memory seen as a “Black Box” with Physical Address, Read/Write, and Data ports. Inside it has: Tag-Data Storage, Muxes, Comparators, . . . The Cache Controller drives the Cache DataPath (Address, Data In, Data Out, R/W, Active) through Control Points and receives Signals; a wait signal goes back to the processor.]

Impact on Cycle Time
[Figure: pipeline datapath with PC, I-Cache, IR, register operands A and B, D Cache, and pipeline registers IRex, IRm, IRwb; miss and invalid signals come from the caches.]
Cache Hit Time:
  • directly tied to clock rate
  • increases with cache size
  • increases with associativity
Average Memory Access Time = Hit Time + Miss Rate x Miss Penalty
Time = IC x CT x (ideal CPI + memory stalls)

Improving Cache Performance: 3 general options
Average Memory Access Time = Hit Time + (Miss Rate x Miss Penalty)
                           = (Hit Rate x Hit Time) + (Miss Rate x Miss Time)
Options to reduce AMAT:
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.

Improving Cache Performance
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.

3Cs Absolute Miss Rate (SPEC92)
[Figure: Miss Rate per Type (0 to 0.14) vs. Cache Size (1 to 128 KB). Stacked components: Conflict misses for 1-way, 2-way, 4-way, and 8-way caches, plus Capacity and Compulsory misses.]

2:1 Cache Rule
[Figure: same plot as 3Cs Absolute Miss Rate.]
miss rate of a 1-way associative cache of size X = miss rate of a 2-way associative cache of size X/2

3Cs Relative Miss Rate
[Figure: Miss Rate per Type as a share of all misses (0% to 100%) vs. Cache Size (1 to 128 KB). Components: Conflict for 1-way, 2-way, 4-way, and 8-way, plus Capacity and Compulsory.]

1. Reduce Misses via Larger Block Size
[Figure: Miss Rate (0% to 25%) vs. Block Size (16 to 256 bytes), one curve per cache size: 1K, 4K, 16K, 64K, 256K.]

2. Reduce Misses via Higher Associativity
° 2:1 Cache Rule:
  • Miss Rate of a DM cache of size N ≈ Miss Rate of a 2-way cache of size N/2
° Beware: Execution time is the only final measure!
  • Will Clock Cycle time increase?
  • Hill [1988] suggested hit time for 2-way vs. 1-way: external cache +10%, internal +2%

Example: Avg. Memory Access Time vs. Miss Rate
° Assume clock cycle time (CCT) = 1.10 for 2-way, 1.12 for 4-way, 1.14 for 8-way, relative to the CCT of a direct mapped cache

Cache Size          Associativity
  (KB)       1-way   2-way   4-way   8-way
     1       2.33    2.15    2.07    2.01
     2       1.98    1.86    1.76    1.68
     4       1.72    1.67    1.61    1.53
     8       1.46    1.48    1.47    1.43
    16       1.29    1.32    1.32    1.32
    32       1.20    1.24    1.25    1.27
    64       1.14    1.20    1.21    1.23
   128       1.10    1.17    1.18    1.20

(On the original slide, red marks entries where A.M.A.T. is not improved by more associativity.)
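
The trade-off in the A.M.A.T. example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relative clock cycle times (1.10, 1.12, 1.14) come from the slide, but the miss rates, the 50-cycle miss penalty, and the 1-cycle base hit time are hypothetical values chosen for the sketch, not the data behind the slide's table. The names `amat`, `amat_by_associativity`, and `best_associativity` are likewise made up here.

```python
from typing import Dict

# Relative clock cycle time per associativity, as assumed on the slide
# (direct mapped = 1.00).
CCT_FACTOR: Dict[int, float] = {1: 1.00, 2: 1.10, 4: 1.12, 8: 1.14}

# HYPOTHETICAL miss rates (not from the lecture): a small cache where extra
# associativity removes many conflict misses, and a large cache where it
# removes few.
SMALL_CACHE_MISS_RATES: Dict[int, float] = {1: 0.020, 2: 0.017, 4: 0.016, 8: 0.0155}
LARGE_CACHE_MISS_RATES: Dict[int, float] = {1: 0.004, 2: 0.0035, 4: 0.0034, 8: 0.0033}


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average Memory Access Time = Hit Time + Miss Rate x Miss Penalty."""
    return hit_time + miss_rate * miss_penalty


def amat_by_associativity(
    miss_rates: Dict[int, float],
    miss_penalty: float = 50.0,  # ASSUMED, in direct-mapped clock cycles
    base_hit_time: float = 1.0,  # ASSUMED, in direct-mapped clock cycles
) -> Dict[int, float]:
    """AMAT per associativity; the hit time is stretched by the slower clock."""
    return {
        ways: amat(base_hit_time * CCT_FACTOR[ways], rate, miss_penalty)
        for ways, rate in miss_rates.items()
    }


def best_associativity(miss_rates: Dict[int, float], miss_penalty: float = 50.0) -> int:
    """Return the associativity with the lowest AMAT."""
    table = amat_by_associativity(miss_rates, miss_penalty)
    return min(table, key=table.get)


if __name__ == "__main__":
    for label, rates in (("small", SMALL_CACHE_MISS_RATES),
                         ("large", LARGE_CACHE_MISS_RATES)):
        table = amat_by_associativity(rates)
        row = "  ".join(f"{ways}-way {value:.3f}" for ways, value in table.items())
        print(f"{label}: {row}  (best: {best_associativity(rates)}-way)")
```

With these made-up inputs the sketch shows the same qualitative pattern as the table: when the miss rate is already low, the longer clock cycle of an associative cache can outweigh the misses it removes, so the direct mapped cache ends up with the lowest A.M.A.T.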