CS152 Computer Architecture and Engineering Lecture 20 Caches

Recap: The Big Picture: Where are We Now?
The Art of Memory System Design
Recap: Cache Performance
Example: 1 KB Direct Mapped Cache with 32 B Blocks
Set Associative Cache
Disadvantage of Set Associative Cache
Example: Fully Associative
A Summary on Sources of Cache Misses
Design options at constant cost
Four Questions for Caches and Memory Hierarchy
Q1: Where can a block be placed in the upper level?
Q2: How is a block found if it is in the upper level?
Q3: Which block should be replaced on a miss?
Q4: What happens on a write?
New Question: How does a store to the cache work?
Write Buffer for Write Through
Write-miss Policy: Write Allocate versus Not Allocate
Administrative Issues
Administrivia: Edge Detection For Lab 5
How Do you Design a Memory System?
Review: Stall Methodology in Memory Stage
Impact on Cycle Time
Improving Cache Performance: 3 general options
Improving Cache Performance
3Cs Absolute Miss Rate (SPEC92)
2:1 Cache Rule
3Cs Relative Miss Rate
1. Reduce Misses via Larger Block Size
2. Reduce Misses via Higher Associativity
Example: Avg. Memory Access Time vs. Miss Rate
3. Reducing Misses via a “Victim Cache”
4. Reducing Misses by Hardware Prefetching
5. Reducing Misses by Software Prefetching Data
6. Reducing Misses by Compiler Optimizations
Improving Cache Performance (Continued)
0. Reducing Penalty: Faster DRAM / Interface
1. Reducing Penalty: Read Priority over Write on Miss
Write Buffer Saturation
RAW Hazards from Write Buffer!
2. Reduce Penalty: Early Restart and Critical Word First
3. Reduce Penalty: Non-blocking Caches
Reprise: What happens on a Cache miss?
Value of Hit Under Miss for SPEC
4. Reduce Penalty: Second-Level Cache
Reducing Misses: which apply to L2 Cache?
L2 cache block size & A.M.A.T.
Slide 48
Example: Harvard Architecture
Summary #1 / 2:
Summary #2 / 2: The Cache Design Space

CS152
Computer Architecture and Engineering
Lecture 20
Caches
April 14, 2003
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
lecture slides: http://inst.eecs.berkeley.edu/~cs152/
4/14/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec20.2

Recap: The Big Picture: Where are We Now?
° The Five Classic Components of a Computer
[Figure: Processor (Control, Datapath), Memory, Input, Output.]
° Today’s Topics:
• Recap last lecture
• Simple caching techniques
• Many ways to improve cache performance
• Virtual memory?

The Art of Memory System Design
[Figure: Processor, cache ($), and Memory (MEM) in series, driven by a workload or benchmark program.]
Memory reference stream: <op,addr>, <op,addr>, <op,addr>, <op,addr>, . . .
op: i-fetch, read, write
Optimize the memory system organization to minimize the average memory access time for typical workloads.

Recap: Cache Performance
Execution_Time = Instruction_Count x Cycle_Time x (ideal CPI + Memory_Stalls/Inst + Other_Stalls/Inst)
Memory_Stalls/Inst = Instruction Miss Rate x Instruction Miss Penalty
                   + Loads/Inst x Load Miss Rate x Load Miss Penalty
                   + Stores/Inst x Store Miss Rate x Store Miss Penalty
Average Memory Access Time (AMAT) = Hit Time_L1 + (Miss Rate_L1 x Miss Penalty_L1)
                                  = (Hit Rate_L1 x Hit Time_L1) + (Miss Rate_L1 x Miss Time_L1)
The two AMAT forms agree when Miss Time_L1 = Hit Time_L1 + Miss Penalty_L1 and Hit Rate_L1 = 1 - Miss Rate_L1.

Example: 1 KB Direct Mapped Cache with 32 B Blocks
° For a 2 ** N byte cache:
• The uppermost (32 - N) bits are always the Cache Tag
• The lowest M bits are the Byte Select (Block Size = 2 ** M)
• On a cache miss, pull in the complete “Cache Block” (or “Cache Line”)
[Figure: 32-bit address split into Cache Tag (example: 0x50), Cache Index (ex: 0x01), and Byte Select (ex: 0x00), with bit positions 31, 9, 4, and 0 marked; Cache Tag plus Cache Index form the block address. The cache array has entries 0 through 31, each holding a Valid Bit, a Cache Tag stored as part of the cache “state”, and 32 bytes of Cache Data (Byte 0 through Byte 1023 overall).]
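The address split in the direct-mapped example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `split_address` and `DirectMappedCache` are hypothetical, addresses are 32 bits as on the slide, and the model tracks only valid bits and tags (no data). The sample address 0x14020 is the one obtained by packing the slide's example fields (tag 0x50, index 0x01, byte select 0x00).

```python
CACHE_BYTES = 1024    # 2 ** N bytes, N = 10 (1 KB cache from the slide)
BLOCK_BYTES = 32      # 2 ** M bytes, M = 5 (32 B blocks from the slide)
ADDRESS_BITS = 32

OFFSET_BITS = BLOCK_BYTES.bit_length() - 1                   # M = 5
INDEX_BITS = (CACHE_BYTES // BLOCK_BYTES).bit_length() - 1   # N - M = 5
TAG_BITS = ADDRESS_BITS - OFFSET_BITS - INDEX_BITS           # 32 - N = 22


def split_address(addr):
    """Split a 32-bit address into (tag, index, byte_select)."""
    byte_select = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, byte_select


class DirectMappedCache:
    """Hypothetical tag-only model: one valid bit and one tag per entry."""

    def __init__(self):
        entries = CACHE_BYTES // BLOCK_BYTES
        self.valid = [False] * entries
        self.tags = [0] * entries

    def access(self, addr):
        """Return True on a hit; on a miss, pull in the whole block."""
        tag, index, _ = split_address(addr)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True
            self.tags[index] = tag
        return hit


if __name__ == "__main__":
    print(split_address(0x14020))   # (80, 1, 0), i.e. tag 0x50, index 0x01
    cache = DirectMappedCache()
    print(cache.access(0x14020))    # False: first reference to the block
    print(cache.access(0x14021))    # True: same block, different byte
```

Two addresses that differ by a multiple of 1 KB share an index but not a tag, so in this model they evict each other, which is the conflict behavior the later slides discuss.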
Set Associative Cache
° N-way set associative: N entries for each Cache Index
• N direct mapped caches operate in parallel
° Example: Two-way set associative cache
• Cache Index selects a “set” from the cache
• The two tags in the set are compared to the input in parallel
• Data is selected based on the tag result
[Figure: two banks, each with Valid, Cache Tag, and Cache Data (Cache Block 0, ...) columns, both addressed by the Cache Index. Each bank's tag is compared with the address tag (Adr Tag); the two compare results are ORed to produce Hit and drive Sel0/Sel1 of a mux that selects the Cache Block.]

Disadvantage of Set Associative Cache
° N-way Set Associative Cache versus Direct Mapped Cache:
• N comparators vs. 1
• Extra MUX delay for the data
• Data comes AFTER Hit/Miss decision and set selection
° In a direct mapped cache, Cache Block is available BEFORE Hit/Miss:
• Possible to assume a hit and continue. Recover later if miss.
[Figure: same two-way set-associative datapath as on the previous slide.]

Example: Fully Associative
° Fully Associative Cache
• Forget about the Cache Index
• Compare the Cache Tags of all cache entries in parallel
• Example: with 32 B blocks, we need N 27-bit comparators
° By definition: Conflict Miss = 0 for a fully associative cache
[Figure: 32-bit address split into Cache Tag (27 bits long) and Byte Select (ex: 0x01), with bit positions 31, 4, and 0 marked; every entry holds a Valid Bit, a Cache Tag with its own comparator (=), and 32 bytes of Cache Data.]

A Summary on Sources of Cache Misses
° Compulsory (cold start or process migration, first reference): first access to a block
• “Cold” fact of life: not a whole lot you can do about it
• Note: If you are going to run “billions” of instructions, Compulsory Misses are insignificant
° Capacity:
• Cache cannot contain all blocks accessed by the program
• Solution: increase cache size
° Conflict (collision):
• Multiple memory locations mapped to the same cache location
• Solution 1: increase cache size
• Solution 2: increase associativity
° Coherence (Invalidation): other process (e.g., I/O) updates memory

Design options at constant cost
                   Direct Mapped    N-way Set Associative    Fully Associative
Cache Size         Big              Medium                   Small
Compulsory Miss    Same             Same                     Same
Conflict Miss      High             Medium                   Zero
Capacity Miss      Low              Medium                   High
Coherence Miss     Same             Same                     Same
Note: If you are going to run “billions” of instructions, Compulsory Misses are insignificant (except for streaming media types of programs).

Four Questions for Caches and Memory Hierarchy
° Q1: Where can a block be placed in the upper level? (Block placement)
° Q2: How is a block found if it is in the upper level? (Block identification)
° Q3:
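Returning to Q1 and Q2 and to the three organizations compared in the slides (direct mapped, N-way set associative, fully associative), the following Python sketch shows one way to express placement and identification with a single parameter, `ways`. The function names, the frame numbering (set s occupies frames s * ways through s * ways + ways - 1), and the 8-frame cache in the usage lines are illustrative assumptions, not taken from the lecture.

```python
def candidate_frames(block_address, num_blocks, ways):
    """Q1 (placement): frames in which a memory block may be stored.

    ways == 1           -> direct mapped (exactly one candidate)
    1 < ways < num_blocks -> N-way set associative (N candidates)
    ways == num_blocks  -> fully associative (every frame is a candidate)
    """
    if ways < 1 or num_blocks % ways != 0:
        raise ValueError("ways must evenly divide num_blocks")
    num_sets = num_blocks // ways
    set_index = block_address % num_sets     # the Cache Index selects a set
    first = set_index * ways
    return list(range(first, first + ways))


def find_block(tags, valid, block_address, ways):
    """Q2 (identification): compare the tag against every frame in the set.

    Hardware performs these comparisons in parallel; the loop is only a
    software stand-in. Returns the matching frame number, or None on a miss.
    """
    num_blocks = len(tags)
    num_sets = num_blocks // ways
    tag = block_address // num_sets          # address bits above the index
    for frame in candidate_frames(block_address, num_blocks, ways):
        if valid[frame] and tags[frame] == tag:
            return frame
    return None


if __name__ == "__main__":
    # Hypothetical 8-frame cache, memory block number 12.
    print(candidate_frames(12, 8, 1))   # [4]
    print(candidate_frames(12, 8, 2))   # [0, 1]
    print(candidate_frames(12, 8, 8))   # [0, 1, 2, 3, 4, 5, 6, 7]
```

The comparator counts on the slides fall out of the length of the candidate list: one comparison for direct mapped, N for N-way, and one per entry for fully associative.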