CS61C : Machine Structures
Lecture 35: Caches IV / VM I
2004-11-19
CS 61C L35 Caches IV / VM I, Garcia, Fall 2004 © UCB
Andy Carle
inst.eecs.berkeley.edu/~cs61c-ta
inst.eecs.berkeley.edu/~cs61c
News: Google strikes back against recent encroachments into the Search world with the launch of two new services: Keyhole & Scholar.

Outline
• Review: Why We Use Caches
• Review: Direct-Mapped Cache Example
• Review: Associative Cache Example
• Block Replacement Policy (1/2)
• Block Replacement Policy (2/2)
• Block Replacement Policy: LRU
• Block Replacement Example
• Block Replacement Example: LRU
• Big Idea
• Example
• Administrivia
• Ways to reduce miss rate
• Improving Miss Penalty
• Analyzing Multi-level cache hierarchy
• Typical Scale
• Example: with L2 cache
• Example: without L2 cache
• An Actual CPU – Pentium M
• What to do on a write hit?
• Generalized Caching
• Another View of the Memory Hierarchy
• Memory Hierarchy Requirements
• Virtual Memory
• Peer Instruction
• And in Conclusion…

Review: Why We Use Caches
[Figure: CPU vs. DRAM performance, 1980-2000. Processors improve ~60%/yr ("Moore's Law"), DRAM ~7%/yr; the processor-memory performance gap grows ~50%/yr.]
• 1989: first Intel CPU with cache on chip
• 1998: Pentium III has two levels of cache on chip

Review: Direct-Mapped Cache Example
• Recall how a simple direct-mapped cache looks.
[Figure: 16 memory addresses (0-F) mapping into a 4-byte direct-mapped cache with cache indices 0-3.]

Review: Associative Cache Example
• Here's a simple 2-way set-associative cache.
[Figure: the same 16 memory addresses mapping into a cache with two sets (indices 0 and 1), two blocks per set.]

Block Replacement Policy (1/2)
• Direct-Mapped Cache: the index completely specifies which position a block goes in on a miss
• N-Way Set Assoc: the index specifies a set, but the block can occupy any position within the set on a miss
• Fully Associative:
the block can be written into any position
• Question: if we have the choice, where should we write an incoming block?

Block Replacement Policy (2/2)
• If there are any locations with the valid bit off (empty), then usually write the new block into the first one.
• If all possible locations already have a valid block, we must pick a replacement policy: the rule by which we determine which block gets "cached out" on a miss.

Block Replacement Policy: LRU
• LRU (Least Recently Used)
• Idea: cache out the block which has been accessed (read or write) least recently
• Pro: temporal locality; recent past use implies likely future use: in fact, this is a very effective policy
• Con: with 2-way set assoc, it is easy to keep track (one LRU bit); with 4-way or greater, it requires complicated hardware and much time to keep track of this

Block Replacement Example
• We have a 2-way set-associative cache with a four-word total capacity and one-word blocks.
We perform the following word accesses (ignore bytes for this problem):
0, 2, 0, 1, 4, 0, 2, 3, 5, 4
How many hits and how many misses will there be for the LRU block replacement policy?

Block Replacement Example: LRU
• Addresses 0, 2, 0, 1, 4, 0, ...
• 0: miss, bring into set 0 (loc 0)
• 2: miss, bring into set 0 (loc 1)
• 0: hit
• 1: miss, bring into set 1 (loc 0)
• 4: miss, bring into set 0 (loc 1, replacing 2)
• 0: hit
[Figure: cache-state diagrams showing the contents of set 0 and set 1 after each access, with LRU markers on the least recently used block in each set.]

Big Idea
• How do we choose between associativity, block size, and replacement policy?
• Design against a performance model
• Minimize: Average Memory Access Time = Hit Time + Miss Penalty x Miss Rate
  • influenced by technology & program behavior
  • Note: Hit Time encompasses Hit Rate!!!
• Create the illusion of a memory that is large, cheap, and fast - on average

Example
• Assume
  • Hit Time = 1 cycle
  • Miss rate = 5%
  • Miss penalty = 20 cycles
• Calculate AMAT…
• Avg mem access time = 1 + 0.05 x 20 = 1 + 1 cycles = 2 cycles

Administrivia
• Do your reading!
VM is hard!
• Project 3 Due Next Wednesday
• No Labs Next Week

Ways to reduce miss rate
• Larger cache
  • limited by cost and technology
  • hit time of first-level cache < cycle time
• More places in the cache to put each block of memory: associativity
  • fully associative: any block can go in any line
  • k-way set associative: k places for each block
  • direct-mapped: k = 1

Improving Miss Penalty
• When caches first became popular, Miss Penalty ~ 10 processor clock cycles
• Today: a 2400 MHz processor (0.4 ns per clock cycle) and 80 ns to go to DRAM means 200 processor clock cycles!
• Solution: another cache between memory and the processor cache: the Second Level (L2) Cache
[Figure: Proc - L1 $ - L2 $ - DRAM memory]

Analyzing Multi-level cache hierarchy
• Avg Mem Access Time = L1 Hit Time + L1 Miss Rate x L1 Miss Penalty
• L1 Miss Penalty = L2 Hit Time + L2 Miss Rate x L2 Miss Penalty
• Avg Mem Access Time = L1 Hit Time + L1 Miss Rate x (L2 Hit Time + L2 Miss Rate x L2 Miss Penalty)

Typical Scale
• L1
  • size: tens of KB
  • hit time: complete in one clock cycle
  • miss rates: 1-5%
• L2
  • size: hundreds of KB
  • hit time: a few clock cycles
  • miss rates: 10-20%
• L2 miss rate is the fraction of L1 misses that also miss in L2
  • why so high?

Example: with L2 cache
• Assume
  • L1 Hit Time = 1 cycle
  • L1 Miss rate = 5%
  • L2 Hit Time = 5 cycles
  • L2 Miss rate = 15% (% of L1 misses that miss)
  • L2 Miss Penalty = 200 cycles
• L1 miss penalty = 5 + 0.15 x 200 = 35 cycles
• Avg mem access time = 1 + 0.05 x 35 = 2.75 cycles

Example: without L2 cache
• Assume
  • L1 Hit Time = 1 cycle
  • L1 Miss rate = 5%
  • L1 Miss Penalty = 200 cycles
• Avg mem access time = 1 + 0.05 x 200 = 11 cycles
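As a sanity check, the AMAT arithmetic in the two examples (with and without an L2 cache) can be reproduced with a short script. This is an illustrative sketch, not code from the lecture; the numbers are the slides' assumed values.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty (cycles)."""
    return hit_time + miss_rate * miss_penalty

# With L2: the L1 miss penalty is itself an "AMAT" into the L2/DRAM levels.
l1_miss_penalty = amat(hit_time=5, miss_rate=0.15, miss_penalty=200)   # 35 cycles
with_l2 = amat(hit_time=1, miss_rate=0.05, miss_penalty=l1_miss_penalty)

# Without L2: every L1 miss pays the full trip to DRAM.
without_l2 = amat(hit_time=1, miss_rate=0.05, miss_penalty=200)

print(with_l2, without_l2)  # 2.75 11.0
```

The same `amat` helper also reproduces the earlier single-level example: `amat(1, 0.05, 20)` gives 2 cycles.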
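The block-replacement example above (2-way set associative, four one-word blocks, LRU) can be checked with a small simulation. This is a sketch written for this writeup, not code from the lecture; with one-word blocks and two sets, the set index is simply the address mod 2.

```python
def simulate_lru(addresses, num_sets=2, ways=2):
    """Count (hits, misses) for a set-associative cache with LRU replacement.
    Each set is kept as a list ordered from least to most recently used."""
    sets = [[] for _ in range(num_sets)]
    hits = misses = 0
    for addr in addresses:
        s = sets[addr % num_sets]    # one-word blocks: set index = addr mod num_sets
        if addr in s:
            hits += 1
            s.remove(addr)           # will re-append as most recently used
        else:
            misses += 1
            if len(s) == ways:       # set full: evict the least recently used block
                s.pop(0)
        s.append(addr)
    return hits, misses

print(simulate_lru([0, 2, 0, 1, 4, 0, 2, 3, 5, 4]))  # (2, 8)
```

Tracing the full access sequence this way gives 2 hits and 8 misses, consistent with the partial trace on the "Block Replacement Example: LRU" slide (hits only on the two repeated accesses to word 0 before it is evicted).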