CS 61C L33 Caches III (1) Wawrzynek Spring 2006 © UCB4/14/2006John Wawrzynek(www.cs.berkeley.edu/~johnw)www-inst.eecs.berkeley.edu/~cs61c/CS61C – Machine StructuresLecture 33 - Caches IIICS 61C L33 Caches III (2) Wawrzynek Spring 2006 © UCBReview: Why We Use CachesµProc60%/yr.DRAM7%/yr.110100100019801981198319841985198619871988198919901991199219931994199519961997199819992000DRAMCPU1982Processor-MemoryPerformance Gap:(grows 50% / year)Performance“Moore’s Law”° 1989 first Intel CPU with cache on chip° 1998 Pentium III has two levels of cache on chipCS 61C L33 Caches III (3) Wawrzynek Spring 2006 © UCBFully Associative Cache (2/3)°Fully Associative Cache (e.g., 32 B block)• compare tags in parallelByte Offset: Cache DataB 00431:Cache Tag (27 bits long)Valid:B 1B 31: Cache Tag=====:CS 61C L33 Caches III (4) Wawrzynek Spring 2006 © UCBFully Associative Cache (3/3)°Benefit of Fully Assoc Cache• No Conflict Misses (since data can goanywhere)°Drawbacks of Fully Assoc Cache• Need hardware comparator for everysingle entry: if we have a 64KB of data incache with 4B entries, we need 16Kcomparators: very expensive!- Alternatively, use fewer comparisons, butcompare sequentially - too slow!CS 61C L33 Caches III (5) Wawrzynek Spring 2006 © UCBThird Type of Cache Miss°Capacity Misses• miss that occurs because the cache hasa limited size• miss that would not occur if we increasethe size of the cache• sketchy definition, so just get the generalidea°This is the primary type of miss forFully Associative caches.CS 61C L33 Caches III (6) Wawrzynek Spring 2006 © UCBN-Way Set Associative Cache (1/4)°Memory address fields:• Tag: same as before• Offset: same as before• Index: points us to the correct “row”(called a set in this case)°So what’s the difference?• each set contains multiple blocks• once we’ve found correct set, mustcompare with all tags in that set to findour dataCS 61C L33 Caches III (7) Wawrzynek Spring 2006 © UCBSet Associative Cache ExampleHere’s a simple 2 way setassociative cache.MemoryMemory Address0123456789ABCDEFCache Index0011CS 61C L33 Caches III (8) Wawrzynek Spring 2006 © UCBN-Way Set Associative Cache (2/4)°Summary:• cache is direct-mapped w/respect to sets• each set is fully associative• basically N direct-mapped cachesworking in parallel: each has its ownvalid bit and dataCS 61C L33 Caches III (9) Wawrzynek Spring 2006 © UCBN-Way Set Associative Cache (3/4)°Given memory address:• Find correct set using Index value.• Compare Tag with all Tag values in thedetermined set.• If a match occurs, hit!, otherwise a miss.• Finally, use the offset field as usual tofind the desired data within the block.CS 61C L33 Caches III (10) Wawrzynek Spring 2006 © UCBN-Way Set Associative Cache (4/4)°What’s so great about this?• even a 2-way set assoc cache avoids alot of conflict misses• hardware cost isn’t that bad: only need Ncomparators°In fact, for a cache with M blocks,• it’s Direct-Mapped if it’s 1-way set assoc• it’s Fully Assoc if it’s M-way set assoc• so these two are just special cases of themore general set associative designCS 61C L33 Caches III (11) Wawrzynek Spring 2006 © UCB4-Way Set Associative Cache CircuittagindexCS 61C L33 Caches III (12) Wawrzynek Spring 2006 © UCBBlock Replacement Policy (2/2)°If there are any locations with valid bitoff (empty), then usually write the newblock into the first one.°If all possible locations already have avalid block, we must pick areplacement policy: rule by which wedetermine which block gets “cachedout” on a miss.CS 61C L33 Caches III (13) Wawrzynek Spring 2006 © UCBBlock Replacement Policy: LRU°LRU (Least Recently Used)• Idea: cache out block which has beenaccessed (read or write) least recently• Pro: temporal locality ⇒ recent past useimplies likely future use: in fact, this is avery effective policy• Con: with 2-way set assoc, easy to keeptrack (one LRU bit); with 4-way orgreater, requires complicated hardwareand much time to keep track of thisCS 61C L33 Caches III (14) Wawrzynek Spring 2006 © UCBBlock Replacement Example°We have a 2-way set associative cachewith a four word total capacity and oneword blocks. We perform thefollowing word accesses (ignore bytesfor this problem):0, 2, 0, 1, 4, 0, 2, 3, 5, 4How many hits and how many misseswill there be for the LRU blockreplacement policy?CS 61C L33 Caches III (15) Wawrzynek Spring 2006 © UCBBlock Replacement Example: LRU°Addresses 0, 2, 0, 1, 4, 0, ...0lru21lruloc 0 loc 1set 0set 10 2lruset 0set 1 0: miss, bring into set 0 (loc 0) 2: miss, bring into set 0 (loc 1) 0: hit 1: miss, bring into set 1 (loc 0) 4: miss, bring into set 0 (loc 1, replace 2) 0: hit0set 0set 1lrulru0 2set 0set 1lrulruset 0set 101lrulru24lruset 0set 10 41lrulrulruCS 61C L33 Caches III (16) Wawrzynek Spring 2006 © UCBBig Idea°How to choose between associativity,block size, replacement policy?°Design against a performance model• Minimize: Average Memory Access Time = Hit Time + Miss Penalty x Miss Rate• influenced by technology & programbehavior°Create the illusion of a memory that islarge, cheap, and fast - on averageCS 61C L33 Caches III (17) Wawrzynek Spring 2006 © UCBExample°Assume• Hit Time = 1 cycle• Miss rate = 5%• Miss penalty = 20 cycles• Calculate AMAT…°Avg mem access time= 1 + 0.05 x 20= 1 + 1 cycles= 2 cyclesCS 61C L33 Caches III (18) Wawrzynek Spring 2006 © UCBAdministrivia°Do your reading! VM is coming up,and it’s shown to be hard for students!°Project 5 out°Exam• Wed 4/19, 1 Pimentel 7-9pm• Covers weeks 6-12 (focus on lecturematerial)• TA Review Monday eveningCS 61C L33 Caches III (19) Wawrzynek Spring 2006 © UCBWays to reduce miss rate°Larger cache• limited by cost and technology• hit time of first level cache < cycle time°More places in the cache to put eachblock of memory – associativity• fully-associative- any block any line• N-way set associated- N places for each block- direct map: N=1CS 61C L33 Caches III (20) Wawrzynek Spring 2006 © UCBImproving Miss Penalty°When caches first became popular, MissPenalty ~ 10 processor clock cycles°Today 2400 MHz Processor (0.4 ns perclock cycle) and 80 ns to go to DRAM⇒ 200 processor clock cycles!Proc$2DRAM$MEMSolution: another cache between memory andthe processor cache: Second Level (L2) CacheCS 61C L33 Caches III (21) Wawrzynek Spring 2006 © UCBAnalyzing
View Full Document