Berkeley COMPSCI 152 - Lecture 22 Advanced Caching

CS152: Computer Architecture and Engineering
Lecture 22: Advanced Caching
April 23, 2003 — ©UCB Spring 2003
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
Lecture slides: http://inst.eecs.berkeley.edu/~cs152/

Recap: Set Associative Cache
° N-way set associative: N entries for each cache index
• N direct-mapped caches operate in parallel
° Example: two-way set associative cache
• The cache index selects a "set" from the cache
• The two tags in the set are compared to the address tag in parallel
• Data is selected based on the tag comparison result
[Figure: two ways, each with valid/tag/data arrays; the address tag is compared against both ways' tags, the compare results are ORed to produce Hit and drive a mux (Sel1/Sel0) that selects the cache block]

Recap: Cache Performance
Execution_Time = Instruction_Count x Cycle_Time
                 x (ideal CPI + Memory_Stalls/Inst + Other_Stalls/Inst)
Memory_Stalls/Inst = Instruction Miss Rate x Instruction Miss Penalty
                     + Loads/Inst x Load Miss Rate x Load Miss Penalty
                     + Stores/Inst x Store Miss Rate x Store Miss Penalty
Average Memory Access Time (AMAT) = Hit Time_L1 + (Miss Rate_L1 x Miss Penalty_L1)
                                  = (Hit Rate_L1 x Hit Time_L1) + (Miss Rate_L1 x Miss Time_L1)

Recap: A Summary of Sources of Cache Misses
° Compulsory (cold start, process migration, first reference): the first access to a block
• A "cold" fact of life: not a whole lot you can do about it
• Note: if you are going to run billions of instructions, compulsory misses are insignificant
° Conflict (collision): multiple memory locations mapped to the same cache location
• Solution 1: increase cache size
• Solution 2: increase associativity
° Capacity: the cache cannot contain all blocks accessed by the program
• Solution: increase cache size
° Coherence (invalidation): another process (e.g., I/O) updates memory

The Big Picture: Where Are We Now?
° The Five Classic Components of a Computer: Processor (Control + Datapath), Memory, Input, Output
° Today's Topics:
• Recap of last lecture
• Virtual Memory
• Protection
• TLB
• Buses

How Do You Design a Memory System?
° Set of operations that must be supported
• read: Data <= Mem[Physical Address]
• write: Mem[Physical Address] <= Data
° Determine the internal register transfers
° Design the Datapath
° Design the Cache Controller
[Figure: the memory is a "black box" taking Physical Address, Read/Write, and Data; inside, a Cache Controller drives control points/signals on a Cache DataPath (tag/data storage, muxes, comparators), with Address, Data In, Data Out, R/W, Active, and wait signals]

Impact on Cycle Time
[Figure: pipeline with PC, I-Cache, IR, register stages (A, B), execute, D-Cache, and write-back; a cache miss or invalid access stalls the pipeline]
° Cache hit time:
• directly tied to clock rate
• increases with cache size
• increases with associativity
Average Memory Access Time = Hit Time + Miss Rate x Miss Penalty
Time = IC x CT x (ideal CPI + memory stalls)

Improving Cache Performance: 3 General Options
Options to reduce AMAT:
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.
Average Memory Access Time = Hit Time + (Miss Rate x Miss Penalty)
                           = (Hit Rate x Hit Time) + (Miss Rate x Miss Time)
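The AMAT and memory-stall formulas above can be turned directly into a small calculation. A minimal Python sketch; every numeric input below (miss rates, penalties, instruction mix) is a hypothetical placeholder, not a figure from the lecture:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time in cycles: hit time plus the
    miss rate weighted by the miss penalty."""
    return hit_time + miss_rate * miss_penalty

def memory_stalls_per_inst(imiss_rate, imiss_penalty,
                           loads_per_inst, load_miss_rate, load_penalty,
                           stores_per_inst, store_miss_rate, store_penalty):
    """Memory stall cycles per instruction, summing the instruction-fetch,
    load, and store miss contributions from the slide's formula."""
    return (imiss_rate * imiss_penalty
            + loads_per_inst * load_miss_rate * load_penalty
            + stores_per_inst * store_miss_rate * store_penalty)

# Hypothetical numbers: 1-cycle hit, 5% miss rate, 40-cycle miss penalty.
print(amat(1, 0.05, 40))  # 1 + 0.05 * 40 = 3.0 cycles
# Hypothetical mix: 2% I-miss rate, 30% loads at 5% misses,
# 10% stores at 8% misses, all with a 40-cycle penalty.
print(memory_stalls_per_inst(0.02, 40, 0.30, 0.05, 40, 0.10, 0.08, 40))
# 0.8 + 0.6 + 0.32 ≈ 1.72 stall cycles per instruction
```

Note that AMAT only summarizes the memory system; execution time also depends on how many memory accesses each instruction makes, which is what the stalls-per-instruction form captures.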
3Cs Absolute Miss Rate (SPEC92)
[Figure: absolute miss rate (0 to 0.14) vs. cache size (1 KB to 128 KB) for 1-way, 2-way, 4-way, and 8-way caches, decomposed into Capacity, Compulsory, and Conflict components]

2:1 Cache Rule
miss rate of a 1-way associative cache of size X
  = miss rate of a 2-way associative cache of size X/2
[Figure: the same miss-rate-vs-cache-size plot, annotated to illustrate the rule]

3Cs Relative Miss Rate
[Figure: the same data normalized so each cache size totals 100%, showing the relative shares of Capacity, Compulsory, and Conflict misses]

1. Reduce Misses via Larger Block Size
[Figure: miss rate (0% to 25%) vs. block size (16 to 256 bytes) for cache sizes of 1 KB, 4 KB, 16 KB, 64 KB, and 256 KB]

2. Reduce Misses via Higher Associativity
° 2:1 Cache Rule:
• Miss rate of a direct-mapped cache of size N ≈ miss rate of a 2-way cache of size N/2
° Beware: execution time is the only final measure!
• Will clock cycle time increase?
• Hill [1988] suggested hit time for 2-way vs. 1-way: external cache +10%, internal +2%

Example: Avg. Memory Access Time vs. Miss Rate
° Assume CCT = 1.10 for 2-way, 1.12 for 4-way, 1.14 for 8-way vs. the CCT of direct mapped

Cache Size (KB)   1-way   2-way   4-way   8-way
      1           2.33    2.15    2.07    2.01
      2           1.98    1.86    1.76    1.68
      4           1.72    1.67    1.61    1.53
      8           1.46    1.48    1.47    1.43
     16           1.29    1.32    1.32    1.32
     32           1.20    1.24    1.25    1.27
     64           1.14    1.20    1.21    1.23
    128           1.10    1.17    1.18    1.20

(Red in the original slide marks entries where A.M.A.T. is not improved by more associativity)

3. Reducing Misses via a "Victim Cache"
° How can we combine the fast hit time of a direct-mapped cache yet still avoid conflict misses?
[Figure: a direct-mapped cache's tag and data arrays, plus a small fully associative victim cache of four lines (each with its own tag and comparator) sitting between the cache and the next lower level of the hierarchy]
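The trade-off behind the table above can be reproduced with the AMAT formula: higher associativity lowers the miss rate but stretches the hit time by the clock-cycle-time (CCT) factor. A sketch using the slide's CCT factors; the miss rates and miss penalty below are hypothetical placeholders, since the SPEC92 miss rates underlying the slide's table are not shown in this preview:

```python
# AMAT is measured in units of the direct-mapped clock cycle.
CCT = {1: 1.00, 2: 1.10, 4: 1.12, 8: 1.14}  # cycle-time factors from the slide
MISS_PENALTY = 25                            # hypothetical, in 1-way cycles

def amat_vs_assoc(miss_rates):
    """miss_rates: {associativity: miss rate}. Hit time is one cycle,
    stretched by that associativity's CCT factor."""
    return {ways: CCT[ways] * 1.0 + miss_rates[ways] * MISS_PENALTY
            for ways in sorted(miss_rates)}

# Hypothetical miss rates for one small cache size:
rates = {1: 0.053, 2: 0.042, 4: 0.040, 8: 0.039}
for ways, t in amat_vs_assoc(rates).items():
    print(f"{ways}-way: AMAT = {t:.2f}")
```

With numbers like these, going from 1-way to 2-way wins, but the smaller miss-rate gains at 4- and 8-way can be eaten by the longer cycle, which is exactly the pattern the red entries in the slide's table mark.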
° Add a small buffer to hold data discarded from the cache
° Jouppi [1990]: a 4-entry victim cache removed 20% to 95% of conflict misses for a 4 KB direct-mapped data cache
° Used in Alpha and HP machines

4. Reducing Misses by Hardware Prefetching
° E.g., instruction prefetching
• The Alpha 21064 fetches 2 blocks on a miss
• The extra block is placed in a "stream buffer"
• On a miss, check the stream buffer
° Works with data blocks too:
• Jouppi [1990]: 1 data stream buffer caught 25% of misses from a 4 KB cache; 4 streams caught 43%
• Palacharla & Kessler [1994]: for scientific programs, 8 streams caught 50% to 70% of misses from two 64 KB, 4-way set associative caches
° Prefetching relies on having extra memory bandwidth that can be used without penalty
• It could reduce performance if done indiscriminately!

° Data prefetch
• Load data into a register (HP PA-RISC loads)
• Cache prefetch: load into the cache
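The victim-cache idea above can be illustrated with a toy model: a direct-mapped cache backed by a small fully associative buffer that catches lines just evicted by conflicts. The sizes, eviction policy details, and address trace below are illustrative assumptions, not taken from the slide or from Jouppi's design:

```python
from collections import OrderedDict

class DirectMappedWithVictim:
    """Toy model: a direct-mapped cache plus a small fully associative
    victim buffer holding recently evicted lines (FIFO replacement here;
    real designs vary). Sizes and the trace below are illustrative."""

    def __init__(self, num_sets, victim_entries=4):
        self.num_sets = num_sets
        self.lines = {}              # set index -> resident block address
        self.victim = OrderedDict()  # block address -> True, oldest first
        self.victim_entries = victim_entries

    def access(self, block_addr):
        """Return 'hit', 'victim_hit', or 'miss' for one block access."""
        idx = block_addr % self.num_sets
        if self.lines.get(idx) == block_addr:
            return "hit"
        if block_addr in self.victim:
            # Swap the victim line with the conflicting resident line.
            self.victim.pop(block_addr)
            evicted = self.lines.get(idx)
            if evicted is not None:
                self._insert_victim(evicted)
            self.lines[idx] = block_addr
            return "victim_hit"
        # True miss: fill from the next level; the old line moves
        # into the victim buffer instead of being discarded outright.
        evicted = self.lines.get(idx)
        if evicted is not None:
            self._insert_victim(evicted)
        self.lines[idx] = block_addr
        return "miss"

    def _insert_victim(self, block_addr):
        self.victim[block_addr] = True
        if len(self.victim) > self.victim_entries:
            self.victim.popitem(last=False)  # drop the oldest entry

# Blocks 0 and 4 conflict in a 4-set direct-mapped cache (both map to set 0),
# so a plain direct-mapped cache would miss on every access in this trace:
cache = DirectMappedWithVictim(num_sets=4)
print([cache.access(b) for b in [0, 4, 0, 4, 0]])
# → ['miss', 'miss', 'victim_hit', 'victim_hit', 'victim_hit']
```

After the two compulsory misses, every conflict between the two blocks is absorbed by the victim buffer, which is the effect behind Jouppi's 20% to 95% conflict-miss reduction.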

