Berkeley COMPSCI 252 - Caches I: 3 Cs and 7 ways to reduce misses




CS252 Graduate Computer Architecture
Lecture 15
Caches I: 3 Cs and 7 ways to reduce misses
October 25, 1999
Prof. John Kubiatowicz
CS252/Kubiatowicz, Lec 15.1, 10/25/00

Slide titles:
Review: Genetic Programming for Design
Review: Who Cares About the Memory Hierarchy?
Review: Memory Disambiguation
Review: STORE sets
Processor-Memory Performance Gap “Tax”
What is a cache?
Example: 1 KB Direct Mapped Cache
Set Associative Cache
Disadvantage of Set Associative Cache
Generations of Microprocessors
What happens on a Cache miss?
Review: Cache Performance
Impact on Performance
Review: Four Questions for Memory Hierarchy Designers
Review: Improving Cache Performance
Reducing Misses
3Cs Absolute Miss Rate (SPEC92)
2:1 Cache Rule
3Cs Relative Miss Rate
How Can We Reduce Misses?
1. Reduce Misses via Larger Block Size
2. Reduce Misses via Higher Associativity
Example: Avg. Memory Access Time vs. Miss Rate
3. Reducing Misses via a “Victim Cache”
4. Reducing Misses via “Pseudo-Associativity”
CS 252 Administrivia
5. Reducing Misses by Hardware Prefetching of Instructions & Data
6. Reducing Misses by Software Prefetching Data
7. Reducing Misses by Compiler Optimizations
Merging Arrays Example
Loop Interchange Example
Loop Fusion Example
Blocking Example
Slide 35
Reducing Conflict Misses by Blocking
Summary of Compiler Optimizations to Reduce Cache Misses (by hand)
Summary
Slide 39
1. Reducing Miss Penalty: Read Priority over Write on Miss
2. Reduce Miss Penalty: Subblock Placement
3. Reduce Miss Penalty: Early Restart and Critical Word First
4. Reduce Miss Penalty: Non-blocking Caches to reduce stalls on misses
Value of Hit Under Miss for SPEC
5th Miss Penalty
Comparing Local and Global Miss Rates
Reducing Misses: Which apply to L2 Cache?
L2 cache block size & A.M.A.T.
Reducing Miss Penalty Summary
What is the Impact of What You’ve Learned About Caches?
Cache Optimization Summary

Review: Genetic Programming for Design
• Genetic programming has two key aspects:
  – An encoding of the design space.
    » This is a symbolic representation of the result space (genome).
    » Much of the domain-specific knowledge and “art” is involved here.
  – A reproduction strategy
    » Includes a method for generating offspring from parents
      Mutation: changing random portions of an individual
      Crossover: merging aspects of two individuals
    » Includes a method for evaluating the effectiveness (“fitness”) of individual solutions.
• Generation of new branch predictors via genetic programming:
  – Everything derived from a “basic” predictor (table) + simple operators.
  – Expressions arranged in a tree
  – Mutation: random modification of node/replacement of subtree
  – Crossover: swapping the subtrees of two parents.

Review: Who Cares About the Memory Hierarchy?
• Processor Only Thus Far in Course:
  – CPU cost/performance, ISA, Pipelined Execution
CPU-DRAM Gap
• 1980: no cache in µproc; 1995 2-level cache on chip (1989 first Intel µproc with a cache on chip)
[Figure: performance (log scale, 1 to 1000) versus year, 1980 to 2000. CPU curve, “Moore’s Law”: µProc 60%/yr. DRAM curve, “Less’ Law?”: DRAM 7%/yr. Processor-Memory Performance Gap: grows 50% / year.]

Review: Memory Disambiguation
• Memory disambiguation buffer contains the set of active stores and loads in program order.
  – Loads and stores are entered at issue time
  – May not have addresses yet
• Optimistic dependence speculation: assume that loads and stores don’t depend on each other
• Need disambiguation buffer to catch errors. All checks occur at address resolution time:
  – When store address is ready, check for loads that are (1) later in time and (2) have same address.
    » These have been incorrectly speculated: flush and restart
  – When load address is ready, check for stores that are (1) earlier in time and (2) have same address
      if (match) then
          if (store value ready) then
              return value
          else
              return pointer to reservation station
      else
          optimistically start load access

Review: STORE sets
• Naïve speculation can cause problems for certain load-store pairs.
• “Counter-Speculation”: For each load, keep track of the set of stores that have forwarded information in the past.
  – if (prior store in store-set has unresolved address) then
        wait for store address to be completed
    else if (match) then
        if (store value ready) then
            return value
        else
            return pointer to reservation station
    else
        optimistically start load access
[Figure: the Load/Store PC supplies an Index into the Store Set ID Table (SSIT), which yields an SSID; the SSID selects a Store Inum in the Last Fetched Store Table (LFST).]

Processor-Memory Performance Gap “Tax”
Processor             % Area (cost)    % Transistors (power)
• Alpha 21164         37%              77%
• StrongArm SA110     61%              94%
• Pentium Pro         64%              88%
  – 2 dies per package: Proc/I$/D$ + L2$
• Caches have no inherent value, only try to close performance gap

What is a cache?
• Small, fast storage used to improve average access time to slow memory.
• Exploits spatial and temporal locality
• In computer architecture, almost everything is a cache!
  – Registers a cache on variables
  – First-level cache a cache on second-level cache
  – Second-level cache a cache on memory
  – Memory a cache on disk (virtual memory)
  – TLB a cache on page table
  – Branch-prediction a cache on prediction information?
[Figure: memory hierarchy: Proc/Regs, L1-Cache, L2-Cache, Memory, Disk, Tape, etc. Levels get bigger toward disk and faster toward the processor.]

Example: 1 KB Direct Mapped Cache
• For a 2 ** N byte cache:
  – The uppermost (32 - N) bits are always the Cache Tag
  – The lowest M bits are the Byte Select (Block Size = 2 ** M)
[Figure: 32-bit address split into Cache Tag (bits 31 to 10, example: 0x50), Cache Index (bits 9 to 5, ex: 0x01), and Byte Select (bits 4 to 0, ex: 0x00); Cache Tag and Cache Index together form the block address. The cache array holds 32-byte blocks indexed 0, 1, 2, 3, ... (Byte 0 to Byte 31, Byte 32 to Byte 63, ..., Byte 992 to Byte 1023); each entry also has a Valid Bit and a Cache Tag stored as part of the cache “state”.]

Set Associative Cache
• N-way set associative: N entries for each Cache Index
  – N direct mapped caches operate in parallel
• Example: Two-way set associative cache
  – Cache Index selects a “set” from the cache
  – The two tags in the set are compared to the input in parallel
  – Data is selected based on the tag result
[Figure: two-way set associative cache. The Cache Index selects one entry (Valid, Cache Tag, Cache Data) from each way; each stored tag is compared with the address tag (Adr Tag); the two compare results drive the mux selects (Sel1, Sel0) that choose the Cache Block and are ORed to produce Hit.]

Disadvantage of Set Associative Cache
• N-way Set Associative Cache
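
The address split described in the 1 KB direct mapped cache example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_address` and its parameters `n` and `m` are hypothetical, and the defaults (N = 10, M = 5) are chosen to match a 1 KB cache with 32-byte blocks. The test address 0x14020 is constructed here from the slide's example field values (tag 0x50, index 0x01, byte select 0x00); it does not appear in the lecture.

```python
# Sketch: split a 32-bit address for a direct-mapped cache of 2**n bytes
# with 2**m-byte blocks. Names and defaults are illustrative assumptions.

def split_address(addr: int, n: int = 10, m: int = 5):
    """Return (tag, index, byte_select) for the given 32-bit address."""
    byte_select = addr & ((1 << m) - 1)          # lowest M bits
    index = (addr >> m) & ((1 << (n - m)) - 1)   # next N - M bits pick the block
    tag = (addr >> n) & ((1 << (32 - n)) - 1)    # uppermost 32 - N bits
    return tag, index, byte_select


if __name__ == "__main__":
    # Address assembled from the slide's example fields (an assumption):
    # tag 0x50, index 0x01, byte select 0x00.
    addr = (0x50 << 10) | (0x01 << 5) | 0x00
    tag, index, byte_select = split_address(addr)
    print(hex(addr), hex(tag), hex(index), hex(byte_select))
```

With N = 10 and M = 5, the index field is N - M = 5 bits wide, which gives 32 blocks of 32 bytes, or 1 KB in total.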

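
The two-way lookup from the Set Associative Cache slide (index selects a set, both tags are compared, the compare results select the data and are ORed into a hit signal) can be modeled as follows. This is a minimal software sketch, not the hardware design: the class and method names are hypothetical, the geometry (16 sets of 32-byte blocks, 1 KB in total) is an assumption, and `fill` takes the way explicitly because the slides shown here do not specify a replacement policy.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: bytes = b""


class TwoWayCache:
    """Toy model of a two-way set associative cache (illustrative only)."""

    def __init__(self, num_sets: int = 16, block_bytes: int = 32):
        self.num_sets = num_sets
        self.block_bytes = block_bytes
        # Two "direct mapped" arrays that are probed in parallel.
        self.ways = [[Line() for _ in range(num_sets)] for _ in range(2)]

    def _split(self, addr: int):
        block = addr // self.block_bytes
        return block // self.num_sets, block % self.num_sets  # (tag, index)

    def fill(self, way: int, addr: int, data: bytes) -> None:
        # The caller picks the way; no replacement policy is modeled.
        tag, index = self._split(addr)
        self.ways[way][index] = Line(True, tag, data)

    def lookup(self, addr: int):
        tag, index = self._split(addr)
        # One compare per way; hardware does these in parallel.
        sel = [w[index].valid and w[index].tag == tag for w in self.ways]
        hit = sel[0] or sel[1]                 # OR of the compare results
        data = None
        for way, selected in enumerate(sel):   # mux driven by the compares
            if selected:
                data = self.ways[way][index].data
        return hit, data
```

Two addresses that share an index but differ in tag, such as 0x020 and 0x220 in this geometry, can both be resident at once, one per way.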

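
The load-side check from the Memory Disambiguation review (when a load address is ready, look for earlier stores with the same address) can be sketched as below. All names are hypothetical. Two details are assumptions not stated on the slide: stores whose address is still unresolved are treated as non-matching (the optimistic case), and when several earlier stores match, the youngest one is chosen.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Store:
    seq: int               # position in program order
    addr: Optional[int]    # None until the address is resolved
    value: Optional[int]   # None until the store value is ready
    station: int           # reservation station that will produce the value


def resolve_load(load_seq: int, load_addr: int,
                 stores: List[Store]) -> Tuple[str, Optional[int]]:
    """Return ("value", v), ("wait", station), or ("memory", None)."""
    # Earlier in time and same address; unresolved stores never match.
    matches = [s for s in stores
               if s.seq < load_seq and s.addr == load_addr]
    if matches:
        youngest = max(matches, key=lambda s: s.seq)  # assumption: youngest wins
        if youngest.value is not None:
            return ("value", youngest.value)          # store value ready
        return ("wait", youngest.station)             # pointer to reservation station
    return ("memory", None)                           # optimistically start load access
```

The store-side check (flush later loads with the same address once a store address resolves) is not modeled here.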