Berkeley COMPSCI 252 - Caches I

Slide titles:
Lecture 12: Caches I
Review: Genetic Programming for Design
Review: Who Cares About the Memory Hierarchy?
Processor-Memory Performance Gap “Tax”
What is a cache?
Generations of Microprocessors
What happens on a Cache miss?
Review: Four Questions for Memory Hierarchy Designers
Review: Cache Performance
Slide 10
Review: Improving Cache Performance
Reducing Misses
3Cs Absolute Miss Rate (SPEC92)
2:1 Cache Rule
3Cs Relative Miss Rate
How Can We Reduce Misses?
1. Reduce Misses via Larger Block Size
2. Reduce Misses via Higher Associativity
Example: Avg. Memory Access Time vs. Miss Rate
3. Reducing Misses via a “Victim Cache”
4. Reducing Misses via “Pseudo-Associativity”
CS 252 Administrivia
5. Reducing Misses by Hardware Prefetching of Instructions & Data
6. Reducing Misses by Software Prefetching Data
7. Reducing Misses by Compiler Optimizations
Merging Arrays Example
Loop Interchange Example
Loop Fusion Example
Blocking Example
Slide 30
Reducing Conflict Misses by Blocking
Summary of Compiler Optimizations to Reduce Cache Misses (by hand)
Summary
Slide 34
1. Reducing Miss Penalty: Read Priority over Write on Miss
2. Reduce Miss Penalty: Subblock Placement
3. Reduce Miss Penalty: Early Restart and Critical Word First
4. Reduce Miss Penalty: Non-blocking Caches to reduce stalls on misses
Value of Hit Under Miss for SPEC
5th Miss Penalty
Comparing Local and Global Miss Rates
Reducing Misses: Which apply to L2 Cache?
L2 cache block size & A.M.A.T.
Reducing Miss Penalty Summary
What is the Impact of What You’ve Learned About Caches?
Cache Optimization Summary

JDK.F98 Slide 1
Lecture 12: Caches I
Prof. John Kubiatowicz
Computer Science 252
Fall 1998

Slide 2: Review: Genetic Programming for Design
•Genetic programming has two key aspects:
  –An encoding of the design space.
    »This is a symbolic representation of the result space (genome).
    »Much of the domain-specific knowledge and “art” is involved here.
  –A reproduction strategy:
    »Includes a method for generating offspring from parents:
      Mutation: changing random portions of an individual
      Crossover: merging aspects of two individuals
    »Includes a method for evaluating the effectiveness (“fitness”) of individual solutions.
•Generation of new branch predictors via genetic programming:
  –Everything is derived from a “basic” predictor (table) + simple operators.
  –Expressions are arranged in a tree.
  –Mutation: random modification of a node or replacement of a subtree.
  –Crossover: swapping the subtrees of two parents.

Slide 3: Review: Who Cares About the Memory Hierarchy?
[Figure: performance (log scale, 1 to 1000) vs. year, 1980 to 2000. µProc (“Moore’s Law”) improves 60%/yr.; DRAM improves 7%/yr.; the Processor-Memory Performance Gap grows 50%/year.]
•Processor Only Thus Far in Course:
  –CPU cost/performance, ISA, Pipelined Execution
•CPU-DRAM Gap:
  –1980: no cache in µproc; 1995: 2-level cache on chip (1989: first Intel µproc with a cache on chip)

Slide 4: Processor-Memory Performance Gap “Tax”
Share of each processor devoted to caches:
Processor          % Area (cost)   % Transistors (power)
Alpha 21164        37%             77%
StrongArm SA110    61%             94%
Pentium Pro        64%             88%
  (Pentium Pro: 2 dies per package: Proc/I$/D$ + L2$)
•Caches have no inherent value; they only try to close the performance gap.

Slide 5: What is a cache?
•Small, fast storage used to improve average access time to slow memory.
•Exploits spatial and temporal locality.
•In computer architecture, almost everything is a cache!
  –Registers: a cache on variables
  –First-level cache: a cache on the second-level cache
  –Second-level cache: a cache on memory
  –Memory: a cache on disk (virtual memory)
  –TLB: a cache on the page table
  –Branch prediction: a cache on prediction information?
[Figure: the hierarchy Proc/Regs, L1-Cache, L2-Cache, Memory, Disk, Tape, etc.; levels get bigger toward disk and faster toward the processor.]

Slide 6: Generations of Microprocessors
•Time of a full cache miss in instructions executed (clocks x instructions issued per clock):
  1st Alpha (7000):   340 ns/5.0 ns = 68 clks x 2 or 136
  2nd Alpha (8400):   266 ns/3.3 ns = 80 clks x 4 or 320
  3rd Alpha (t.b.d.): 180 ns/1.7 ns = 108 clks x 6 or 648
•1/2X latency x 3X clock rate x 3X Instr/clock ≈ 5X

Slide 7: What happens on a Cache miss?
•For an in-order pipeline, 2 options:
  –Freeze the pipeline in the Mem stage (popular early on: Sparc, R4000)
      IF ID EX Mem stall stall stall … stall Mem Wr
         IF ID EX  stall stall stall … stall stall Ex Wr
  –Use Full/Empty bits in registers + MSHR queue
    »MSHR = “Miss Status/Handler Registers” (Kroft)
      Each entry in this queue keeps track of the status of outstanding memory requests to one complete memory line.
      •Per cache line: keep info about the memory address.
      •For each word: the register (if any) that is waiting for the result.
      •Used to “merge” multiple requests to one memory line.
    »A new load creates an MSHR entry and sets its destination register to “Empty”. The load is “released” from the pipeline.
    »An attempt to use the register before the result returns causes the instruction to block in the decode stage.
    »Limited “out-of-order” execution with respect to loads. Popular with in-order superscalar architectures.
•Out-of-order pipelines already have this functionality built in (load queues, etc.).

Slide 8: Review: Four Questions for Memory Hierarchy Designers
•Q1: Where can a block be placed in the upper level? (Block placement)
  –Fully Associative, Set Associative, Direct Mapped
•Q2: How is a block found if it is in the upper level? (Block identification)
  –Tag/Block
•Q3: Which block should be replaced on a miss? (Block replacement)
  –Random, LRU
•Q4: What happens on a write? (Write strategy)
  –Write Back or Write Through (with Write Buffer)

Slide 9: Review: Cache Performance
CPU time = (CPU execution clock cycles + Memory stall clock cycles) x Clock cycle time
Memory stall clock cycles = Reads x Read miss rate x Read miss penalty + Writes x Write miss rate x Write miss penalty
Memory stall clock cycles = Memory accesses x Miss rate x Miss penalty (reads and writes combined)
Note: memory hit time is included in execution cycles.

Slide 10: Review: Cache Performance (continued)
CPU time = Instruction Count x (CPI_execution + Mem accesses/inst x Miss rate x Miss penalty) x Clock cycle time
Misses per instruction = Memory accesses/inst x Miss rate
CPU time = IC x (CPI_execution + Misses per instruction x Miss penalty) x Clock cycle time

Slide 11: Review: Improving Cache Performance
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.

Slide 12: Reducing Misses
•Classifying Misses: 3 Cs
  –Compulsory: The first access to a block is not in the cache, so the block must be brought into the cache.
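
The tree-based mutation and crossover described on the “Review: Genetic Programming for Design” slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in the lecture: expressions are nested tuples `(op, left, right)`, and the terminal names (`"table"`, `"history"`, `"pc_bit"`) and operator names (`"xor"`, `"and"`, `"or"`) are hypothetical stand-ins for a “basic” predictor and simple operators. Fitness evaluation (for example, simulated prediction accuracy) is omitted.

```python
import random

# Hypothetical primitive sets, for illustration only.
TERMINALS = ["table", "history", "pc_bit"]
OPERATORS = ["xor", "and", "or"]

def random_tree(rng, depth):
    """Build a random expression tree: leaves are terminals, nodes are (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(OPERATORS), random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        yield from paths(tree[1], prefix + (1,))
        yield from paths(tree[2], prefix + (2,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(node)

def mutate(tree, rng, depth=2):
    """Mutation: replace a randomly chosen subtree with a freshly generated one."""
    target = rng.choice(list(paths(tree)))
    return replace_subtree(tree, target, random_tree(rng, depth))

def crossover(a, b, rng):
    """Crossover: swap a random subtree of parent a with a random subtree of parent b."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return (replace_subtree(a, pa, get_subtree(b, pb)),
            replace_subtree(b, pb, get_subtree(a, pa)))
```

Typical use is `rng = random.Random(252)`, then `p1, p2 = random_tree(rng, 3), random_tree(rng, 3)` and `c1, c2 = crossover(p1, p2, rng)`. Because crossover only swaps subtrees, the two children together contain exactly as many nodes as the two parents.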

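The arithmetic behind the “Review: Who Cares About the Memory Hierarchy?” and “Generations of Microprocessors” slides can be checked directly. A gap between rates of 60%/yr. and 7%/yr. compounds as 1.60/1.07 ≈ 1.50 per year. For the Alpha miss costs, this sketch assumes the 5.0 ns, 3.3 ns, and 1.7 ns cycle times are rounded 200 MHz, 300 MHz, and 600 MHz clocks (an assumption, but it reproduces the slide's 68, 80, and 108 clocks); the function names are hypothetical.

```python
def gap_growth(cpu_rate, dram_rate):
    """Yearly growth of the CPU/DRAM performance ratio from yearly improvement rates."""
    return (1 + cpu_rate) / (1 + dram_rate) - 1

def miss_cost(miss_ns, clock_mhz, issue_width):
    """Return (clocks per full miss, instruction issue slots lost per miss)."""
    clocks = round(miss_ns * clock_mhz / 1000)  # ns x (cycles per 1000 ns)
    return clocks, clocks * issue_width

# (name, miss latency in ns, assumed clock in MHz, instructions issued per clock)
ALPHAS = [
    ("1st Alpha (7000)", 340, 200, 2),
    ("2nd Alpha (8400)", 266, 300, 4),
    ("3rd Alpha (t.b.d.)", 180, 600, 6),
]

if __name__ == "__main__":
    print(f"gap grows about {gap_growth(0.60, 0.07):.0%} per year")
    for name, ns, mhz, width in ALPHAS:
        clks, instrs = miss_cost(ns, mhz, width)
        print(f"{name}: {clks} clks x {width} = {instrs} instructions")
```

The ratio of the last and first rows, 648/136 ≈ 4.8, is consistent with the slide's estimate of 1/2 x 3 x 3 = 4.5 ≈ 5X.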

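The MSHR bookkeeping on the “What happens on a Cache miss?” slide (one entry per outstanding line, a waiting register per word, Full/Empty bits, and merging of requests to the same line) can be modeled as a small table. This is a minimal sketch, not Kroft's design: the 32-byte line, 4-byte word, `MSHRFile` class, and its method names are hypothetical, and the model simply stalls when a second miss targets a word that already has a waiting register or when every entry is in use.

```python
LINE_BYTES = 32   # assumed line size
WORD_BYTES = 4    # assumed word size

class MSHRFile:
    """Toy Miss Status/Handler Registers: one entry per outstanding memory line."""

    def __init__(self, num_entries=4):
        self.num_entries = num_entries
        self.entries = {}    # line number -> {word index: waiting destination register}
        self.full_bits = {}  # register name -> True (Full) or False (Empty)

    def load_miss(self, addr, dest_reg):
        """Record a missing load. Returns 'new', 'merged', or 'stall'."""
        line = addr // LINE_BYTES
        word = (addr % LINE_BYTES) // WORD_BYTES
        if line in self.entries:
            waiting = self.entries[line]
            if word in waiting:
                return "stall"         # this toy model tracks one register per word
            waiting[word] = dest_reg   # merge into the existing request for this line
            status = "merged"
        elif len(self.entries) >= self.num_entries:
            return "stall"             # no free MSHR: the pipeline must stall
        else:
            self.entries[line] = {word: dest_reg}
            status = "new"
        self.full_bits[dest_reg] = False  # register marked Empty; load is released
        return status

    def can_issue(self, src_regs):
        """An instruction blocks in decode while any of its source registers is Empty."""
        return all(self.full_bits.get(r, True) for r in src_regs)

    def fill(self, addr, data_words):
        """Memory returns a whole line: write the waiting registers and free the entry."""
        waiting = self.entries.pop(addr // LINE_BYTES)
        results = {}
        for word, reg in waiting.items():
            results[reg] = data_words[word]
            self.full_bits[reg] = True
        return results
```

For example, with one entry, a load of `0x100` into `r1` allocates the entry, a load of `0x104` into `r2` merges into it (same 32-byte line), and an instruction reading `r1` cannot issue until `fill(0x100, ...)` marks both registers Full.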

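The four questions on the “Review: Four Questions for Memory Hierarchy Designers” slide map onto a few lines of a toy cache simulator, which can also count the compulsory misses defined on the “Reducing Misses” slide (first references to a block). This sketch assumes byte addresses, LRU replacement (the slide also lists Random), and write-allocate on every miss; `SetAssocCache` and its counters are hypothetical names.

```python
from collections import OrderedDict

class SetAssocCache:
    """Toy cache model for the four questions.

    Q1 placement:      a block goes to set (block address mod number of sets).
    Q2 identification: the tag (block address // number of sets) is compared.
    Q3 replacement:    LRU within the set.
    Q4 writes:         write back (dirty bit) or write through (every write goes to memory).
    """

    def __init__(self, size_bytes, block_bytes, assoc, write_back=True):
        self.block_bytes = block_bytes
        self.assoc = assoc
        self.num_sets = size_bytes // (block_bytes * assoc)
        self.write_back = write_back
        # One OrderedDict per set: tag -> dirty flag, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.seen_blocks = set()  # blocks referenced so far, for compulsory misses
        self.hits = self.misses = self.compulsory = 0
        self.mem_writes = 0       # write-through: words written; write-back: blocks written back

    def access(self, addr, is_write=False):
        """Simulate one access; returns True on a hit."""
        block = addr // self.block_bytes
        index = block % self.num_sets          # Q1: which set
        tag = block // self.num_sets           # Q2: tag stored with the block
        ways = self.sets[index]
        hit = tag in ways
        if hit:
            self.hits += 1
            ways.move_to_end(tag)              # Q3: now the most recently used
        else:
            self.misses += 1
            if block not in self.seen_blocks:
                self.compulsory += 1           # first reference to this block
            if len(ways) == self.assoc:
                _, dirty = ways.popitem(last=False)  # Q3: evict the LRU block
                if dirty:
                    self.mem_writes += 1       # write back the dirty victim
            ways[tag] = False
        self.seen_blocks.add(block)
        if is_write:
            if self.write_back:
                ways[tag] = True               # Q4: mark dirty, defer the memory write
            else:
                self.mem_writes += 1           # Q4: send the write to memory now
        return hit
```

Setting `assoc=1` gives a direct-mapped cache and `assoc=size_bytes // block_bytes` a fully associative one. In a 64-byte cache with 16-byte blocks, for instance, addresses 0 and 64 map to the same set: alternating between them misses every time in the direct-mapped version but hits after the first two misses in the 2-way version.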

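The formulas on the “Review: Cache Performance” slides translate directly into code. The workload numbers below are hypothetical, chosen only to show how memory stalls inflate CPU time. The sketch also includes the standard average memory access time formula (hit time + miss rate x miss penalty), which is not written out in the slides above but whose three terms are the three levers on “Review: Improving Cache Performance”.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time; each term is one lever for improving cache performance."""
    return hit_time + miss_rate * miss_penalty

def cpu_time(ic, cpi_execution, mem_accesses_per_inst, miss_rate, miss_penalty, cycle_time):
    """CPU time = IC x (CPI_execution + Misses per instruction x Miss penalty) x Clock cycle time.

    Hit time is assumed to be included in cpi_execution, as the slide notes.
    """
    misses_per_inst = mem_accesses_per_inst * miss_rate
    return ic * (cpi_execution + misses_per_inst * miss_penalty) * cycle_time

# Hypothetical workload: 10^6 instructions, CPI_execution = 1.0, 1.3 memory accesses
# per instruction, 2% miss rate, 50-cycle miss penalty, 2 ns clock cycle.
t_real = cpu_time(1_000_000, 1.0, 1.3, 0.02, 50, 2e-9)
t_ideal = cpu_time(1_000_000, 1.0, 1.3, 0.0, 50, 2e-9)  # same machine, no misses
```

With these numbers there are 0.026 misses per instruction, adding 1.3 stall cycles to a base CPI of 1.0, so the run takes 2.3 times as long as it would with a perfect cache.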