Berkeley COMPSCI 252 - Lecture 4 Cache Design

CS252 Graduate Computer Architecture Lecture 4 Cache Design
• Who Cares About the Memory Hierarchy?
• Generations of Microprocessors
• Processor-Memory Performance Gap "Tax"
• What is a cache?
• Traditional Four Questions for Memory Hierarchy Designers
• What are all the aspects of cache organization that impact performance?
• Review: Cache performance
• Impact on Performance
• Unified vs Split Caches
• How to Improve Cache Performance?
• Where do misses come from?
• 3Cs Absolute Miss Rate (SPEC92)
• Cache Size
• Huge Caches => Working Sets
• Cache Organization?
• Larger Block Size (fixed size & assoc)
• Associativity
• 3Cs Relative Miss Rate
• Associativity vs Cycle Time
• Example: Avg. Memory Access Time vs. Miss Rate
• Fast Hit Time + Low Conflict => Victim Cache
• Reducing Misses via "Pseudo-Associativity"
• Reducing Misses by Hardware Prefetching of Instructions & Data
• Reducing Misses by Software Prefetching Data
• Reducing Misses by Compiler Optimizations
• Merging Arrays Example
• Loop Interchange Example
• Loop Fusion Example
• Blocking Example
• Slide 31
• Reducing Conflict Misses by Blocking
• Summary of Compiler Optimizations to Reduce Cache Misses (by hand)
• Summary: Miss Rate Reduction
• Review: Improving Cache Performance
• Write Policy: Write-Through vs Write-Back
• Write Policy 2: Write Allocate vs Non-Allocate (What happens on write-miss)
• 1. Reducing Miss Penalty: Read Priority over Write on Miss
• Slide 39
• 2. Reduce Miss Penalty: Early Restart and Critical Word First
• 3. Reduce Miss Penalty: Non-blocking Caches to reduce stalls on misses
• Value of Hit Under Miss for SPEC
• 4: Add a second-level cache
• Comparing Local and Global Miss Rates
• Reducing Misses: Which apply to L2 Cache?
• L2 cache block size & A.M.A.T.
• Reducing Miss Penalty Summary
• What is the Impact of What You've Learned About Caches?
• Cache Optimization Summary

CS252/Culler Lec 4.1 1/31/02
CS252 Graduate Computer Architecture
Lecture 4: Cache Design
January 31, 2002
Prof. David Culler

Who Cares About the Memory Hierarchy?
• 1980: no cache in µproc; 1995: 2-level cache on chip (1989: first Intel µproc with a cache on chip)
[Figure: CPU-DRAM Gap. Performance (log scale, 1 to 1000) vs. year, 1980 to 2000. µProc improves 60%/yr. ("Moore's Law"); DRAM improves 7%/yr. ("Less' Law?"); the Processor-Memory Performance Gap grows 50% / year.]

Generations of Microprocessors
• Time of a full cache miss in instructions executed:
  – 1st Alpha: 340 ns / 5.0 ns = 68 clks x 2 or 136
  – 2nd Alpha: 266 ns / 3.3 ns = 80 clks x 4 or 320
  – 3rd Alpha: 180 ns / 1.7 ns = 108 clks x 6 or 648
• 1/2X latency x 3X clock rate x 3X Instr/clock => about 5X

Processor-Memory Performance Gap "Tax"
  Processor           % Area (cost)    % Transistors (power)
  Alpha 21164         37%              77%
  StrongArm SA110     61%              94%
  Pentium Pro         64%              88%
  – Pentium Pro: 2 dies per package: Proc/I$/D$ + L2$
• Caches have no "inherent value"; they only try to close the performance gap

What is a cache?
• Small, fast storage used to improve average access time to slow memory.
• Exploits spatial and temporal locality
• In computer architecture, almost everything is a cache!
  – Registers: "a cache" on variables, software managed
  – First-level cache: a cache on second-level cache
  – Second-level cache: a cache on memory
  – Memory: a cache on disk (virtual memory)
  – TLB: a cache on page table
  – Branch prediction: a cache on prediction information?
[Figure: Proc/Regs -> L1-Cache -> L2-Cache -> Memory -> Disk, Tape, etc.; levels are bigger toward the bottom and faster toward the top.]

Traditional Four Questions for Memory Hierarchy Designers
• Q1: Where can a block be placed in the upper level? (Block placement)
  – Fully Associative, Set Associative, Direct Mapped
• Q2: How is a block found if it is in the upper level? (Block identification)
  – Tag/Block
• Q3: Which block should be replaced on a miss? (Block replacement)
  – Random, LRU
• Q4: What happens on a write? (Write strategy)
  – Write Back or Write Through (with Write Buffer)

What are all the aspects of cache organization that impact performance?

Review: Cache performance
• Miss-oriented approach to memory access:
  – CPI_Execution includes ALU and memory instructions
$$\text{CPUtime} = \text{IC} \times \left( \text{CPI}_{\text{Execution}} + \frac{\text{MemAccess}}{\text{Inst}} \times \text{MissRate} \times \text{MissPenalty} \right) \times \text{CycleTime}$$
$$\text{CPUtime} = \text{IC} \times \left( \text{CPI}_{\text{Execution}} + \frac{\text{MemMisses}}{\text{Inst}} \times \text{MissPenalty} \right) \times \text{CycleTime}$$
• Separating out the memory component entirely:
  – AMAT = Average Memory Access Time
  – CPI_AluOps does not include memory instructions
$$\text{CPUtime} = \text{IC} \times \left( \frac{\text{AluOps}}{\text{Inst}} \times \text{CPI}_{\text{AluOps}} + \frac{\text{MemAccess}}{\text{Inst}} \times \text{AMAT} \right) \times \text{CycleTime}$$
$$\text{AMAT} = \text{HitTime} + \text{MissRate} \times \text{MissPenalty}$$
$$\text{AMAT} = f_{\text{Inst}} \left( \text{HitTime}_{\text{Inst}} + \text{MissRate}_{\text{Inst}} \times \text{MissPenalty}_{\text{Inst}} \right) + f_{\text{Data}} \left( \text{HitTime}_{\text{Data}} + \text{MissRate}_{\text{Data}} \times \text{MissPenalty}_{\text{Data}} \right)$$
  where $f_{\text{Inst}}$ and $f_{\text{Data}}$ are the fractions of memory accesses that are instruction fetches and data accesses.

Impact on Performance
• Suppose a processor executes at
  – Clock Rate = 200 MHz (5 ns per cycle), Ideal (no misses) CPI = 1.1
  – 50% arith/logic, 30% ld/st, 20% control
• Suppose that 10% of memory operations get a 50-cycle miss penalty
• Suppose that 1% of instructions get the same miss penalty
• CPI = ideal CPI + average stalls per instruction
  = 1.1 (cycles/ins)
  + [0.30 (DataMops/ins) x 0.10 (miss/DataMop) x 50 (cycle/miss)]
  + [1 (InstMop/ins) x 0.01 (miss/InstMop) x 50 (cycle/miss)]
  = (1.1 + 1.5 + 0.5) cycle/ins = 3.1
• About 65% of the time (2.0 of 3.1 cycles per instruction) the proc is stalled waiting for memory!
• AMAT = (1/1.3) x [1 + 0.01 x 50] + (0.3/1.3) x [1 + 0.1 x 50] = 2.54

Unified vs Split Caches
• Unified vs separate I&D
[Figure: split organization, Proc -> I-Cache-1 and D-Cache-1 -> Unified Cache-2; unified organization, Proc -> Unified Cache-1 -> Unified Cache-2.]
• Example:
  – 16KB I&D: Inst miss rate = 0.64%, Data miss rate = 6.47%
  – 32KB unified: Aggregate miss rate = 1.99%
• Which is better (ignore L2 cache)?
  – Assume 33% data ops => 75% of accesses are from instructions (1.0/1.33)
  – hit time = 1, miss penalty = 50
  – Note that a data hit has 1 stall for the unified cache (only one port)
• AMAT_Harvard = 75% x (1 + 0.64% x 50) + 25% x (1 + 6.47% x 50) = 2.05
• AMAT_Unified = 75% x (1 + 1.99% x 50) + 25% x (1 + 1 + 1.99% x 50) = 2.24

How to Improve Cache Performance?
$$\text{AMAT} = \text{HitTime} + \text{MissRate} \times \text{MissPenalty}$$
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.

Where do misses come from?
• Classifying Misses: 3 Cs
  – Compulsory: The first access to a block is not in the cache, so the block must be brought into the cache. Also called cold start misses or first reference misses. (Misses in even an Infinite Cache)
  – Capacity: If the cache cannot contain all the

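The first three of the "Traditional Four Questions" can be made concrete with a small tag-only cache model. This is a hypothetical sketch, not code from the lecture: it assumes power-of-two block size and set count, uses LRU (one of the two replacement policies listed), and does not model writes (Q4). Setting associativity to 1 gives a direct-mapped cache; setting num_sets to 1 gives a fully associative one.

```python
from __future__ import annotations

from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only cache model with LRU replacement inside each set."""

    def __init__(self, num_sets: int, associativity: int, block_size: int) -> None:
        # Power-of-two sizes let the address be split with shifts and masks.
        assert num_sets > 0 and num_sets & (num_sets - 1) == 0
        assert block_size > 0 and block_size & (block_size - 1) == 0
        self.num_sets = num_sets
        self.associativity = associativity
        self.block_size = block_size
        self.offset_bits = block_size.bit_length() - 1
        self.index_bits = num_sets.bit_length() - 1
        # One OrderedDict per set; keys are tags ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def split(self, addr: int) -> tuple[int, int, int]:
        """Q2, block identification: break an address into (tag, index, offset)."""
        offset = addr & (self.block_size - 1)
        index = (addr >> self.offset_bits) & (self.num_sets - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset

    def access(self, addr: int) -> bool:
        """Look up one address; return True on a hit."""
        tag, index, _ = self.split(addr)
        ways = self.sets[index]  # Q1, block placement: only this set can hold the block
        if tag in ways:
            ways.move_to_end(tag)  # now the most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(ways) >= self.associativity:
            ways.popitem(last=False)  # Q3, block replacement: evict the LRU tag
        ways[tag] = None
        return False


# Same capacity (4 blocks of 16 bytes), different organization.
direct_mapped = SetAssociativeCache(num_sets=4, associativity=1, block_size=16)
two_way = SetAssociativeCache(num_sets=2, associativity=2, block_size=16)

for addr in [0x000, 0x040] * 4:
    direct_mapped.access(addr)
    two_way.access(addr)

print("direct-mapped misses:", direct_mapped.misses)  # 8
print("2-way misses:", two_way.misses)                # 2
```

With 16-byte blocks, addresses 0x000 and 0x040 land in the same set of the 4-set direct-mapped cache and evict each other on every access, while the 2-way cache of the same total size keeps both blocks resident after the first two misses.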

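The numbers on the "Impact on Performance" slide (CPI, stall fraction, and AMAT) follow directly from the formulas on the "Review: Cache performance" slide. A minimal sketch; the constant names are hypothetical, and the 1-cycle hit time is the value the slide's AMAT expression uses.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time, in cycles."""
    return hit_time + miss_rate * miss_penalty


# Parameters from the "Impact on Performance" slide.
IDEAL_CPI = 1.1
CYCLE_NS = 5.0             # 200 MHz clock
DATA_REFS_PER_INST = 0.30  # 30% loads/stores
INST_REFS_PER_INST = 1.0   # every instruction is fetched
DATA_MISS_RATE = 0.10
INST_MISS_RATE = 0.01
MISS_PENALTY = 50          # cycles

data_stalls = DATA_REFS_PER_INST * DATA_MISS_RATE * MISS_PENALTY  # 1.5 cycles/inst
inst_stalls = INST_REFS_PER_INST * INST_MISS_RATE * MISS_PENALTY  # 0.5 cycles/inst
cpi = IDEAL_CPI + data_stalls + inst_stalls
stall_fraction = (data_stalls + inst_stalls) / cpi

# AMAT weights each access type by its share of all memory accesses.
refs_per_inst = INST_REFS_PER_INST + DATA_REFS_PER_INST
avg_access = ((INST_REFS_PER_INST / refs_per_inst) * amat(1, INST_MISS_RATE, MISS_PENALTY)
              + (DATA_REFS_PER_INST / refs_per_inst) * amat(1, DATA_MISS_RATE, MISS_PENALTY))

print(f"CPI = {cpi:.1f}")
print(f"stalled {stall_fraction:.0%} of the time")
print(f"AMAT = {avg_access:.2f} cycles")
print(f"time per instruction = {cpi * CYCLE_NS:.1f} ns")
```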

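The "Unified vs Split Caches" comparison can be reproduced the same way. A minimal sketch with hypothetical constant names; the one-cycle structural stall on data accesses to the single-ported unified cache is modeled, as on the slide, by adding 1 to the data hit time.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time, in cycles."""
    return hit_time + miss_rate * miss_penalty


INST_SHARE = 0.75  # about 1.0 / 1.33 of accesses are instruction fetches
DATA_SHARE = 0.25
HIT_TIME = 1
MISS_PENALTY = 50

# Split (Harvard): 16KB I-cache and 16KB D-cache, each with its own port.
split = (INST_SHARE * amat(HIT_TIME, 0.0064, MISS_PENALTY)
         + DATA_SHARE * amat(HIT_TIME, 0.0647, MISS_PENALTY))

# Unified 32KB: a single port, so a data access waits one extra cycle
# behind the instruction fetch even when it hits.
UNIFIED_MISS_RATE = 0.0199
unified = (INST_SHARE * amat(HIT_TIME, UNIFIED_MISS_RATE, MISS_PENALTY)
           + DATA_SHARE * amat(HIT_TIME + 1, UNIFIED_MISS_RATE, MISS_PENALTY))

print(f"split (Harvard): {split:.3f} cycles")
print(f"unified:         {unified:.3f} cycles")
```

The exact values are 2.04875 and 2.245, which the slide rounds to 2.05 and 2.24. The split caches actually miss slightly more often overall (0.75 x 0.64% + 0.25 x 6.47%, about 2.10%, versus 1.99%), yet come out ahead because the unified cache's single port adds a cycle to every data access.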

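The 3 Cs definition of a compulsory miss ("misses in even an infinite cache") amounts to counting the first reference to each block. A hypothetical sketch with an invented sequential trace; it also hints at why the outline lists "Larger Block Size" among the miss-reduction techniques, since for a sequential sweep each doubling of the block size halves the compulsory misses.

```python
def compulsory_misses(addresses, block_size: int) -> int:
    """Count first references to each block: the misses even an infinite cache takes."""
    seen_blocks = set()
    misses = 0
    for addr in addresses:
        block = addr // block_size
        if block not in seen_blocks:
            seen_blocks.add(block)
            misses += 1
    return misses


# A sequential sweep over 256 bytes, one 4-byte word at a time (64 accesses).
sweep = range(0, 256, 4)
for block_size in (16, 32, 64):
    print(f"{block_size:>2}-byte blocks: {compulsory_misses(sweep, block_size)} compulsory misses")
```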