Berkeley COMPSCI 252 - Caches and Memory Systems - D1934260

School: University of California, Berkeley

Course: COMPSCI 252 - Graduate Computer Architecture

Pages 56

Lecture outline (slide titles):

CS252 Graduate Computer Architecture Lecture 4 Caches and Memory Systems
Review: Who Cares About the Memory Hierarchy?
Review: Cache performance
Review: Reducing Misses
Review: Miss Rate Reduction
Improving Cache Performance Continued
What happens on a Cache miss?
Write Policy: Write-Through vs Write-Back
Write Policy 2: Write Allocate vs Non-Allocate (What happens on write-miss)
1. Reducing Miss Penalty: Read Priority over Write on Miss
Slide 11
2. Reduce Miss Penalty: Early Restart and Critical Word First
3. Reduce Miss Penalty: Non-blocking Caches to reduce stalls on misses
Value of Hit Under Miss for SPEC (Normalized to blocking cache)
4. Second level cache
Comparing Local and Global Miss Rates
Reducing Misses: Which apply to L2 Cache?
L2 cache block size & A.M.A.T.
Reducing Miss Penalty Summary
Main Memory Background
Main Memory Deep Background
DRAM logical organization (4 Mbit)
4 Key DRAM Timing Parameters
DRAM Read Timing
DRAM Performance
DRAM History
DRAM Future: 1 Gbit DRAM (ISSCC '96; production '02?)
Fast Memory Systems: DRAM specific
Main Memory Organizations
Main Memory Performance
Independent Memory Banks
Slide 32
Avoiding Bank Conflicts
Fast Bank Number
DRAMs per PC over Time
Need for Error Correction!
Architecture in practice
More esoteric Storage Technologies?
Tunneling Magnetic Junction
MEMS-based Storage
Main Memory Summary
Review: Improving Cache Performance
1. Fast Hit times via Small and Simple Caches
2. Fast hits by Avoiding Address Translation
2. Fast hits by Avoiding Address Translation
2. Fast Cache Hits by Avoiding Translation: Process ID impact
2. Fast Cache Hits by Avoiding Translation: Index with Physical Portion of Address
3: Fast Hits by pipelining Cache Case Study: MIPS R4000
Case Study: MIPS R4000
R4000 Performance
What is the Impact of What You've Learned About Caches?
Alpha 21064
Alpha Memory Performance: Miss Rates of SPEC92
Alpha CPI Components
Pitfall: Predicting Cache Performance from Different Prog. (ISA, compiler, ...)
Cache Optimization Summary


CS252 Graduate Computer Architecture
Lecture 4
Caches and Memory Systems
January 26, 2001
Prof. John Kubiatowicz

CS252/Kubiatowicz Lec 4.1 1/26/01


Review: Who Cares About the Memory Hierarchy?

[Figure: performance (log scale, 1 to 1000) versus year, 1980 to 2000. CPU curve: µProc, 60%/yr ("Moore's Law"). DRAM curve: 7%/yr ("Less' Law?"). Processor-Memory Performance Gap grows 50%/year.]

• Processor Only Thus Far in Course:
  – CPU cost/performance, ISA, Pipelined Execution
• CPU-DRAM Gap
• 1980: no cache in µproc; 1995: 2-level cache on chip (1989: first Intel µproc with a cache on chip)


Review: Cache performance

• Miss-oriented Approach to Memory Access:

$$\text{CPUtime} = \text{IC} \times \left(\text{CPI}_{\text{Execution}} + \frac{\text{MemAccess}}{\text{Inst}} \times \text{MissRate} \times \text{MissPenalty}\right) \times \text{CycleTime}$$

$$\text{CPUtime} = \text{IC} \times \left(\text{CPI}_{\text{Execution}} + \frac{\text{MemMisses}}{\text{Inst}} \times \text{MissPenalty}\right) \times \text{CycleTime}$$

  – CPI_Execution includes ALU and Memory instructions
• Separating out Memory component entirely
  – AMAT = Average Memory Access Time
  – CPI_AluOps does not include memory instructions

$$\text{CPUtime} = \text{IC} \times \left(\frac{\text{AluOps}}{\text{Inst}} \times \text{CPI}_{\text{AluOps}} + \frac{\text{MemAccess}}{\text{Inst}} \times \text{AMAT}\right) \times \text{CycleTime}$$

$$\text{AMAT} = \text{HitTime} + \text{MissRate} \times \text{MissPenalty}$$

$$= \left(\text{HitTime}_{\text{Inst}} + \text{MissRate}_{\text{Inst}} \times \text{MissPenalty}_{\text{Inst}}\right) + \left(\text{HitTime}_{\text{Data}} + \text{MissRate}_{\text{Data}} \times \text{MissPenalty}_{\text{Data}}\right)$$


Review: Reducing Misses

• Classifying Misses: 3 Cs
  – Compulsory: Misses in even an Infinite Cache
  – Capacity: Misses in Fully Associative Size X Cache
  – Conflict: Misses in N-way Associative, Size X Cache
• More recent, 4th "C":
  – Coherence: Misses caused by cache coherence.


Review: Miss Rate Reduction

$$\text{AMAT} = \text{HitTime} + \text{MissRate} \times \text{MissPenalty}$$

• 3 Cs: Compulsory, Capacity, Conflict
1. Reduce Misses via Larger Block Size
2. Reduce Misses via Higher Associativity
3. Reducing Misses via Victim Cache
4. Reducing Misses via Pseudo-Associativity
5. Reducing Misses by HW Prefetching Instr, Data
6. Reducing Misses by SW Prefetching Data
7. Reducing Misses by Compiler Optimizations
• Prefetching comes in two flavors:
  – Binding prefetch: Requests load directly into register.
    » Must be correct address and register!
  – Non-Binding prefetch: Load into cache.
    » Can be incorrect. Frees HW/SW to guess!


Improving Cache Performance Continued

$$\text{AMAT} = \text{HitTime} + \text{MissRate} \times \text{MissPenalty}$$

1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.


What happens on a Cache miss?

• For in-order pipeline, 2 options:
  – Freeze pipeline in Mem stage (popular early on: Sparc, R4000)

      IF ID EX Mem stall stall stall … stall Mem Wr
         IF ID EX  stall stall stall … stall stall Ex Wr

  – Use Full/Empty bits in registers + MSHR queue
    » MSHR = "Miss Status/Handler Registers" (Kroft). Each entry in this queue keeps track of status of outstanding memory requests to one complete memory line.
      • Per cache-line: keep info about memory address.
      • For each word: register (if any) that is waiting for result.
      • Used to "merge" multiple requests to one memory line
    » New load creates MSHR entry and sets destination register to "Empty". Load is "released" from pipeline.
    » Attempt to use register before result returns causes instruction to block in decode stage.
    » Limited "out-of-order" execution with respect to loads. Popular with in-order superscalar architectures.
• Out-of-order pipelines already have this functionality built in… (load queues, etc).


Write Policy: Write-Through vs Write-Back

• Write-through: all writes update cache and underlying memory/cache
  – Can always discard cached data - most up-to-date data is in memory
  – Cache control bit: only a valid bit
• Write-back: all writes simply update cache
  – Can't just discard cached data - may have to write it back to memory
  – Cache control bits: both valid and dirty bits
• Other Advantages:
  – Write-through:
    » memory (or other processors) always have latest data
    » Simpler management of cache
  – Write-back:
    » much lower bandwidth, since data often overwritten multiple times
    » Better tolerance to long-latency memory?


Write Policy 2: Write Allocate vs Non-Allocate (What happens on write-miss)

• Write allocate: allocate new cache line in cache
  – Usually means that you have to do a "read miss" to fill in rest of the cache-line!
  – Alternative: per-word valid bits
• Write non-allocate (or
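
The bandwidth difference between the two write policies described on the write-policy slides can be illustrated with a toy model. The following is a minimal sketch, not anything from the lecture: the `ToyCache` class, the direct-mapped organization, the four-line size, and the access pattern in `run` are all assumptions chosen for illustration. It models a write-allocate cache and only counts line fills and writes that reach memory.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False  # only meaningful for the write-back policy
    tag: int = -1


@dataclass
class ToyCache:
    """Hypothetical direct-mapped, write-allocate cache that counts memory traffic."""
    num_lines: int
    write_back: bool
    lines: list = field(default_factory=list)
    mem_reads: int = 0   # line fills caused by misses
    mem_writes: int = 0  # stores or dirty-line write-backs that reach memory

    def __post_init__(self):
        self.lines = [Line() for _ in range(self.num_lines)]

    def _lookup(self, line_addr):
        """Return the cache line for line_addr, filling it on a miss."""
        line = self.lines[line_addr % self.num_lines]
        tag = line_addr // self.num_lines
        if not (line.valid and line.tag == tag):
            if line.valid and line.dirty:
                self.mem_writes += 1  # write-back: dirty victim goes to memory
            self.mem_reads += 1       # the "read miss" that fills the line
            line.valid, line.dirty, line.tag = True, False, tag
        return line

    def read(self, line_addr):
        self._lookup(line_addr)

    def write(self, line_addr):
        line = self._lookup(line_addr)  # write allocate: a write miss fills the line
        if self.write_back:
            line.dirty = True           # memory is updated only on eviction
        else:
            self.mem_writes += 1        # write-through: every store reaches memory


def run(write_back):
    """Overwrite one line ten times, then touch a conflicting line."""
    cache = ToyCache(num_lines=4, write_back=write_back)
    for _ in range(10):
        cache.write(0)
    cache.write(4)  # maps to the same slot as line 0 and evicts it
    return cache.mem_reads, cache.mem_writes


if __name__ == "__main__":
    print("write-through (reads, writes):", run(write_back=False))
    print("write-back    (reads, writes):", run(write_back=True))
```

In this pattern the write-through cache sends every store to memory, while the write-back cache sends only the single dirty line it evicts, which is the "data often overwritten multiple times" effect the slide mentions. The second dirty line is still in the cache when the run ends, so its write-back is not counted.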

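
The AMAT and miss-oriented CPU time formulas from the cache performance review slide can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the function names and all numeric values (1-cycle hit, 5% and 4% miss rates, 50-cycle penalty, 1.3 memory accesses per instruction, 1 ns cycle) are hypothetical and do not come from the lecture.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: HitTime + MissRate * MissPenalty."""
    return hit_time + miss_rate * miss_penalty


def memory_stall_cpi(mem_access_per_inst, miss_rate, miss_penalty):
    """Stall cycles per instruction in the miss-oriented formula."""
    return mem_access_per_inst * miss_rate * miss_penalty


def cpu_time(ic, cpi_execution, mem_access_per_inst, miss_rate,
             miss_penalty, cycle_time):
    """IC * (CPI_Execution + MemAccess/Inst * MissRate * MissPenalty) * CycleTime."""
    stall = memory_stall_cpi(mem_access_per_inst, miss_rate, miss_penalty)
    return ic * (cpi_execution + stall) * cycle_time


if __name__ == "__main__":
    # Hypothetical machine: all values below are illustrative assumptions.
    print("AMAT (cycles):", amat(hit_time=1.0, miss_rate=0.05, miss_penalty=50.0))
    total_ns = cpu_time(ic=1_000_000, cpi_execution=1.5,
                        mem_access_per_inst=1.3, miss_rate=0.04,
                        miss_penalty=50.0, cycle_time=1.0)
    print("CPU time (ns):", total_ns)
```

With these assumed numbers the memory stall term (1.3 × 0.04 × 50 = 2.6 cycles per instruction) is larger than the execution CPI of 1.5, which is why the lecture treats miss rate, miss penalty, and hit time as the three levers for improving performance.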