CS252 Graduate Computer Architecture
Lecture 17: Memory Systems Continued
November 1, 2000
Computer Science 252
CS252/Kubiatowicz Lec 17.1 11/1/00

Slide titles:
• Review: Who Cares About the Memory Hierarchy?
• Review: Cache performance
• Summary: Miss Rate Reduction
• Improving Cache Performance Continued
• Write Policy: Write-Through vs Write-Back
• 1. Reducing Miss Penalty: Read Priority over Write on Miss
• 2. Reduce Miss Penalty: Subblock Placement
• 3. Reduce Miss Penalty: Early Restart and Critical Word First
• 4. Reduce Miss Penalty: Non-blocking Caches to reduce stalls on misses
• Value of Hit Under Miss for SPEC
• 5. Second level cache
• Comparing Local and Global Miss Rates
• Reducing Misses: Which apply to L2 Cache?
• L2 cache block size & A.M.A.T.
• Reducing Miss Penalty Summary
• Administrative
• Main Memory Background
• Main Memory Deep Background
• DRAM logical organization (4 Mbit)
• 4 Key DRAM Timing Parameters
• DRAM Performance
• DRAM History
• DRAM Future: 1 Gbit DRAM (ISSCC ‘96; production ‘02?)
• More esoteric Storage Technologies?
• Tunneling Magnetic Junction
• MEMS-based Storage
• Main Memory Performance
• Independent Memory Banks
• DRAMs per PC over Time
• Avoiding Bank Conflicts
• Fast Bank Number
• Fast Memory Systems: DRAM specific
• DRAM Latency >> BW
• Potential DRAM Crossroads?
• Main Memory Summary
• Review: Improving Cache Performance
• 1. Fast Hit times via Small and Simple Caches
• 2. Fast hits by Avoiding Address Translation
• 2. Fast Cache Hits by Avoiding Translation: Process ID impact
• 2. Fast Cache Hits by Avoiding Translation: Index with Physical Portion of Address
• 3. Fast Hit Times Via Pipelined Writes
• 4. Fast Writes on Misses Via Small Subblocks
• Cache Optimization Summary
• What is the Impact of What You’ve Learned About Caches?
• Cache Cross Cutting Issues
• Alpha 21064
• Alpha Memory Performance: Miss Rates of SPEC92
• Alpha CPI Components
• Pitfall: Predicting Cache Performance from Different Prog. (ISA, compiler, ...)
• Pitfall: Simulating Too Small an Address Trace
• Next Time: ECC/Errors

Review: Who Cares About the Memory Hierarchy?
[Figure: CPU-DRAM Gap. Performance (log scale, 1 to 1000) vs. year, 1980 to 2000. µProc: 60%/yr. (“Moore’s Law”); DRAM: 7%/yr. (“Less’ Law?”). Processor-Memory Performance Gap grows 50% / year.]
• Processor Only Thus Far in Course:
  – CPU cost/performance, ISA, Pipelined Execution
• 1980: no cache in µproc; 1995: 2-level cache on chip (1989: first Intel µproc with a cache on chip)

Review: Cache performance
• Miss-oriented Approach to Memory Access:
$$\text{CPUtime} = \text{IC} \times \left( \text{CPI}_{\text{Execution}} + \frac{\text{MemAccess}}{\text{Inst}} \times \text{MissRate} \times \text{MissPenalty} \right) \times \text{CycleTime}$$
• Separating out Memory component entirely
  – AMAT = Average Memory Access Time
$$\text{CPUtime} = \text{IC} \times \left( \text{CPI}_{\text{AluOps}} + \frac{\text{MemAccess}}{\text{Inst}} \times \text{AMAT} \right) \times \text{CycleTime}$$
$$\text{AMAT} = \text{HitTime} + \text{MissRate} \times \text{MissPenalty}$$
$$\text{AMAT} = \left( \text{HitTime}_{\text{Inst}} + \text{MissRate}_{\text{Inst}} \times \text{MissPenalty}_{\text{Inst}} \right) + \left( \text{HitTime}_{\text{Data}} + \text{MissRate}_{\text{Data}} \times \text{MissPenalty}_{\text{Data}} \right)$$

Summary: Miss Rate Reduction
• 3 Cs: Compulsory, Capacity, Conflict
1. Reduce Misses via Larger Block Size
2. Reduce Misses via Higher Associativity
3. Reducing Misses via Victim Cache
4. Reducing Misses via Pseudo-Associativity
5. Reducing Misses by HW Prefetching Instr, Data
6. Reducing Misses by SW Prefetching Data
7. Reducing Misses by Compiler Optimizations
• Prefetching comes in two flavors:
  – Binding prefetch: Requests load directly into register.
    » Must be correct address and register!
  – Non-Binding prefetch: Load into cache.
    » Can be incorrect. Frees HW/SW to guess!
$$\text{CPUtime} = \text{IC} \times \left( \text{CPI}_{\text{Execution}} + \frac{\text{Memory accesses}}{\text{Instruction}} \times \text{Miss rate} \times \text{Miss penalty} \right) \times \text{Clock cycle time}$$

Improving Cache Performance Continued
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.

Write Policy: Write-Through vs Write-Back
• Write-through: all writes update cache and underlying memory/cache
  – Can always discard cached data: most up-to-date data is in memory
  – Cache control bit: only a valid bit
• Write-back: all writes simply update cache
  – Can’t just discard cached data: may have to write it back to memory
  – Cache control bits: both valid and dirty bits
• Other Advantages:
  – Write-through:
    » Memory (or other processors) always has the latest data
    » Simpler management of cache
  – Write-back:
    » Much lower bandwidth, since data is often overwritten multiple times
    » Better tolerance to long-latency memory?

1. Reducing Miss Penalty: Read Priority over Write on Miss
[Figure: Write Buffer. CPU (in, out), write buffer, DRAM (or lower mem).]

1. Reducing Miss Penalty: Read Priority over Write on Miss
• Write-through with write buffers creates RAW conflicts with main memory reads on cache misses
  – Simply waiting for the write buffer to empty might increase the read miss penalty (old MIPS 1000 by 50%)
  – Check write buffer contents before read; if no conflicts, let the memory access continue
• Write-back also wants a buffer to hold displaced blocks
  – Read miss replacing dirty block
  – Normal: Write dirty block to memory, and then do the read
  – Instead: copy the dirty block to a write buffer, then do the read, and then do the write
  – CPU stalls less since it restarts as soon as the read is done

2. Reduce Miss Penalty: Subblock Placement
• Don’t have to load full block on a miss
• Have valid bits per subblock to indicate which subblocks are valid
• (Originally invented to reduce tag storage)
[Figure: Valid Bits, Subblocks.]

3. Reduce Miss Penalty: Early Restart and Critical Word First
• Don’t wait for full block to be loaded before restarting CPU
  – Early restart: As soon as the requested word of the block arrives, send it to the CPU and let the CPU continue execution
  – Critical Word First: Request the missed word first from memory and send it to the CPU as soon as it arrives; let the CPU continue execution while filling the rest of the words in the block. Also called wrapped fetch and requested word first
• Generally useful only in large blocks
• Spatial locality is a problem: programs tend to want the next sequential word, so it is not clear whether early restart gives a benefit
[Figure: block.]

4. Reduce Miss Penalty: Non-blocking Caches to reduce stalls on misses
• Non-blocking cache or lockup-free
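To make the early restart and critical word first policies from slide 3 of the miss-penalty list more concrete, here is a minimal Python sketch. It is not from the lecture: the function names, the policy strings, and the simple timing model (a fixed latency for the first word, then a fixed number of cycles per additional word) are assumptions chosen only for illustration.

```python
def wrapped_fetch_order(block_words, critical_word):
    """Order in which words are requested under critical word first.

    The missed (critical) word is fetched first, then the fetch wraps
    around the block. Word indices run from 0 to block_words - 1.
    """
    if not 0 <= critical_word < block_words:
        raise ValueError("critical_word must be an index inside the block")
    return [(critical_word + i) % block_words for i in range(block_words)]


def cpu_restart_cycle(policy, block_words, critical_word,
                      first_word_latency, cycles_per_word):
    """Cycle at which the CPU can resume after a miss (hypothetical model).

    Assumes the first word arrives after first_word_latency cycles and
    each later word arrives cycles_per_word cycles after the previous one.
    """
    if policy == "full_block":
        # Wait until every word of the block has arrived.
        words_before_restart = block_words
    elif policy == "early_restart":
        # Words arrive in sequential order 0, 1, 2, ...; the CPU
        # restarts as soon as the requested word has arrived.
        words_before_restart = critical_word + 1
    elif policy == "critical_word_first":
        # The requested word is the first one to arrive.
        words_before_restart = 1
    else:
        raise ValueError("unknown policy: " + policy)
    return first_word_latency + (words_before_restart - 1) * cycles_per_word
```

For an 8-word block with a miss on word 5, `wrapped_fetch_order(8, 5)` gives the request order 5, 6, 7, 0, 1, 2, 3, 4. Under this toy model the gap between the policies shrinks as the block gets smaller, which matches the slide's remark that the technique is generally useful only with large blocks.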
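The cache performance review expresses CPU time in terms of AMAT (average memory access time). As a worked illustration, the following Python sketch evaluates those formulas. The function and parameter names are assumptions introduced here, and the numbers in the usage note are made up for the example rather than taken from the lecture.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = HitTime + MissRate x MissPenalty (all times in cycles)."""
    return hit_time + miss_rate * miss_penalty


def cpu_time(ic, cpi_aluops, mem_access_per_inst, amat_cycles, cycle_time):
    """CPUtime = IC x (CPI_AluOps + MemAccess/Inst x AMAT) x CycleTime.

    ic                  -- instruction count
    cpi_aluops          -- cycles per instruction excluding memory access time
    mem_access_per_inst -- memory accesses per instruction
    amat_cycles         -- average memory access time, in cycles
    cycle_time          -- clock cycle time, in seconds
    """
    return ic * (cpi_aluops + mem_access_per_inst * amat_cycles) * cycle_time
```

With hypothetical values of a 1-cycle hit time, a 5% miss rate, and a 50-cycle miss penalty, `amat(1.0, 0.05, 50.0)` comes to 3.5 cycles. Plugging that into `cpu_time` with 1.3 memory accesses per instruction shows how quickly the memory term can dominate the ALU term, which is the motivation for the three levers the lecture lists: miss rate, miss penalty, and hit time.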