Berkeley COMPSCI 252 - Memory Systems

CS252 Graduate Computer Architecture
Lecture 17: Memory Systems (Continued)
November 1, 2000
Computer Science 252

Lecture outline: Review: Who Cares About the Memory Hierarchy? · Review: Cache Performance · Summary: Miss Rate Reduction · Improving Cache Performance (Continued) · Write Policy: Write-Through vs Write-Back · 1. Reducing Miss Penalty: Read Priority over Write on Miss · 2. Reduce Miss Penalty: Subblock Placement · 3. Reduce Miss Penalty: Early Restart and Critical Word First · 4. Reduce Miss Penalty: Non-blocking Caches to Reduce Stalls on Misses · Value of Hit Under Miss for SPEC · 5. Second-Level Cache · Comparing Local and Global Miss Rates · Reducing Misses: Which Apply to L2 Cache? · L2 Cache Block Size & A.M.A.T. · Reducing Miss Penalty Summary · Administrative · Main Memory Background · Main Memory Deep Background · DRAM Logical Organization (4 Mbit) · 4 Key DRAM Timing Parameters · DRAM Performance · DRAM History · DRAM Future: 1 Gbit DRAM (ISSCC '96; production '02?) · More Esoteric Storage Technologies? · Tunneling Magnetic Junction · MEMS-based Storage · Main Memory Performance · Independent Memory Banks · DRAMs per PC over Time · Avoiding Bank Conflicts · Fast Bank Number · Fast Memory Systems: DRAM-Specific · DRAM Latency >> BW · Potential DRAM Crossroads? · Main Memory Summary · Review: Improving Cache Performance · 1. Fast Hit Times via Small and Simple Caches · 2. Fast Hits by Avoiding Address Translation · 2. Fast Cache Hits by Avoiding Translation: Process ID Impact · 2. Fast Cache Hits by Avoiding Translation: Index with Physical Portion of Address · 3. Fast Hit Times via Pipelined Writes · 4. Fast Writes on Misses via Small Subblocks · Cache Optimization Summary · What Is the Impact of What You've Learned About Caches? · Cache Cross-Cutting Issues · Alpha 21064 · Alpha Memory Performance: Miss Rates of SPEC92 · Alpha CPI Components · Pitfall: Predicting Cache Performance from a Different Program (ISA, compiler, ...) · Pitfall: Simulating Too Small an Address Trace · Next Time: ECC/Errors

Review: Who Cares About the Memory Hierarchy?
[Figure: Processor vs. DRAM performance, 1980-2000. CPU performance improves ~60%/year ("Moore's Law") while DRAM improves ~7%/year, so the processor-memory performance gap grows about 50% per year ("Less' Law?").]
• Processor only thus far in course:
  – CPU cost/performance, ISA, pipelined execution
  CPU-DRAM gap
• 1980: no cache in µproc; 1995: 2-level cache on chip
  (1989: first Intel µproc with a cache on chip)

Review: Cache Performance
• Miss-oriented approach to memory access:
  CPUtime = IC × (CPI_Execution + MemAccess/Inst × MissRate × MissPenalty) × CycleTime
• Separating out the memory component entirely:
  – AMAT = Average Memory Access Time
  CPUtime = IC × (CPI_AluOps + MemAccess/Inst × AMAT) × CycleTime
  AMAT = HitTime + MissRate × MissPenalty
       = (HitTime_Inst + MissRate_Inst × MissPenalty_Inst)
       + (HitTime_Data + MissRate_Data × MissPenalty_Data)
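
To make the two formulations concrete, here is a minimal sketch in C that evaluates them for one set of hypothetical parameters (1-cycle hit time, 2% miss rate, 40-cycle miss penalty, 1.3 memory accesses per instruction, 1 GHz clock); the numbers are illustrative only and do not come from the lecture.

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical example parameters -- illustrative, not from the lecture. */
    double hit_time       = 1.0;    /* cycles */
    double miss_rate      = 0.02;
    double miss_penalty   = 40.0;   /* cycles */
    double mem_acc_per_in = 1.3;    /* memory accesses per instruction */
    double cpi_execution  = 1.1;    /* base CPI, counting cache hits */
    double cycle_time     = 1.0e-9; /* seconds (1 GHz clock) */
    double ic             = 1.0e9;  /* instruction count */

    /* AMAT = HitTime + MissRate * MissPenalty */
    double amat = hit_time + miss_rate * miss_penalty;

    /* Miss-oriented form:
       CPUtime = IC * (CPI_Execution + MemAccess/Inst * MissRate * MissPenalty) * CycleTime */
    double cpu_time = ic * (cpi_execution +
                            mem_acc_per_in * miss_rate * miss_penalty) * cycle_time;

    printf("AMAT     = %.2f cycles\n", amat);  /* 1 + 0.02 * 40 = 1.80 cycles */
    printf("CPU time = %.2f s\n", cpu_time);   /* (1.1 + 1.04) * 1e9 * 1 ns = 2.14 s */
    return 0;
}
```

With these illustrative numbers, memory stalls add almost as much CPI as the base execution itself, which is why the miss-rate and miss-penalty techniques below matter.
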
Summary: Miss Rate Reduction
• 3 Cs: Compulsory, Capacity, Conflict
  1. Reduce misses via larger block size
  2. Reduce misses via higher associativity
  3. Reduce misses via a victim cache
  4. Reduce misses via pseudo-associativity
  5. Reduce misses by HW prefetching of instructions and data
  6. Reduce misses by SW prefetching of data
  7. Reduce misses by compiler optimizations
• Prefetching comes in two flavors:
  – Binding prefetch: requests load directly into a register.
    » Must be the correct address and register!
  – Non-binding prefetch: load into the cache.
    » Can be incorrect. Frees HW/SW to guess!
  CPUtime = IC × (CPI_Execution + MemAccess/Inst × MissRate × MissPenalty) × CycleTime

Improving Cache Performance (Continued)
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.

Write Policy: Write-Through vs Write-Back
• Write-through: all writes update the cache and the underlying memory/cache.
  – Can always discard cached data; the most up-to-date data is in memory.
  – Cache control bit: only a valid bit.
• Write-back: all writes simply update the cache.
  – Can't just discard cached data; it may have to be written back to memory.
  – Cache control bits: both valid and dirty bits.
• Other advantages:
  – Write-through:
    » Memory (or other processors) always has the latest data.
    » Simpler cache management.
  – Write-back:
    » Much lower bandwidth, since data is often overwritten multiple times.
    » Better tolerance of long-latency memory?

1. Reducing Miss Penalty: Read Priority over Write on Miss
[Figure: CPU connected to DRAM (or lower-level memory) through a write buffer.]
• Write-through with write buffers creates RAW conflicts between main-memory reads on cache misses and buffered writes.
  – Simply waiting for the write buffer to empty can increase the read-miss penalty (by 50% on the old MIPS 1000).
  – Instead, check the write-buffer contents before the read; if there are no conflicts, let the memory access continue.
• Write-back also wants a buffer to hold displaced blocks:
  – Read miss replacing a dirty block.
  – Normal approach: write the dirty block to memory, then do the read.
  – Instead: copy the dirty block to a write buffer, do the read, then do the write.
  – The CPU stalls less, since it restarts as soon as the read is done.

2. Reduce Miss Penalty: Subblock Placement
• Don't have to load the full block on a miss.
• Keep a valid bit per subblock to indicate which subblocks are valid.
• (Originally invented to reduce tag storage.)
[Figure: cache blocks divided into subblocks, each with its own valid bit.]

3. Reduce Miss Penalty: Early Restart and Critical Word First
• Don't wait for the full block to be loaded before restarting the CPU.
  – Early restart: as soon as the requested word of the block arrives, send it to the CPU and let the CPU continue execution.
  – Critical word first: request the missed word from memory first and send it to the CPU as soon as it arrives; let the CPU continue execution while the rest of the words in the block are filled in. Also called wrapped fetch or requested word first.
• Generally useful only with large blocks.
• Spatial locality is a problem: the CPU tends to want the next sequential word, so it is not clear that early restart helps.
[Figure: a cache block, with the requested word delivered first.]

4. Reduce Miss Penalty: Non-blocking Caches to Reduce Stalls on Misses
• Non-blocking cache or lockup-free cache …
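
Tying together the write-policy and write-buffer slides above, here is a minimal C sketch of the bookkeeping they describe: a write-back line carries both valid and dirty bits (write-through needs only valid), a write hit under write-back just updates the cache and sets the dirty bit, and a dirty block must be written back before its line is reused. The struct layout and the memory_write_block stub are hypothetical illustrations, not taken from the lecture.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 64

/* Hypothetical cache-line layout: write-back needs valid + dirty bits,
   while write-through would need only the valid bit. */
struct cache_line {
    bool     valid;
    bool     dirty;
    uint64_t tag;
    uint8_t  data[BLOCK_BYTES];
};

/* Hypothetical stand-in for the path to the next memory level
   (in practice this would go through a write buffer to L2/DRAM). */
static void memory_write_block(uint64_t tag, const uint8_t *data) {
    (void)tag;
    (void)data;
}

/* Write hit under write-back: update only the cache and set the dirty bit;
   the block reaches memory later, when the line is evicted. */
static void write_hit_writeback(struct cache_line *line, size_t offset,
                                const uint8_t *src, size_t len) {
    memcpy(&line->data[offset], src, len);
    line->dirty = true;
}

/* Eviction: a dirty block must be written back before the line is reused;
   a clean block can simply be discarded, since memory already has it. */
static void evict(struct cache_line *line) {
    if (line->valid && line->dirty)
        memory_write_block(line->tag, line->data);
    line->valid = false;
    line->dirty = false;
}

int main(void) {
    struct cache_line line = { .valid = true, .dirty = false, .tag = 0x2a };
    uint8_t word[4] = { 1, 2, 3, 4 };
    write_hit_writeback(&line, 0, word, sizeof word);
    evict(&line);  /* dirty, so the block is written back here */
    return 0;
}
```

The evict path is where the write buffer from the read-priority slide would sit: the dirty block goes into the buffer so a pending read miss can be serviced first.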

