Berkeley COMPSCI 152 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering
Lecture 7 - Memory Hierarchy-II
Krste Asanovic
Electrical Engineering and Computer Sciences, University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
February 9, 2011, CS152, Spring 2011

Last time in Lecture 6
• Dynamic RAM (DRAM) is the main form of main-memory storage in use today
  – Holds values on small capacitors, which need refreshing (hence "dynamic")
  – Slow multi-step access: precharge, read row, read column
• Static RAM (SRAM) is faster but more expensive
  – Used to build on-chip memory for caches
• A cache holds a small set of values in fast memory (SRAM) close to the processor
  – Needs a search scheme to find values in the cache, and a replacement policy to make space for newly accessed locations
• Caches exploit two forms of predictability in memory reference streams
  – Temporal locality: the same location is likely to be accessed again soon
  – Spatial locality: neighboring locations are likely to be accessed soon

CPU-Cache Interaction (5-stage pipeline)
[Slide figure: a 5-stage pipeline with a primary instruction cache feeding fetch and a primary data cache in the memory stage; cache-refill data comes from lower levels of the memory hierarchy. On a data cache miss, the entire CPU is stalled.]

Improving Cache Performance
Average memory access time = Hit time + Miss rate x Miss penalty
To improve performance:
• reduce the hit time
• reduce the miss rate
• reduce the miss penalty
What is the simplest design strategy?
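The average-memory-access-time formula above can be checked with a small calculation. This is a minimal sketch; the numeric hit times, miss rates, and miss penalties are illustrative assumptions, not figures from the lecture.

```python
# AMAT = hit time + miss rate * miss penalty (all times in cycles).
# The parameter values below are assumed for illustration only.

def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in cycles."""
    return hit_time + miss_rate * miss_penalty

# Baseline: 1-cycle hit, 5% miss rate, 100-cycle miss penalty -> 1 + 5 = 6 cycles.
base = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)

# The formula shows why all three levers matter:
better_rate = amat(1, 0.025, 100)   # halve the miss rate:    ~3.5 cycles
faster_hit  = amat(0.5, 0.05, 100)  # halve the hit time:     ~5.5 cycles
print(base, better_rate, faster_hit)
```

With these assumed numbers the miss term dominates, which is why reducing the miss rate or miss penalty often pays off more than shaving the hit time.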
Answer: the biggest cache that doesn't increase the hit time past 1-2 cycles (approx. 8-32KB in modern technology). [Design issues are more complex with out-of-order superscalar processors.]

Serial versus Parallel Cache and Memory Access
α is the HIT RATIO: the fraction of references that hit in the cache.
1 - α is the MISS RATIO: the remaining references.
Average access time for serial search (memory is accessed only after a cache miss):
    t_cache + (1 - α) t_mem
Average access time for parallel search (cache and memory are accessed simultaneously):
    α t_cache + (1 - α) t_mem
• Savings are usually small: t_mem >> t_cache and the hit ratio α is high
• High bandwidth is required for the memory path
• The complexity of handling parallel paths can slow t_cache

Causes for Cache Misses
• Compulsory: first reference to a block (a.k.a. cold-start misses)
  – misses that would occur even with an infinite cache
• Capacity: the cache is too small to hold all the data needed by the program
  – misses that would occur even under a perfect replacement policy
• Conflict: misses that occur because of collisions due to the block-placement strategy
  – misses that would not occur with full associativity

Effect of Cache Parameters on Performance
• Larger cache size
  + reduces capacity and conflict misses
  - hit time will increase
• Higher associativity
  + reduces conflict misses
  - may increase hit time
• Larger block size
  + reduces compulsory and capacity (reload) misses
  - increases conflict misses and miss penalty

Write Policy Choices
• Cache hit:
  – write-through: write both cache & memory
    » generally higher traffic, but a simpler pipeline & cache design
  – write-back: write cache only; memory is written only when the entry is evicted
    » a dirty bit per block further reduces write-back traffic
    » must handle 0, 1, or 2 accesses to memory for each load/store
• Cache miss:
  – no-write-allocate: only write to main memory
  – write-allocate (a.k.a. fetch-on-write): fetch the block into the cache
• Common combinations:
  – write-through and no-write-allocate
  – write-back with write-allocate

Write Performance
[Slide figure: cache datapath with tag, index, and block-offset address fields (t, k, b bits), 2^k lines, a valid bit and tag comparator producing HIT, and a write enable (WE) on the data array.]

Reducing Write Hit Time
Problem: writes take two cycles in the memory stage, one cycle for the tag check plus one cycle for the data write on a hit.
Solutions:
• Design a data RAM that can perform a read and a write in one cycle, restoring the old value after a tag miss
• Fully-associative (CAM-tag) caches: the word line is only enabled on a hit
• Pipelined writes: hold the write data for a store in a single buffer ahead of the cache, and write the cache data during the next store's tag check

Pipelining Cache Writes
[Slide figure: tag and data arrays with delayed write-address and write-data registers between successive stores.] Data from a store hit is written into the data portion of the cache during the tag access of the subsequent store.

Write Buffer to Reduce Read Miss Penalty
The processor is not stalled on writes, and read misses can go ahead of writes to main memory.
Problem: the write buffer may hold the updated value of a location needed by a read miss.
• Simple scheme: on a read miss, wait for the write buffer to go empty
• Faster scheme: check the write buffer addresses against the read-miss address; if there is no match, allow the read miss to go ahead of the writes, else return the value in the write buffer
[Slide figure: CPU with register file, L1 data cache, write buffer, and unified L2 cache; the write buffer holds evicted dirty lines for a write-back cache, or all writes in a write-through cache.]

Block-level Optimizations
• Tags are too large, i.e., too much overhead
  – Simple solution: larger blocks, but the miss penalty could be large
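The faster write-buffer scheme described above can be sketched as a small model. This is a toy illustration; the class name, structure, and addresses are mine, not from the lecture.

```python
# Toy model of a write buffer that lets read misses bypass queued writes.
# On a read miss we check the buffered write addresses: on a match we return
# the buffered (newest) value; otherwise the read may go ahead of the writes.
# All names and values here are illustrative assumptions.
from collections import deque

class WriteBuffer:
    def __init__(self):
        self.entries = deque()        # (address, data) pairs, oldest first

    def add_write(self, addr, data):
        """Queue a write without stalling the processor."""
        self.entries.append((addr, data))

    def read_miss(self, addr):
        """Return buffered data if the address matches a pending write,
        else None, meaning the read miss may overtake the queued writes."""
        for a, d in reversed(self.entries):   # newest matching write wins
            if a == addr:
                return d
        return None

wb = WriteBuffer()
wb.add_write(0x100, 42)
wb.add_write(0x200, 7)
assert wb.read_miss(0x100) == 42    # value forwarded from the buffer
assert wb.read_miss(0x300) is None  # no match: read goes ahead of writes
```

Scanning newest-to-oldest mirrors the requirement that a read must see the most recent buffered write to the same address; the simple scheme from the slide would instead drain the whole buffer before any read miss proceeds.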
• Sub-block placement (a.k.a. sector cache)
  – A valid bit is added to units smaller than the full block, called sub-blocks
  – Only read a sub-block on a miss
  – If a tag matches, is the word in the cache?
[Slide figure: three cached tags (100, 300, 204), each with a valid bit per sub-block.]

Multilevel Caches
Problem: a memory cannot be both large and fast.
Solution: increasing sizes of cache at each level (CPU → L1$ → L2$ → DRAM).
Local miss rate = misses in this cache / accesses to this cache
Global miss rate = misses in this cache / CPU memory accesses
Misses per instruction = misses in this cache / number of instructions
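The local and global miss-rate definitions above can be made concrete with a small calculation. The event counts below are assumed for illustration; they are not from the lecture.

```python
# Local vs. global miss rate for a two-level cache hierarchy.
# Event counts are illustrative assumptions.
cpu_accesses = 1000   # CPU memory accesses (all first go to L1)
l1_misses = 40        # L1 misses, which become accesses to L2
l2_misses = 10        # L2 misses, which go to DRAM

l1_local  = l1_misses / cpu_accesses   # 40/1000 = 0.04
l2_local  = l2_misses / l1_misses      # 10/40   = 0.25 (looks high in isolation)
l2_global = l2_misses / cpu_accesses   # 10/1000 = 0.01 (what the CPU sees)

# The global miss rate of L2 is the product of the local miss rates,
# since only references that miss in L1 reach L2.
assert abs(l2_global - l1_local * l2_local) < 1e-12
print(l1_local, l2_local, l2_global)
```

This is why a high local miss rate for L2 is not alarming on its own: L2 only sees the references that L1 already filtered, so its global miss rate can still be very low.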

