U of U CS 6810 - Lecture 14 - Cache Innovations and DRAM

Lecture 14: Cache Innovations and DRAM
• Today: cache access basics and innovations, DRAM (Sections 5.1-5.3)

Reducing Miss Rate
• Large block size – reduces compulsory misses, reduces miss penalty in case of spatial locality – increases traffic between different levels, space wastage, and conflict misses
• Large caches – reduce capacity/conflict misses – access time penalty
• High associativity – reduces conflict misses – rule of thumb: a 2-way cache of capacity N/2 has the same miss rate as a 1-way cache of capacity N – access time penalty
• Way prediction – by predicting the way, can reduce power consumption

Cache Misses
• On a write miss, you may either choose to bring the block into the cache (write-allocate) or not (write-no-allocate)
• On a read miss, you always bring the block in (spatial and temporal locality) – but which block do you replace?
  - no choice for a direct-mapped cache
  - randomly pick one of the ways to replace
  - replace the way that was least-recently used (LRU) – see the first sketch after these notes
  - FIFO replacement (round-robin)

Writes
• When you write into a block, do you also update the copy in L2?
  - write-through: every write to L1 → write to L2
  - write-back: mark the block as dirty; when the block gets replaced from L1, write it to L2
• Write-back coalesces multiple writes to an L1 block into one L2 write
• Write-through simplifies coherence protocols in a multiprocessor system, as the L2 always has a current copy of the data

Reducing Cache Miss Penalty
• Multi-level caches
• Critical word first
• Priority for reads
• Victim caches

Multi-Level Caches
• The L2 and L3 have properties that are different from the L1
  - access time is not as critical for L2 as it is for L1 (every load/store/instruction accesses the L1)
  - the L2 is much larger and can consume more power per access
• Hence, they can adopt alternative design choices
  - serial tag and data access
  - high associativity

Read/Write Priority
• For write-back/write-through caches, writes to lower levels are placed in write buffers
• When we have a read miss, we must look up the write buffer before checking the lower level
• When we have a write miss, the write can merge with another entry in the write buffer or it creates a new entry
• Reads are more urgent than writes (the probability of an instruction waiting for the result of a read is 100%, while the probability of an instruction waiting for the result of a write is much smaller) – hence, reads get priority unless the write buffer is full

Victim Caches
• A direct-mapped cache suffers from misses because multiple pieces of data map to the same location
• The processor often tries to access data that it recently discarded – all discards are placed in a small victim cache (4 or 8 entries) – the victim cache is checked before going to L2 (see the second sketch after these notes)
• Can be viewed as additional associativity for the few sets that tend to have the most conflicts

Tolerating Miss Penalty
• Out-of-order execution: can do other useful work while waiting for the miss – can have multiple cache misses – the cache controller has to keep track of multiple outstanding misses (non-blocking cache)
• Hardware and software prefetching into prefetch buffers – aggressive prefetching can increase contention for buses

DRAM Main Memory
• Main memory is stored in DRAM cells, which have much higher storage density
• DRAM cells lose their state over time – they must be refreshed periodically, hence the name Dynamic
• DRAM access suffers from long access time and high energy overhead
• Since the number of pins on a processor chip is not expected to increase much, we will hit a memory bandwidth wall
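Before moving on to DRAM, here is a minimal Python sketch of the replacement and write policies described in the cache notes above: a set-associative cache with LRU replacement, write-allocate on misses, and write-back via a dirty bit. This is an illustration, not material from the lecture; all class names, sizes, and parameters are assumptions chosen for the example.

from collections import OrderedDict

class SetAssocCache:
    def __init__(self, num_sets=64, ways=4, block_size=64):
        self.num_sets, self.ways, self.block_size = num_sets, ways, block_size
        # One OrderedDict per set: tag -> dirty bit, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.writebacks = 0                        # dirty victims that would be written to L2

    def access(self, addr, is_write):
        """Return 'hit' or 'miss'; models LRU, write-allocate, and write-back."""
        block = addr // self.block_size            # strip the block-offset bits
        idx, tag = block % self.num_sets, block // self.num_sets
        s = self.sets[idx]
        if tag in s:
            s[tag] = s[tag] or is_write            # a write makes the block dirty
            s.move_to_end(tag)                     # mark as most recently used
            return "hit"
        if len(s) == self.ways:                    # set is full: evict the LRU way
            victim_tag, victim_dirty = s.popitem(last=False)
            if victim_dirty:
                self.writebacks += 1               # write-back: dirty victim goes to L2
        s[tag] = is_write                          # write-allocate: fill on read or write miss
        return "miss"

cache = SetAssocCache()
print(cache.access(0x1000, is_write=False))       # miss (compulsory)
print(cache.access(0x1008, is_write=True))        # hit: same 64B block, now marked dirty

With write-through instead, the dirty bit disappears and every write hit would also be sent to L2, which is why write-back coalesces traffic and write-through keeps L2 current.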
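A second sketch, for the victim-cache slide: a direct-mapped L1 whose evicted blocks are placed in a small fully associative victim cache that is checked before going to L2. The sizes, structure, and names are illustrative assumptions, not the lecture's design.

from collections import OrderedDict

class L1WithVictimCache:
    def __init__(self, num_blocks=256, block_size=64, victim_entries=8):
        self.num_blocks, self.block_size = num_blocks, block_size
        self.l1 = {}                                  # index -> tag (direct-mapped)
        self.victim = OrderedDict()                   # block address -> True, in LRU order
        self.victim_entries = victim_entries

    def access(self, addr):
        block = addr // self.block_size
        idx, tag = block % self.num_blocks, block // self.num_blocks
        if self.l1.get(idx) == tag:
            return "L1 hit"
        if block in self.victim:                      # L1 miss: check the victim cache first
            del self.victim[block]                    # swap the block back into L1
            self._fill(idx, tag)
            return "victim hit"
        self._fill(idx, tag)                          # otherwise fetch from L2
        return "miss to L2"

    def _fill(self, idx, tag):
        if idx in self.l1:                            # the discarded L1 block goes to the victim cache
            old_block = self.l1[idx] * self.num_blocks + idx
            self.victim[old_block] = True
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)       # drop the least-recently discarded entry
        self.l1[idx] = tag

c = L1WithVictimCache()
a, b = 0x0000, 256 * 64                               # two addresses that conflict in L1
print(c.access(a), c.access(b), c.access(a))          # miss to L2, miss to L2, victim hit

The final access shows the effect: a block that just ping-ponged out of its direct-mapped slot is recovered from the victim cache instead of costing a trip to L2.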
DRAM Organization
[Figure: the on-chip memory controller drives a memory bus or channel to a DIMM; the DIMM contains ranks, each rank contains DRAM chips or devices, each chip contains banks, and each bank contains arrays; each chip supplies 1/8th of the row buffer and one word of the data output]

DRAM Array Access
[Figure: a 1M-bit DRAM is a 1024 x 1024 array of bits; the 10 row address bits arrive first with the Row Access Strobe (RAS), and the row decoder reads 1024 bits into the Row Buffer; the 10 column address bits arrive next with the Column Access Strobe (CAS), and the column decoder returns the selected subset of bits to the CPU]

Salient Points
• DIMM, rank, bank, and array form a hierarchy in the storage organization; banks can be simultaneously working on different requests
• A cache line is spread across several DRAM chips to increase data transfer bandwidth
• To maximize density, arrays are made large → rows are wide → row buffers are wide (an 8KB read for a 64B request)
• The memory controller schedules memory accesses to maximize row-buffer hit rates and bank parallelism (see the sketch after these notes)

Technology Trends
• Improvements in technology (smaller devices) → DRAM capacities double every two years
• DRAM will soon hit a density wall; it may have to be replaced by other technologies (phase change memory, STT-RAM)
• Interconnects may have to be photonic to overcome the bandwidth limitation imposed by the pins on the chip
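As a rough illustration of the row/column split and row-buffer behavior above, here is a small Python sketch that decodes a block address into bank, row, and column for the 1024 x 1024 array example and counts row-buffer hits the way an open-page controller might. The address mapping, constants, and names are assumptions for illustration, not a real controller design.

NUM_BANKS, ROW_BITS, COL_BITS = 8, 10, 10            # 1024 rows x 1024 columns per bank

def decode(block_addr):
    col = block_addr % (1 << COL_BITS)               # low-order bits select the column (CAS)
    bank = (block_addr >> COL_BITS) % NUM_BANKS      # next bits pick the bank, so banks interleave
    row = ((block_addr >> COL_BITS) // NUM_BANKS) % (1 << ROW_BITS)   # remaining bits select the row (RAS)
    return bank, row, col

open_rows = {}                                       # bank -> row currently held in its row buffer

def access(block_addr):
    bank, row, col = decode(block_addr)
    if open_rows.get(bank) == row:
        return "row-buffer hit"                      # only a column access (CAS) is needed
    open_rows[bank] = row                            # otherwise precharge and activate the new row
    return "row-buffer miss"

print(access(0), access(1), access(5000))            # miss, hit, miss (different bank)

Because the low-order bits map to columns within one wide row, nearby addresses land in the same open row and hit in the row buffer, which is exactly what the memory controller's scheduling tries to exploit.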

