U of U CS 6810 - Lecture 13 - Cache Hierarchies

Slide 1: Lecture 13: Cache Hierarchies
• Today: cache access basics and innovations (Sections 5.1-5.2)

Slide 2: The Cache Hierarchy
• Core → L1 → L2 → L3 → off-chip memory

Slide 3: Accessing the Cache
[Figure: a byte address (e.g., 101000) split into index and offset bits; the index selects a set in a data array of 8-byte words]
• Direct-mapped cache: each address maps to a unique location in the cache
• 8 words → 3 index bits

Slide 4: The Tag Array
[Figure: the same direct-mapped cache with a tag array alongside the data array; the tag bits of the byte address are compared against the stored tag for the indexed set]
• Direct-mapped cache: each address maps to a unique location in the cache

Slide 5: Increasing Line Size
[Figure: byte address 10100000 split into tag, index, and a wider offset for a 32-byte cache line size, or block size]
• A large cache line size → smaller tag array, fewer misses because of spatial locality

Slide 6: Associativity
[Figure: a 2-way set-associative cache (Way-1, Way-2); tags and data from both ways are read and compared in parallel]
• Set associativity → fewer conflicts; wasted power because multiple data and tags are read

Slide 7: Example
• 32 KB 4-way set-associative data cache array with 32-byte line size
• How many sets?
• How many index bits, offset bits, tag bits?
• How large is the tag array?
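The arithmetic for this example slide can be checked directly. Below is a minimal C sketch; it assumes 32-bit physical addresses and ignores valid/dirty bits in the tag array, neither of which the slide specifies:

#include <stdio.h>

/* integer log2 for power-of-two inputs */
static int log2i(unsigned x) { int n = 0; while (x >>= 1) n++; return n; }

int main(void) {
    const unsigned cache_bytes = 32 * 1024;  /* 32 KB data array   */
    const unsigned line_bytes  = 32;         /* 32-byte lines      */
    const unsigned ways        = 4;          /* 4-way set assoc.   */
    const unsigned addr_bits   = 32;         /* assumption         */

    unsigned sets        = cache_bytes / (line_bytes * ways);    /* 256 */
    unsigned offset_bits = log2i(line_bytes);                    /* 5   */
    unsigned index_bits  = log2i(sets);                          /* 8   */
    unsigned tag_bits    = addr_bits - index_bits - offset_bits; /* 19  */
    unsigned tag_array_bits = sets * ways * tag_bits; /* excl. valid/dirty */

    printf("sets=%u index=%u offset=%u tag=%u tag-array=%u bits (~%.1f KB)\n",
           sets, index_bits, offset_bits, tag_bits,
           tag_array_bits, tag_array_bits / 8.0 / 1024.0);
    return 0;
}

With those assumptions: 32 KB / (32 B x 4 ways) = 256 sets, so 8 index bits and 5 offset bits, leaving 32 - 8 - 5 = 19 tag bits; the tag array holds 256 x 4 x 19 = 19,456 bits, roughly 2.4 KB before valid/dirty/LRU state.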
Slide 8: Types of Cache Misses
• Compulsory misses: happen the first time a memory word is accessed – the misses for an infinite cache
• Capacity misses: happen because the program touched many other words before re-touching the same word – the misses for a fully-associative cache
• Conflict misses: happen because two words map to the same location in the cache – the misses generated while moving from a fully-associative to a direct-mapped cache
• Sidenote: can a fully-associative cache have more misses than a direct-mapped cache of the same size?

Slide 9: What Influences Cache Misses?
• How does each of the following affect compulsory, capacity, and conflict misses?
  - Increasing cache capacity
  - Increasing number of sets
  - Increasing block size
  - Increasing associativity

Slide 10: Reducing Miss Rate
• Large block size – reduces compulsory misses, reduces miss penalty in case of spatial locality – increases traffic between different levels, space wastage, and conflict misses
• Large caches – reduce capacity/conflict misses – access time penalty
• High associativity – reduces conflict misses – rule of thumb: a 2-way cache of capacity N/2 has the same miss rate as a 1-way cache of capacity N – more energy
• Way prediction – by predicting the way, the access time is effectively like that of a direct-mapped cache – can also reduce power consumption

Slide 11: Cache Misses
• On a write miss, you may either choose to bring the block into the cache (write-allocate) or not (write-no-allocate)
• On a read miss, you always bring the block in (spatial and temporal locality) – but which block do you replace?
  - no choice for a direct-mapped cache
  - randomly pick one of the ways to replace
  - replace the way that was least-recently used (LRU) – see the lookup sketch after these notes
  - FIFO replacement (round-robin)

Slide 12: Writes
• When you write into a block, do you also update the copy in L2?
  - write-through: every write to L1 → write to L2
  - write-back: mark the block as dirty; when the block gets replaced from L1, write it to L2
• Write-back coalesces multiple writes to an L1 block into one L2 write (illustrated in the write-policy sketch after these notes)
• Write-through simplifies coherency protocols in a multiprocessor system, as the L2 always has a current copy of the data

Slide 13: Reducing Cache Miss Penalty
• Multi-level caches
• Critical word first
• Priority for reads
• Victim caches

Slide 14: Multi-Level Caches
• The L2 and L3 have properties that are different from the L1
  - access time is not as critical for L2 as it is for L1 (every load/store/instruction accesses the L1)
  - the L2 is much larger and can consume more power per access
• Hence, they can adopt alternative design choices
  - serial tag and data access
  - high associativity

Slide 15: Read/Write Priority
• For write-back/write-through caches, writes to lower levels are placed in write buffers
• When we have a read miss, we must look up the write buffer before checking the lower level
• When we have a write miss, the write can merge with another entry in the write buffer or it creates a new entry
• Reads are more urgent than writes (the probability of an instruction waiting for the result of a read is 100%, while the probability of an instruction waiting for the result of a write is much smaller) – hence, reads get priority unless the write buffer is full

Slide 16: Victim Caches
• A direct-mapped cache suffers from misses because multiple pieces of data map to the same location
• The processor often tries to access data that it recently discarded – all discards are placed in a small victim cache (4 or 8 entries) – the victim cache is checked before going to L2
• Can be viewed as additional associativity for a few sets that tend to have the most conflicts

Slide 17: Tolerating Miss Penalty
• Out-of-order execution: can do other useful work while waiting for the miss – can have multiple cache misses – the cache controller has to keep track of multiple outstanding misses (non-blocking cache)
• Hardware and software prefetching into prefetch buffers – aggressive prefetching can increase contention for buses (a next-line prefetch sketch follows these notes)
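To make the set-associative lookup and the LRU replacement policy from the slides above concrete, here is a minimal C sketch using the geometry worked out in the example slide (32 KB, 4-way, 32-byte lines, 32-bit addresses). The names and the toy trace are illustrative, and real hardware approximates LRU far more cheaply than a per-line counter:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define SETS        256   /* 32 KB / (32 B x 4 ways) */
#define WAYS        4
#define OFFSET_BITS 5     /* log2(32-byte line) */
#define INDEX_BITS  8     /* log2(256 sets) */

typedef struct {
    bool     valid;
    uint32_t tag;
    unsigned lru;         /* higher = touched more recently */
} Line;

static Line cache[SETS][WAYS];
static unsigned ticks;

/* Returns true on a hit; on a miss, fills an invalid or LRU way. */
static bool access_cache(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (SETS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    Line *set = cache[index];

    for (int w = 0; w < WAYS; w++) {
        if (set[w].valid && set[w].tag == tag) {  /* tag compare per way */
            set[w].lru = ++ticks;
            return true;
        }
    }
    int victim = 0;       /* prefer an invalid way, else the least recent */
    for (int w = 1; w < WAYS; w++)
        if (!set[w].valid || set[w].lru < set[victim].lru)
            victim = w;
    set[victim] = (Line){ .valid = true, .tag = tag, .lru = ++ticks };
    return false;
}

int main(void) {
    uint32_t trace[] = { 0x0000, 0x0004, 0x2000, 0x0000 };  /* toy trace */
    for (int i = 0; i < 4; i++)
        printf("0x%04x -> %s\n", (unsigned)trace[i],
               access_cache(trace[i]) ? "hit" : "miss");
    return 0;
}

On the toy trace: 0x0000 misses (cold), 0x0004 hits in the same 32-byte line, 0x2000 maps to the same set with a different tag (it would evict 0x0000 in a direct-mapped cache, but a spare way absorbs it here), and the second access to 0x0000 hits.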

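The write-through versus write-back distinction from the Writes slide can also be made concrete. Below is a minimal sketch, assuming a single dirty bit per line and a hypothetical write_to_L2() backend (both illustrative, not from the lecture):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

static bool write_through;    /* policy switch */

typedef struct {
    uint32_t tag;
    bool     valid;
    bool     dirty;           /* meaningful only for write-back */
} Line;

/* Hypothetical lower-level write (an L2 access in the lecture's hierarchy). */
static void write_to_L2(uint32_t addr) {
    printf("L2 write 0x%x\n", (unsigned)addr);
}

/* Store that hits in L1. */
static void store_hit(Line *line, uint32_t addr) {
    if (write_through)
        write_to_L2(addr);    /* every L1 write is propagated at once */
    else
        line->dirty = true;   /* defer; coalesce writes until eviction */
}

/* Eviction from L1: only a dirty write-back line owes L2 anything. */
static void evict(Line *line, uint32_t addr) {
    if (!write_through && line->dirty)
        write_to_L2(addr);
    line->valid = line->dirty = false;
}

int main(void) {
    Line l = { .tag = 0, .valid = true, .dirty = false };
    write_through = false;                     /* write-back mode */
    store_hit(&l, 0x40); store_hit(&l, 0x44);  /* two stores, no L2 traffic */
    evict(&l, 0x40);                           /* one coalesced L2 write */
    return 0;
}

Run in write-back mode, the two stores produce a single L2 write at eviction, which is exactly the coalescing benefit the slide describes; in write-through mode each store would go to L2 immediately.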

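The final slide mentions hardware prefetching into prefetch buffers. Below is a hedged sketch of one common flavor, next-line (sequential) prefetching, using the 32-byte lines assumed throughout; the lecture does not prescribe a particular scheme, and all names here are illustrative:

#include <stdio.h>
#include <stdint.h>

#define LINE_BYTES 32
#define PF_ENTRIES 8          /* small prefetch buffer, as on the slide */

static uint32_t pf_buf[PF_ENTRIES];   /* line addresses held by the buffer */
static unsigned pf_next;              /* round-robin fill pointer */

static void issue_prefetch(uint32_t line_addr) {
    pf_buf[pf_next] = line_addr;      /* model: just record the request */
    pf_next = (pf_next + 1) % PF_ENTRIES;
    printf("prefetch line 0x%x\n", (unsigned)line_addr);
}

/* Called by the cache controller on a demand miss: also fetch the next line. */
static void on_demand_miss(uint32_t addr) {
    uint32_t line_base = addr & ~(uint32_t)(LINE_BYTES - 1);
    issue_prefetch(line_base + LINE_BYTES);
}

int main(void) {
    on_demand_miss(0x1004);   /* miss in line 0x1000 -> prefetch line 0x1020 */
    return 0;
}

A real prefetcher would check the cache and the buffer before issuing, and would throttle itself when the memory bus is busy – the contention concern raised on the slide.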