
CS6810 - Caches
School of Computing, University of Utah

Today's topics:
• basics: memory hierarchy, locality
• cache models: associative options
• calculating miss penalties
• some fundamental optimization issues

The Problem
• Widening memory gap
  - DRAM latency CAGR = 7%
  - CPU performance CAGR
    » 25% prior to 1986, 52% from 1986 to 2005, ~20% thereafter

The Solution
• Deepen the memory hierarchy
  - make dependence on DRAM latency the rare case, if you can
  (figure: CPU-to-DRAM hierarchy, with multi-ported and single-ported levels)

Balancing Act
• As always
  - performance vs. cost, capacity vs. latency, …
  - on-die SRAM is expensive per bit
    » latency depends on size and organization
    » area: 6T/bit
    » power: not so bad, since only a small portion is active per cycle
  - DRAM
    » latency is awful in cycles at GHz clock speeds
    » area = 1T + 1C per bit (lots of details later)
• Dealing w/ cache size vs. latency
  - deepen the cache hierarchy
    » small separate IL1$ and DL1$: minimize memory structural stalls
    » unified L2 reduces fragmentation problems
    » multicore introduces a global L3
      • may not continue to be a good idea
      • everything runs out of steam at some point; the key is what % of L1 misses hit in L2 (and, by induction, Ln vs. Ln+1)

Locality
• 2 basic types
  - temporal: if you used this block, you will use it again soon
  - spatial: if you used this block, you will use a nearby one soon
• Exploiting locality
  - match the application's memory usage to the cache design
    » if you match, you win; simple as that
  - some codes are touch-once, so misses fall all the way through
    » e.g. encrypt/decrypt, encode/decode, … (media influence)
    » here caches are a liability
      • you have to swing to miss: try L1, miss? try L2, miss? …
      • a lot of extra time if you end up going to DRAM anyway
  - historical tidbit
    » Seymour Cray didn't believe in caches or DRAM
      • or even 2's complement, initially

Performance Issues
• Basics

  CPU_time = IC × CPI × cycle_time
  cycle_time = 1 / frequency
  IPC = 1 / CPI
  CPU_time = IC / (IPC × frequency)

• Enter stalls
  - IPC gets degraded by stalls
    » we've seen stalls due to hazards and pipeline issues
    » now the focus is on memory stalls
      • influence is based on the load & store %
      • with good branch prediction, most stalls are memory induced

Computing Memory Effect
• Misses induce stalls
  - ILP & TLP can overlap some of the penalty
    » but a miss is a surprise, so some of the penalty won't be hidden

  XEQ_time = (CPU_cycles + MEM_stall_cycles) × cycle_time
  MEM_stall_cycles = num_misses × miss_penalty
  MEM_stall_cycles = IC × (misses / instruction) × miss_penalty
  MEM_stall_cycles = IC × (memory_accesses / instruction) × miss_rate × miss_penalty

  Separating reads and writes:
  MEM_stall_cycles = READ_stalls + WRITE_stalls
  READ_stalls  = IC × (reads / instruction) × read_miss_rate × read_miss_penalty
  WRITE_stalls = IC × (writes / instruction) × write_miss_rate × write_miss_penalty
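These formulas compose directly. As a rough illustration, the following minimal C sketch plugs made-up example numbers into the last two equations; the base CPI of 1.0, 1.5 memory accesses per instruction, 2% miss rate, 100-cycle miss penalty, and 2 GHz clock are assumptions for the example, not figures from the slides.

  #include <stdio.h>

  int main(void) {
      /* assumed example parameters (not from the slides) */
      double IC                 = 1e9;   /* instructions executed       */
      double base_CPI           = 1.0;   /* CPI ignoring memory stalls  */
      double accesses_per_instr = 1.5;   /* memory accesses/instruction */
      double miss_rate          = 0.02;
      double miss_penalty       = 100.0; /* cycles */
      double freq_hz            = 2e9;   /* 2 GHz, so cycle_time = 0.5 ns */

      /* MEM_stall_cycles = IC * (accesses/instruction) * miss_rate * miss_penalty */
      double mem_stall_cycles = IC * accesses_per_instr * miss_rate * miss_penalty;

      /* XEQ_time = (CPU_cycles + MEM_stall_cycles) * cycle_time */
      double cpu_cycles    = IC * base_CPI;
      double xeq_time      = (cpu_cycles + mem_stall_cycles) / freq_hz;
      double effective_cpi = (cpu_cycles + mem_stall_cycles) / IC;

      printf("memory stall cycles: %.3e\n", mem_stall_cycles);
      printf("effective CPI:       %.2f\n", effective_cpi);
      printf("execution time:      %.3f s\n", xeq_time);
      return 0;
  }

With these numbers the stalls add 1.5 × 0.02 × 100 = 3 cycles per instruction, so the effective CPI climbs from 1.0 to 4.0; that is exactly the kind of degradation the stall equations are meant to expose.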
4 Questions = $ Organization
• Q1: Where can a block be placed? (examples shortly)
  » fully associative: answer = anywhere
  » direct mapped: answer = only one place
  » set-associative: in a small number of "ways"
• Q2: How is a block found?
  » 2 types of tags
    • status: valid and dirty (for now)
    • address: the alias chance must be resolved (next slide)
• Q3: Which block is replaced on a miss?
  » LRU: best but expensive; matches the temporal locality model
  » FIFO: good but expensive; like LRU but ranked by first touch rather than last use
  » random: defeats temporal locality but simple to implement
  » approximation by epochs: add a "use" status tag
• Q4: What happens on a write?
  » hit: write-through or write-back (write-back requires a dirty flag)
  » miss: write-replace or write-around (modern CPUs typically allow both)
    • a.k.a. write_allocate or write_no_allocate

Cache Components
• Cache block or line
  - size varies; 64B is a common choice
    » no reason why the block size can't be larger the deeper you go in the memory hierarchy
      • cache lines are typically the same size at every level, which reduces some complexity
      • memory-to-cache transfers are in line-sized chunks
      • disk-to-memory transfers are in page-sized chunks
• 2 main structures & a bunch of logic
  - data RAM: holds the cached data
  - tag RAM: holds the tag information
    » same number of "entries" as the data RAM
      • entry = line for direct mapped or fully associative
      • entry = set for set-associative
    » for set-associative, the width is a number of ways of address tags for each set
      • status tags are present as well

Block Identification
• Address
  - tag = address tag
    » held in the tag RAM
    » size is whatever is left over after the index and block-offset bits
  - index
    » log2(number of data RAM entries)
  - block offset
    » says which byte, half-word, or word is to be moved to the target register
      • silly in a way: words or doubles are transferred to the register
      • the appropriate byte or half-word is then used for the op
    » size = log2(line size)
  - increasing the offset or index size reduces the tag size (see the code sketch after the figures below)

Cache Access
(figure: direct-mapped cache access; byte address 101000, 8-byte words; 8 words, so 3 index bits; data array, sets, offset labeled; each address maps to a unique location; alias problem?)

De-Alias by Matching the Address Tag
(figure: the same direct-mapped example, byte address 101000, 8-byte words, now also comparing the stored address tag)
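To make the tag/index/offset split and the de-aliasing tag match concrete, here is a minimal C sketch of a direct-mapped lookup. The geometry (64-byte lines, 256 sets) and the names cache_line_t and lookup are assumptions chosen for the example, not something specified in the slides.

  #include <stdint.h>
  #include <stdbool.h>

  /* assumed example geometry (not from the slides):
   * 64 B lines -> 6 offset bits, 256 sets -> 8 index bits,
   * the remaining high bits form the address tag.          */
  #define LINE_BYTES   64u
  #define NUM_SETS     256u
  #define OFFSET_BITS  6u   /* log2(LINE_BYTES) */
  #define INDEX_BITS   8u   /* log2(NUM_SETS)   */

  typedef struct {
      bool     valid;
      bool     dirty;
      uint64_t tag;                /* address tag held in the tag RAM */
      uint8_t  data[LINE_BYTES];   /* line held in the data RAM       */
  } cache_line_t;

  static cache_line_t cache[NUM_SETS];   /* direct mapped: one line per set */

  /* split a byte address into offset, index, and tag fields */
  static uint64_t get_offset(uint64_t addr) { return addr & (LINE_BYTES - 1); }
  static uint64_t get_index (uint64_t addr) { return (addr >> OFFSET_BITS) & (NUM_SETS - 1); }
  static uint64_t get_tag   (uint64_t addr) { return addr >> (OFFSET_BITS + INDEX_BITS); }

  /* de-alias by matching the stored address tag: many addresses map to
   * the same set, so a hit requires valid && tag match                 */
  bool lookup(uint64_t addr, uint8_t *out_byte) {
      cache_line_t *line = &cache[get_index(addr)];
      if (line->valid && line->tag == get_tag(addr)) {
          *out_byte = line->data[get_offset(addr)];
          return true;    /* hit */
      }
      return false;       /* miss: go to the next level of the hierarchy */
  }

For the figure's example (8-byte words, byte address 101000 in binary, 8 sets), the same arithmetic gives a 3-bit offset of 000 and a 3-bit index of 101; whatever bits remain above those form the address tag that must match for a hit.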

