
Cache Memories

Today
- Cache memory organization and operation
- Performance impact of caches
  - The memory mountain
  - Rearranging loops to improve spatial locality
  - Using blocking to improve temporal locality

Cache Memories
- Cache memories are small, fast SRAM-based memories managed automatically in hardware.
  - They hold frequently accessed blocks of main memory.
- The CPU looks first for data in the caches (e.g., L1, L2, and L3), then in main memory.
- Typical system structure: [Figure: the CPU chip (register file, ALU, cache memories, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory.]

General Cache Organization (S, E, B)
- The cache has $S = 2^s$ sets.
- Each set has $E = 2^e$ lines.
- Each line holds a valid bit (v), a tag, and a cache block of $B = 2^b$ bytes (bytes 0, 1, 2, ..., B-1): the data.
- Cache size: $C = S \times E \times B$ data bytes.

Cache Read
Address of word: [ tag: t bits | set index: s bits | block offset: b bits ]
1. Locate the set using the set-index bits.
2. Check whether any line in the set has a matching tag.
3. Yes + line valid: hit.
4. Locate the data starting at the block offset (the data begins at this offset within the block).

Example: Direct-Mapped Cache (E = 1)
Direct mapped: one line per set. Assume a cache block size of 8 bytes.
- Address of int: [ t bits | 0…01 | 100 ]
- Find set: the set-index bits 0…01 select set 1 of the $S = 2^s$ sets.
- Valid? + match (assume yes): hit.
- Block offset 100 (= 4): the int (4 bytes) is here, in bytes 4-7 of the block.
- No match: the old line is evicted and replaced.

Direct-Mapped Cache Simulation
t = 1, s = 2, b = 1 (address bits: x | xx | x)
M = 16 byte addresses, B = 2 bytes/block, S = 4 sets, E = 1 block/set

Address trace (reads, one byte per read):
  0 [0000₂]  miss
  1 [0001₂]  hit
  7 [0111₂]  miss
  8 [1000₂]  miss
  0 [0000₂]  miss

Final cache state:
  Set  v  Tag  Block
  0    1  0    M[0-1]   (held M[8-9], tag 1, before the final read)
  1    0
  2    0
  3    1  0    M[6-7]

A Higher Level Example
Ignore the variables sum, i, j. Assume: cold (empty) cache; a[0][0] goes here [position marked in the slide figure]. 32 B = 4 doubles. (Worked on the blackboard.)

    double sum_array_rows(double a[16][16])
    {
        int i, j;
        double sum = 0;

        for (i = 0; i < 16; i++)
            for (j = 0; j < 16; j++)
                sum += a[i][j];
        return sum;
    }

    double sum_array_cols(double a[16][16])
    {
        int i, j;
        double sum = 0;

        for (j = 0; j < 16; j++)
            for (i = 0; i < 16; i++)
                sum += a[i][j];
        return sum;
    }

E-way Set Associative Cache (Here: E = 2)
E = 2: two lines per set. Assume a cache block size of 8 bytes.
- Address of short int: [ t bits | 0…01 | 100 ]
- Find set: the set-index bits 0…01 select set 1.
- Compare both lines in the set: valid? + match: yes = hit.
- Block offset 100 (= 4): the short int (2 bytes) is here, in bytes 4-5 of the block.
- No match: one line in the set is selected for eviction and replacement.
- Replacement policies: random, least recently used (LRU).

2-Way Set Associative Cache Simulation
t = 2, s = 1, b = 1 (address bits: xx | x | x)
M = 16 byte addresses, B = 2 bytes/block, S = 2 sets, E = 2 blocks/set

Address trace (reads, one byte per read):
  0 [0000₂]  miss
  1 [0001₂]  hit
  7 [0111₂]  miss
  8 [1000₂]  miss
  0 [0000₂]  hit

Final cache state:
  Set  v  Tag  Block
  0    1  00   M[0-1]
       1  10   M[8-9]
  1    1  01   M[6-7]
       0

A Higher Level Example (2-way)
The same sum_array_rows and sum_array_cols functions, now with the 2-way set associative cache. Ignore the variables sum, i, j. Assume: cold (empty) cache; a[0][0] goes here [position marked in the slide figure]. 32 B = 4 doubles. (Worked on the blackboard.)

Spectrum of Associativity
[Figure: spectrum of associativity for a cache with 8 entries. Source: Chapter 5, "Large and Fast: Exploiting Memory Hierarchy".]

What about writes?
- Multiple copies of data exist: L1, L2, main memory, disk.
- What to do on a write hit?
  - Write-through: write immediately to memory.
  - Write-back: defer the write to memory until the line is replaced.
    - Needs a dirty bit (is the line different from memory or not?).
- What to do on a write miss?
  - Write-allocate: load the block into the cache, then update the line in the cache.
    - Good if more writes to the location follow.
  - No-write-allocate: write immediately to memory.
- Typical pairings:
  - Write-through + no-write-allocate
  - Write-back + write-allocate

Intel Core i7 Cache Hierarchy
[Figure: processor package with Core 0 … Core 3. Each core has its own registers, L1 d-cache, L1 i-cache, and L2 unified cache; the L3 unified cache is shared by all cores and connects to main memory.]
- L1 i-cache and d-cache: 32 KB, 8-way, access: 4 cycles
- L2 unified cache: 256 KB, 8-way, access: 11 cycles
- L3 unified cache: 8 MB, 16-way, access: 30-40 cycles
- Block size: 64 bytes for all caches

Cache Performance Metrics
- Miss rate
  - Fraction of memory references not found in the cache: $\text{miss rate} = \text{misses} / \text{accesses} = 1 - \text{hit rate}$
  - Typical numbers (in percentages): 3-10% for L1; can be quite small (e.g., 1%) for L2, depending on size, etc.
- Hit time
  - Time to deliver a line in the cache to the …
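As a worked example of the miss-rate definition, here it is applied to the two five-read simulation traces above (direct-mapped: 4 misses, 1 hit; 2-way: 3 misses, 2 hits):

```latex
\text{miss rate}_{\text{direct-mapped}} = \frac{\text{misses}}{\text{accesses}} = \frac{4}{5} = 80\%
  = 1 - \text{hit rate} = 1 - \frac{1}{5}

\text{miss rate}_{\text{2-way}} = \frac{3}{5} = 60\% = 1 - \frac{2}{5}
```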



UT CS 429H - Cache Memories
