15-213: Caches
October 22, 1998

Topics:
- Memory hierarchy
- Locality of reference
- Cache design: direct mapped, associative

class18.ppt                                                    CS 213 F'98

Computer System

[Diagram: a processor (with registers and an on-chip cache) and main memory share a memory bus; an I/O bus connects I/O controllers for disks, a display, and a network, and devices can interrupt the processor]

Levels in Memory Hierarchy

  Level      Size          Access time   Cost        Transfer block
  Registers  200 B         3 ns          --          4 B (to/from cache)
  Cache      32 KB - 4 MB  6 ns          $100/MB     8 B (to/from memory)
  Memory     128 MB        100 ns        $1.50/MB    4 KB (to/from disk)
  Disk       20 GB         10 ms         $0.06/MB    --

Moving down the hierarchy, each level is larger, slower, and cheaper per byte.

Alpha 21164 Chip Photo (Microprocessor Report, 9/12/94)

[Die photo, shown whole and zoomed in on the right half of L2, with the caches labeled: L1 data, L1 instruction, unified L2 (right half plus L2 tags), L3 control, TLB, and branch history]

Locality of Reference

Principle of locality: programs tend to reuse data and instructions near those they have used recently.
- Temporal locality: recently referenced items are likely to be referenced in the near future.
- Spatial locality: items with nearby addresses tend to be referenced close together in time.

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    *v = sum;

Locality in the example:
- Data: array elements are referenced in succession (spatial).
- Instructions: instructions are referenced in sequence (spatial); the loop is cycled through repeatedly (temporal).

Caching: The Basic Idea

- Main memory: stores words (A-Z in the example).
- Cache: stores a subset of the words (4 in the example), organized in blocks of multiple words to exploit spatial locality.
- Access: a word must be in the cache for the processor to access it.

[Diagram: processor <-> small fast cache (holding A B and G H) <-> big slow memory (holding A B C ... Y Z)]

Basic Idea (Cont.)

The cache holds 2 blocks, each with 2 words:
- Initially: blocks [A B] and [G H].
- Read C: cache miss; block [C D] is loaded into the cache.
- Read D: the word is already in the cache -- a cache hit.
- Read Z: cache miss; block [Y Z] is loaded, evicting the oldest entry.

Maintaining the cache:
- Every time the processor performs a load or store, the block containing the word is brought into the cache; an existing block may need to be evicted.
- Subsequent loads or stores to any word in the block are performed within the cache.

Accessing Data in the Memory Hierarchy

- Between any two levels, memory is divided into blocks, and data moves between levels on demand, in block-sized chunks.
- This is invisible to the application programmer; hardware is responsible for cache operation.
- Upper-level blocks are a subset of lower-level blocks.

[Diagram: accessing word w in block a is a hit; accessing word v in block b is a miss that copies block b from the lower level into the upper level]

Design Issues for Caches

Key questions:
- Where should a block be placed in the cache? (block placement)
- How is a block found in the cache? (block identification)
- Which block should be replaced on a miss? (block replacement)
- What happens on a write? (write strategy)

Constraints:
- The design must be very simple: it is realized in hardware, and all decision making happens on a nanosecond time scale.
- We want to optimize performance for typical programs: extensive benchmarking and simulation, with many subtle engineering trade-offs.

Direct Mapped Caches

Simplest design: each memory block has a unique cache location.

Parameters:
- Block size B = 2^b: number of bytes in each block, typically 2x-8x the word size.
- Number of sets S = 2^s: number of blocks the cache can hold.
- Total cache size: B * S = 2^(b+s) bytes.

Physical address (the address used to reference main memory): n bits reference N = 2^n total bytes, partitioned into fields:
- Offset: the lower b bits indicate which byte within the block.
- Set index: the next s bits indicate how to locate the block within the cache.
- Tag: the remaining high-order t bits identify this block when it is in the cache.

  n-bit physical address:  | t: tag | s: set index | b: offset |

Indexing into a Direct Mapped Cache

Use the set index bits to select a cache set.

[Diagram: sets 0 through S-1, each holding a valid bit, a tag, and bytes 0 through B-1; the s set-index bits of the physical address select one set]
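The tag/set/offset partition can be sketched in a few lines of C. This is a minimal illustration of my own (the function names are not from the slides), using small field widths for readability -- b = 1 offset bit and s = 2 set-index bits, matching the simulation parameters used later in the lecture:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative field widths: B = 2^1 bytes/block, S = 2^2 sets */
#define B_BITS 1   /* b: offset bits    */
#define S_BITS 2   /* s: set-index bits */

/* Offset: lower b bits -- which byte within the block */
static unsigned addr_offset(uintptr_t addr) {
    return (unsigned)(addr & ((1u << B_BITS) - 1));
}

/* Set index: next s bits -- which set within the cache */
static unsigned addr_set(uintptr_t addr) {
    return (unsigned)((addr >> B_BITS) & ((1u << S_BITS) - 1));
}

/* Tag: remaining high-order bits -- identifies the block once cached */
static unsigned addr_tag(uintptr_t addr) {
    return (unsigned)(addr >> (B_BITS + S_BITS));
}
```

For example, address 13 (binary 1101) splits into offset 1, set index 2, and tag 1.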
Direct Mapped Cache: Tag Matching

Identifying the block within the selected set:
- The stored tag must match the high-order bits of the address.
- The valid bit must be 1.
- The lower bits of the address (the offset) select the byte or word within the cache block.

Direct Mapped Cache Simulation

N = 16 byte addresses, B = 2 bytes/block, S = 4 sets, E = 1 entry/set
(t = 1 tag bit, s = 2 set-index bits, b = 1 offset bit)

Address trace (reads): 0 [0000], 1 [0001], 13 [1101], 8 [1000], 0 [0000]

- 0 [0000]: miss -- set 0 loads tag 0, data m[1] m[0]
- 1 [0001]: hit -- same block as address 0
- 13 [1101]: miss -- set 2 loads tag 1, data m[13] m[12]
- 8 [1000]: miss -- set 0 loads tag 1, data m[9] m[8], evicting m[1] m[0]
- 0 [0000]: miss -- set 0 reloads tag 0, data m[1] m[0]

Why Use Middle Bits as Index?

(4-block cache; 16 memory blocks, 0000-1111)

High-order bit indexing:
- Adjacent memory blocks would map to the same cache block.
- Poor use of spatial locality.

Middle-order bit indexing:
- Consecutive memory blocks map to different cache blocks.
- The cache can hold an N-byte region of the address space at one time.

Direct Mapped Cache Implementation (DECStation 3100)

- The 32-bit address is split into: tag = bits 31-16 (16 bits), set index = bits 15-2 (16,384 sets), byte offset = bits 1-0.
- Each set holds a valid bit, a 16-bit tag, and 32 bits (one word) of data.
- Hit: the selected set is valid and its stored tag matches the address tag.

Properties of Direct Mapped Caches

Strengths:
- Minimal control hardware overhead.
- Simple design; relatively easy to make fast.

Weakness:
- Vulnerable to thrashing: two heavily used blocks that have the same cache index repeatedly evict one another.

Vector Product Example

    float dot_prod(float x[1024], float y[1024])
    {
        float sum = 0.0;
        int i;

        for (i = 0; i < 1024; i++)
            sum += x[i] * y[i];
        return sum;
    }

Machine: DECStation 5000, MIPS processor with 64 KB direct-mapped cache …
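The simulation above is easy to reproduce in C. The sketch below (names are mine, not the slides') models the same direct-mapped cache -- 4 sets, one line per set, 2-byte blocks -- tracking only valid bits and tags, which is all that is needed to classify hits and misses:

```c
#include <assert.h>

/* Direct-mapped cache with the slide's parameters:
   S = 4 sets, E = 1 line/set, B = 2 bytes/block, 16 byte addresses. */
#define NSETS 4

struct line { int valid; unsigned tag; };
static struct line cache[NSETS];

/* Simulate one read; returns 1 on hit, 0 on miss (filling the line). */
static int cache_access(unsigned addr) {
    unsigned set = (addr >> 1) & (NSETS - 1);  /* next 2 bits: set index */
    unsigned tag = addr >> 3;                  /* high bit: tag          */
    if (cache[set].valid && cache[set].tag == tag)
        return 1;                              /* hit                    */
    cache[set].valid = 1;                      /* miss: load the block,  */
    cache[set].tag = tag;                      /* evicting any old one   */
    return 0;
}
```

Feeding it the trace 0, 1, 13, 8, 0 yields miss, hit, miss, miss, miss -- the final read of address 0 misses because address 8 (same set index, different tag) evicted its block.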
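The thrashing weakness matters for the vector product: if x[i] and y[i] map to the same cache set, each loop iteration evicts the block the other array just loaded. Two byte addresses land in the same set of a direct-mapped cache exactly when they differ by a multiple of the cache size B*S. A sketch of that arithmetic, under my own illustrative assumptions (4-byte blocks, and the two arrays placed exactly one cache size -- 64 KB -- apart; the truncated slide does not give these details):

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK 4u        /* assumed: B = 4 bytes/block            */
#define NSETS 16384u    /* 64 KB direct-mapped cache / 4 B block */

/* Set index of a byte address: middle bits, i.e. block number mod S. */
static unsigned set_index(uintptr_t addr) {
    return (unsigned)((addr / BLOCK) % NSETS);
}

/* Hypothetical worst-case layout: y starts exactly one cache size
   (BLOCK * NSETS = 64 KB) after x, so x[i] and y[i] always share a
   set, and sum += x[i] * y[i] thrashes on every iteration. */
#define X_BASE 0x00000u
#define Y_BASE 0x10000u
```

With this layout, set_index(X_BASE + 4*i) equals set_index(Y_BASE + 4*i) for every i, so the two arrays' blocks repeatedly evict each other; offsetting one array by even a single block breaks the collision.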