15-213 Computer Systems: Caches
March 16, 2000 (class18.ppt, CS 213 S'00)

Topics:
  Memory hierarchy
  Locality of reference
  Cache design: direct mapped, associative

[Figure: processor with cache, connected over a memory-I/O bus to memory
and to I/O controllers for disk, display, and network]

Levels in Memory Hierarchy

  Level      Size           Access time   Cost/MB   Line size
  Register   < 200 B        2 ns          --        8 B
  Cache      32 KB - 4 MB   4 ns          $100      32 B
  Memory     128 MB         60 ns         $1.50     8 KB
  Disk       20 GB          8 ms          $0.05     (virtual memory)

  Each level down is larger, slower, and cheaper per byte.

Alpha 21164 Chip Photo (Microprocessor Report, 9/12/94)

  Caches on chip: L1 data, L1 instruction, L2 unified, TLB, branch history.
  [Chip photo labels: L3 control, right half of L2, L1 data, L1
  instruction, L2 tags]

Locality of Reference

  Principle of locality: programs tend to reuse data and instructions near
  those they have used recently.
    Temporal locality: recently referenced items are likely to be
    referenced in the near future.
    Spatial locality: items with nearby addresses tend to be referenced
    close together in time.

  Locality in example:

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    *v = sum;

  Data: array elements referenced in succession (spatial).
  Instructions: referenced in sequence (spatial); loop cycled through
  repeatedly (temporal).

Caching: The Basic Idea

  Main memory: stores words (A-Z in the example).
  Cache: stores a subset of the words (4 in the example); organized in
  lines of multiple words, to exploit spatial locality.
  A small, fast cache sits between the processor and the big, slow memory.

  Example (cache holds 2 lines, each with 2 words):
    Initial:  cache = [A B | G H]
    Read C:   cache miss; load line [C D]               -> [A B | C D]
    Read D:   word already in cache; cache hit
    Read Z:   load line [Y Z], evicting the oldest entry -> [Y Z | C D]

Maintaining the cache: each time the processor performs a load or store,
bring the line containing that word into the cache (this may require
evicting an existing line). Subsequent loads or stores to any word in the
line are performed within the cache; a word must be in the cache for the
processor to access it.

Accessing Data in the Memory Hierarchy

  Between any two levels, memory is divided into lines (aka blocks).
  Data moves between levels on demand, in line-sized chunks.
  Invisible to the application programmer: hardware is responsible for
  cache operation.
  Upper-level lines are a subset of lower-level lines.
  Access word w in line a: hit. Access word v in line b: miss (line b is
  brought up from the low level to the high level).

Design Issues for Caches

  Key questions:
    Where should a line be placed in the cache? (line placement)
    How is a line found in the cache? (line identification)
    Which line should be replaced on a miss? (line replacement)
    What happens on a write? (write strategy)

  Constraints:
    The design must be very simple: it is realized in hardware, with all
    decision making on a nanosecond time scale.
    Want to optimize performance for typical programs: extensive
    benchmarking and simulation, with many subtle engineering tradeoffs.

Direct-Mapped Caches

  Simplest design: each memory line has a unique cache location.

  Parameters:
    Line (block) size B = 2^b: number of bytes in each line, typically
    2x to 8x the word size.
    Number of sets S = 2^s: number of lines the cache can hold.
    Total cache size = B * S = 2^(b+s) bytes.

  Physical address (n bits to reference N = 2^n total bytes), partitioned
  into fields:
    Offset: lower b bits indicate which byte within the line.
    Set index: next s bits indicate how to locate the line within the
    cache.
    Tag: remaining t bits identify this line when it is in the cache.

Indexing into a Direct-Mapped Cache

  Use the set index bits to select a cache set; each of sets 0..S-1 holds
  a tag, a valid bit, and bytes 0..B-1 of data.

  Address layout: | t-bit tag | s-bit set index | b-bit offset |
Direct-Mapped Cache: Tag Matching

  Identifying the line: the tag must match the high-order bits of the
  address, and the valid bit must be 1. The lower b offset bits select
  the byte or word within the cache line.

Direct-Mapped Cache Simulation

  N = 16 byte addresses, B = 2 bytes/line, S = 4 sets, E = 1 entry/set.
  Address trace (reads): 0 [0000], 1 [0001], 13 [1101], 8 [1000], 0 [0000].

    0  [0000]: miss; set 0 loads tag 0, data m[1] m[0]
    1  [0001]: hit  (same line as address 0)
    13 [1101]: miss; set 2 loads tag 1, data m[13] m[12]
    8  [1000]: miss; set 0 loads tag 1, data m[9] m[8] (evicts m[1] m[0])
    0  [0000]: miss; set 0 reloads tag 0, data m[1] m[0]

Why Use Middle Bits as Index?

  High-order bit indexing: adjacent memory lines would map to the same
  cache entry; poor use of spatial locality.
  Middle-order bit indexing: consecutive memory lines map to different
  cache lines, so the cache can hold an N-byte region of the address
  space at one time.
  [Figure: 4-line cache; 16 memory lines 0000-1111 mapped under
  high-order vs. middle-order indexing]

Direct-Mapped Cache Implementation (DECStation 3100)

  Address bits 31..16: tag (16 bits); bits 15..2: set index (16,384
  sets); bits 1..0: byte offset. Each set holds a valid bit, a 16-bit
  tag, and 32 bits of data; a tag match with valid = 1 produces a hit.

Properties of Direct-Mapped Caches

  Strength: minimal control hardware overhead; simple design; relatively
  easy to make fast.
  Weakness: vulnerable to thrashing, when two heavily used lines have the
  same cache index and repeatedly evict one another.

Vector Product Example

    float dot_prod(float x[1024], float y[1024])
    {
        float sum = 0.0;
        int i;
        for (i = 0; i < 1024; i++)
            sum += x[i] * y[i];
        return sum;
    }

  Machine ... Cache ...