15-213 "The course that gives CMU its Zip!"

Cache Memories
September 28, 2007
class10.ppt, 15-213 F'07

Topics
- Locality of reference in the memory hierarchy
- Generic cache memory organization
- Direct mapped caches
- Set associative caches


An Example Memory Hierarchy

Toward the top of the hierarchy: smaller, faster, and costlier (per byte) storage devices.
Toward the bottom: larger, slower, and cheaper (per byte) storage devices.

- L0: registers. CPU registers hold words retrieved from L1 cache.
- L1: on-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache memory.
- L2: off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
- L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
- L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
- L5: remote secondary storage (tapes, distributed file systems, Web servers).


Principle of Locality

Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves.
- Temporal locality: recently referenced items are likely to be referenced in the near future.
- Spatial locality: items with nearby addresses tend to be referenced close together in time.

Locality Example

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

Data
- Reference array elements in succession (stride-1 reference pattern): spatial locality.
- Reference sum each iteration: temporal locality.
Instructions
- Reference instructions in sequence: spatial locality.
- Cycle through loop repeatedly: temporal locality.


Locality Example

Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.

Question: Does this function have good locality?

    int sum_array_rows(int a[M][N])
    {
        int i, j, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }


Locality Example

Question: Does this function have good locality?

    int sum_array_cols(int a[M][N])
    {
        int i, j, sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }


Locality Example

Question: Can you permute the loops so that the function scans the 3-d array a with a stride-1 reference pattern (and thus has good spatial locality)?

    int sum_array_3d(int a[M][N][N])
    {
        int i, j, k, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                for (k = 0; k < N; k++)
                    sum += a[k][i][j];
        return sum;
    }


Caches

Cache: a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.

Fundamental idea of a memory hierarchy:
- For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.

Why do memory hierarchies work?
- Programs tend to access the data at level k more often than they access the data at level k+1.
- Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
- Net effect: a large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.


Caching in a Memory Hierarchy

- Level k: the smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1.
- Data is copied between levels in block-sized transfer units.
- Level k+1: the larger, slower, cheaper storage device at level k+1 is partitioned into blocks.

[Figure: a few cached blocks at level k; blocks 0 through 15 at level k+1.]


General Caching Concepts

Program needs object d, which is stored in some block b.

Cache hit
- Program finds b in the cache at level k. E.g., block 14.

Cache miss
- b is not at level k, so the level k cache must fetch it from level k+1. E.g., block 12.
- If the level k cache is full, then some current block must be replaced (evicted). Which one is the "victim"?
  - Placement policy: where can the new block go? E.g., b mod 4.
  - Replacement policy: which block should be evicted? E.g., LRU.

[Figure: requests for block 14 (hit) and block 12 (miss) against level k and level k+1.]


General Caching Concepts

Types of cache misses:

Cold (compulsory) miss
- Cold misses occur because the cache is empty.

Conflict miss
- Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
  - E.g., block i at level k+1 must be placed in block (i mod 4) at level k.
- Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
  - E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.

Capacity miss
- Occurs when the set of active cache blocks (working set) is larger than the cache.


Examples of Caching in the Hierarchy

Cache Type            What is Cached        Where is it Cached    Latency (cycles)  Managed By
Registers             4-byte words          CPU core              0                 Compiler
TLB                   Address translations  On-Chip TLB           0                 Hardware
L1 cache              64-byte blocks        On-Chip L1            1                 Hardware
L2 cache              64-byte blocks        Off-Chip L2           10                Hardware
Virtual Memory        4-KB pages            Main memory           100               Hardware + OS
Buffer cache          Parts of files        Main memory           100               OS
Network buffer cache  Parts of files        Local disk            10,000,000        AFS/NFS client
Browser cache         Web pages             Local disk            10,000,000        Web browser
Web cache             Web pages             Remote server disks   1,000,000,000     Web proxy server


Cache Memories

Cache memories are small, fast SRAM-based memories managed automatically in hardware.
- Hold frequently accessed blocks of main memory.

CPU looks first for data in L1, then in L2, then in main memory.

Typical system structure:

[Figure labels: CPU chip, register file, ALU, L1 cache, L2 tags, L2 data, SRAM port, bus interface, system bus, I/O bridge, memory bus, main memory.]


Inserting an L1 Cache Between the CPU and Main Memory

- The tiny, very fast CPU register file has room for four 4-byte words.
- The transfer unit between the CPU register file and the cache is a 4-byte block.
- The small, fast L1 cache has room for two 4-word blocks (line 0 and line 1).
- The transfer unit between the cache and main memory is a 4-word block (16 bytes).


General Organization of a Cache

- Cache is an array of sets.
- Each set contains one or more lines.
- t tag bits per line
…
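The placement rule and the conflict-miss example from General Caching Concepts can be checked with a small simulation. This is only a sketch under stated assumptions: the names count_misses and NUM_SLOTS and the use of -1 as an "empty" marker are made up for illustration and do not come from the lecture. It models just the "block i goes to position i mod 4" placement policy, with no associativity and therefore no replacement choice.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for illustration: level k has 4 block positions, as in the
 * "i mod 4" example. */
#define NUM_SLOTS 4

/* Count the misses for a trace of level k+1 block numbers when block b
 * may only be placed in slot (b mod NUM_SLOTS) of the level k cache. */
int count_misses(const int *trace, size_t n)
{
    int slot[NUM_SLOTS];
    int misses = 0;

    for (int s = 0; s < NUM_SLOTS; s++)
        slot[s] = -1;                 /* -1 marks an empty slot (cold cache) */

    for (size_t t = 0; t < n; t++) {
        int b = trace[t];
        assert(b >= 0);               /* block numbers are non-negative */
        int s = b % NUM_SLOTS;        /* placement policy: b mod 4 */
        if (slot[s] != b) {           /* miss: fetch b, evicting the old block */
            slot[s] = b;
            misses++;
        }
    }
    return misses;
}
```

With the trace 0, 8, 0, 8, 0, 8 the function reports 6 misses, because blocks 0 and 8 both map to slot 0 and keep evicting each other even though three slots stay empty. With the trace 0, 1, 0, 1, 0, 1 it reports 2, which are just the two cold misses.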
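Returning to the loop-permutation question in the locality examples, one possible answer is sketched below. It assumes a cube with a single, hypothetical dimension N = 4. The slide's a[M][N][N] version indexes the first dimension with k < N, which stays in bounds only when M equals N. The names sum_array_3d_stride1 and check_same_sum are illustrative, not from the lecture. Because C stores arrays in row-major order, the last index should vary fastest, so the sketch makes j the innermost loop and k the outermost.

```c
#include <assert.h>

/* Assumed for illustration: all three dimensions have the same size. */
#define N 4

/* Loop order from the question: the innermost loop varies k, the first
 * index, so consecutive accesses are N*N ints apart. */
int sum_array_3d(int a[N][N][N])
{
    int i, j, k, sum = 0;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                sum += a[k][i][j];
    return sum;
}

/* Permuted loop order: the innermost loop varies j, the last index, so
 * consecutive accesses touch adjacent ints (stride 1). */
int sum_array_3d_stride1(int a[N][N][N])
{
    int i, j, k, sum = 0;

    for (k = 0; k < N; k++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                sum += a[k][i][j];
    return sum;
}

/* Sanity check: permuting the loops changes the access pattern, not the sum. */
void check_same_sum(int a[N][N][N])
{
    assert(sum_array_3d(a) == sum_array_3d_stride1(a));
}
```

Both functions visit every element exactly once and differ only in the order of the visits, which is why the result is unchanged while the spatial locality improves.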