The Memory Hierarchy
CS 740
Sept 27 2002
CS 740 F'02

Topics
  The memory hierarchy
  Cache design

Computer System

  [Figure: block diagram. Processor (with interrupts) and Cache; Memory-I/O bus; Memory; I/O controllers for two disks, a Display, and a Network]

The Tradeoff

  [Figure: CPU regs, cache, memory, and disk, with the "cache" and "virtual memory" portions labeled; transfers between levels labeled 8 B, 16 B, 4 KB]

              register    L1 cache    L2 cache      memory      disk memory
              reference   reference   reference     reference   reference
  size        608 B       128 KB      512 KB-4 MB   128 MB      27 GB
  speed       1.4 ns      4.2 ns      16.8 ns       112 ns      9 ms
  $/Mbyte                             $90/MB        $2-6/MB     $0.01/MB
  block size  4 B         4 B         16 B          4-8 KB

  larger, slower, cheaper
  (Numbers are for a 21264 at 700MHz)

Why is bigger slower?

  Physics slows us down
  Racing the speed of light (3.0x10^8 m/s)
    clock = 500MHz
    how far can I go in a clock cycle?
    (3.0x10^8 m/s) / (500x10^6 cycles/s) = 0.6 m/cycle
    For comparison: 21264 is about 17mm across
  Capacitance
    long wires have more capacitance
    either more powerful (bigger) transistors required, or slower
    signal propagation delay grows in proportion to capacitance
    going "off chip" has an order of magnitude more capacitance

Alpha 21164 Chip Photo

  Microprocessor Report 9/12/94
  Caches:
    L1 data
    L1 instruction
    L2 unified
    L3 off-chip

Alpha 21164 Chip Caches

  Caches:
    L1 data
    L1 instruction
    L2 unified
    L3 off-chip
  [Figure: chip photo with regions labeled L3 Control, Right Half L2, L2 Tags, L1 Data, L1 Instr.]

Locality of Reference

  Principle of Locality:
    Programs tend to reuse data and instructions near those they have used recently.
    Temporal locality: recently referenced items are likely to be referenced in the near future.
    Spatial locality: items with nearby addresses tend to be referenced close together in time.

  Locality in Example:

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    v = sum;

    Data
      Reference array elements in succession (spatial)
    Instructions
      Reference instructions in sequence (spatial)
      Cycle through loop repeatedly (temporal)

Caching: The Basic Idea

  Main Memory
    Stores words (A-Z in example)
  Cache
    Stores subset of the words (4 in example)
    Organized in lines
      Multiple words
      To exploit spatial locality
  Access
    Word must be in cache for processor to access

  [Figure: Processor; Small, Fast Cache holding A, B, G, H; Big, Slow Memory holding A, B, C, ..., Y, Z]

How important are caches?

  21264 Floorplan
    Register files in middle of execution units
    64k instr cache
    64k data cache
  Caches take up a large fraction of the die
  (Figure from Jim Keller, Compaq Corp.)

Accessing Data in Memory Hierarchy

  Between any two levels, memory is divided into lines (aka "blocks")
  Data moves between levels on demand, in line-sized chunks
  Invisible to application programmer
    Hardware responsible for cache operation
  Upper-level lines a subset of lower-level lines

  [Figure: High Level holding line a, Low Level holding lines a and b. Access word w in line a (hit); access word v in line b (miss)]

Design Issues for Caches

  Key Questions:
    Where should a line be placed in the cache? (line placement)
    How is a line found in the cache? (line identification)
    Which line should be replaced on a miss? (line replacement)
    What happens on a write? (write strategy)

  Constraints:
    Design must be very simple
      Hardware realization
      All decision making within nanosecond time scale
    Want to optimize performance for "typical" programs
      Do extensive benchmarking and simulations
      Many subtle engineering tradeoffs

Direct-Mapped Caches

  Simplest Design
    Each memory line has a unique cache location

  Parameters
    Line (aka block) size B = 2^b
      Number of bytes in each line
      Typically 2X-8X word size
    Number of Sets S = 2^s
      Number of lines cache can hold
    Total Cache Size = B*S = 2^(b+s)

  Physical Address
    Address used to reference main memory
    n bits to reference N = 2^n total bytes
    Partition into fields
      Offset: Lower b bits indicate which byte within line
      Set: Next s bits indicate how to locate line within cache
      Tag: Identifies this line when in cache

  [Figure: n-bit Physical Address divided into tag (t bits), set index (s bits), offset (b bits)]
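
The partition of a physical address into tag, set index, and offset can be sketched in C. This is a minimal illustration rather than code from the lecture: the type `addr_fields` and the function `split_address` are hypothetical names, and a 32-bit address (n = 32) is an assumption. The parameters b and s follow the definitions B = 2^b and S = 2^s.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the three fields of a physical address. */
typedef struct {
    uint32_t tag;     /* remaining high-order t = n - s - b bits */
    uint32_t set;     /* next s bits: selects the cache set */
    uint32_t offset;  /* low b bits: byte within the line */
} addr_fields;

/* Split a 32-bit address (n = 32 is an assumption) for a direct-mapped
 * cache with 2^b bytes per line and 2^s sets. */
static addr_fields split_address(uint32_t addr, unsigned b, unsigned s)
{
    addr_fields f;
    assert(b + s < 32);  /* keeps the shifts well-defined and leaves tag bits */
    f.offset = addr & ((1u << b) - 1u);         /* mask off the low b bits */
    f.set    = (addr >> b) & ((1u << s) - 1u);  /* next s bits */
    f.tag    = addr >> (b + s);                 /* everything above them */
    return f;
}
```

For instance, with 16-byte lines (b = 4) and 4096 sets (s = 12), which gives a 64 KB cache, the address 0x12345678 splits into offset 0x8, set 0x567, and tag 0x1234.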

Indexing into Direct-Mapped Cache

  Use set index bits to select cache set

  [Figure: Sets 0, 1, ..., S-1, each holding Tag, Valid, and bytes 0, 1, ..., B-1; Physical Address divided into tag (t bits), set index (s bits), offset (b bits)]

Direct-Mapped Cache Tag Matching

  Identifying Line
    Must have tag match high order bits of address
    Must have Valid = 1
  Lower bits of address select byte or word within cache line

  [Figure: Selected Set with Tag, Valid, and bytes 0, 1, ..., B-1; Physical Address divided into tag (t bits), set index (s bits), offset (b bits)]

Properties of Direct Mapped Caches

  Strength
    Minimal control hardware overhead
    Simple design
    (Relatively) easy to make fast
  Weakness
    Vulnerable to thrashing
      Two heavily used lines have same cache index
      Repeatedly evict one to make room for other

Vector Product Example

    float dot_prod(float x[1024], float y[1024])
    {
        float sum = 0.0;
        int i;
        for (i = 0; i < 1024; i++)
            sum += x[i] * y[i];
        return sum;
    }

  Machine
    DECStation 5000
    MIPS Processor with 64KB direct-mapped cache, 16 B line size
  Performance
    Good case: 24 cycles / element
    Bad case: 66 cycles / element

Thrashing Example

  [Figure: x[0] x[1] x[2] x[3] in one Cache Line, ..., x[1020] x[1021] x[1022] x[1023] in one Cache Line; y[0] y[1] y[2] y[3] in one Cache Line, ..., y[1020] y[1021] y[1022] y[1023] in one Cache Line]

  Access one element from each array per iteration

Thrashing Example: Good Case

  [Figure: x[0] x[1] x[2] x[3] and y[0] y[1] y[2] y[3] in separate Cache Lines]

  Access Sequence
    Read x[0]
      x[0], x[1], x[2], x[3] loaded
    Read y[0]
      y[0], y[1], y[2], y[3] loaded
    Read x[1]
      Hit
    Read y[1]
      Hit
    ...
    2 misses / 8 reads

  Analysis
    x[i] and y[i] map to different cache lines
    Miss rate = 25%
      Two memory accesses / iteration
      On every 4th iteration have two misses
  Timing
    10 cycle loop time
    28 cycles / cache miss
    Average time / iteration = 10 + 0.25 * 2 * 28

Thrashing Example: Bad Case

  [Figure: x[0] x[1] x[2] x[3] and y[0] y[1] y[2] y[3]]

  Access Pattern
    Read x[0]
      x[0], x[1], x[2], x[3] loaded
    Read y[0]
      y[0], y[1], y[2], y[3] loaded
    Read …
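
The timing analysis for the dot-product example can be checked with a small simulation of a direct-mapped cache. This is a sketch under stated assumptions, not the lecture's code. The names `dm_cache`, `cache_access`, and `avg_cycles_per_element` are hypothetical. Addresses are assumed to be 32 bits, floats 4 bytes, and the base addresses of x and y are made up. The cost model (10-cycle loop time, 28 cycles per miss) and the cache geometry (64 KB, 16 B lines) come from the slides. The bad case is modeled by placing the arrays exactly one cache size apart so that x[i] and y[i] share a cache index, which is an assumption because the bad-case access pattern is cut off in this copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    LINE_BITS   = 4,               /* 16 B lines (from the slides) */
    SET_BITS    = 12,              /* 4096 sets * 16 B = 64 KB */
    NUM_SETS    = 1 << SET_BITS,
    N_ELEMS     = 1024,            /* vector length in dot_prod */
    LOOP_CYCLES = 10,              /* loop time per iteration */
    MISS_CYCLES = 28               /* penalty per cache miss */
};

/* Hypothetical model: one tag and one valid bit per set. */
typedef struct {
    uint32_t tag[NUM_SETS];
    unsigned char valid[NUM_SETS];
} dm_cache;

/* Returns 1 on a miss (and installs the line), 0 on a hit. */
static int cache_access(dm_cache *c, uint32_t addr)
{
    uint32_t set = (addr >> LINE_BITS) & (NUM_SETS - 1);
    uint32_t tag = addr >> (LINE_BITS + SET_BITS);
    if (c->valid[set] && c->tag[set] == tag)
        return 0;
    c->valid[set] = 1;   /* evict whatever was there */
    c->tag[set] = tag;
    return 1;
}

/* Average cycles per element when reading x[i] then y[i] for each i,
 * starting from an empty cache. Base addresses are assumptions. */
static double avg_cycles_per_element(uint32_t x_base, uint32_t y_base)
{
    static dm_cache c;
    long misses = 0;
    memset(&c, 0, sizeof c);
    for (int i = 0; i < N_ELEMS; i++) {
        misses += cache_access(&c, x_base + 4u * (uint32_t)i);
        misses += cache_access(&c, y_base + 4u * (uint32_t)i);
    }
    return LOOP_CYCLES + (double)MISS_CYCLES * misses / N_ELEMS;
}
```

Under these assumptions, `avg_cycles_per_element(0x0, 0x1000)` places y right after x and yields 24 cycles per element, while `avg_cycles_per_element(0x0, 0x10000)` places y one cache size (64 KB) after x, so every access misses, and yields 66. Both values agree with the good-case and bad-case figures quoted for the DECStation 5000.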