Unformatted text preview:

CPE 631 Caches Electrical and Computer Engineering University of Alabama in Huntsville Aleksandar Milenkovic milenka ece uah edu http www ece uah edu milenka Outline Memory Hierarchy Four Questions for Memory Hierarchy Cache Performance 1 Reduce the miss rate 2 Reduce the miss penalty or 3 Reduce the time to hit in the cache AM LaCASA 2 Processor DRAM Latency Gap Processor 2x 1 5 year 1000 Performance C PU 100 Processor Memory Performance Gap grows 50 year Memory 2x 10 years 10 DRAM LaCASA 2000 1999 1998 1997 1996 1995 1994 1993 1992 1991 Time 1990 1989 1988 1987 1986 1985 1984 1983 1982 1981 1 1980 AM 3 Solution The Memory Hierarchy MH User sees as much memory as is available in cheapest technology and access it at the speed offered by the fastest technology Levels in Memory Hierarchy Lower Upper Processor Control Datapath AM Speed Capacity Cost bit LaCASA Fastest Smallest Highest Slowest Biggest Lowest 4 Generations of Microprocessors Time of a full cache miss in instructions executed 1st Alpha 340 ns 5 0 ns 68 clks x 2 or 136 2nd Alpha 266 ns 3 3 ns 80 clks x 4 or 320 3rd Alpha 180 ns 1 7 ns 108 clks x 6 or 648 1 2X latency x 3X clock rate x 3X Instr clock 5X AM LaCASA 5 Why hierarchy works Principle of locality Probability of reference Address space AM LaCASA Rule of thumb Programs spend 90 of their execution time in only 10 of code Temporal locality recently accessed items are likely to be accessed in the near future Keep them close to the processor Spatial locality items whose addresses are near one another tend to be referenced close together in time Move blocks consisted of contiguous words to the upper level 6 Cache Measures Upper Level Memory To Processor Bl X Lower Level Memory Bl Y Hit time Miss Penalty From Processor Hit data appears in some block in the upper level Bl X Miss data needs to be retrieved from the lower level Bl Y AM LaCASA Hit Rate the fraction of memory access found in the upper level Hit Time time to access the upper level RAM access time Time to determine hit miss Miss rate 1 Hit Rate Miss penalty time to replace a block in the upper level time to retrieve the block from the lower level Average memory access time Hit time Miss rate x Miss penalty ns or clocks 7 Levels of the Memory Hierarchy Capacity Access Time Cost CPU Registers 100s Bytes 1s ns Cache 10s 100s K Bytes 1 10 ns 10 MByte Main Memory M Bytes 100ns 300ns 1 MByte Disk 10s G Bytes 10 ms 10 000 000 ns AM 0 0031 MByte Tape infinite sec min 0 0014 MByte LaCASA Registers Staging Xfer Unit Upper Leve faster Instr Operands prog compiler Cache Blocks 1 8 bytes cache cntl 8 128 bytes Memory Pages OS 512 4K bytes Disk Files Tape user operator Mbytes Larger Lower Level 8 Four Questions for Memory Heir Q 1 Where can a block be placed in the upper level Block placement Q 2 How is a block found if it is in the upper level Block identification Q 3 Which block should be replaced on a miss Block replacement AM LaCASA direct mapped fully associative set associative Random LRU Least Recently Used Q 4 What happens on a write Write strategy Write through vs write back Write allocate vs No write allocate 9 Direct Mapped Cache In a direct mapped cache each memory address is associated with one possible block within the cache Therefore we only need to look in a single location in the cache for the data if it exists in the cache Block is the unit of transfer between cache and memory AM LaCASA 10 Q1 Where can a block be placed in the upper level Block 12 placed in 8 block cache Fully associative direct mapped 2 way set associative S A Mapping Block Number Modulo Number Sets Full Mapped 01234567 Direct Mapped 2 Way Assoc 12 mod 8 4 12 mod 4 0 01234567 00112233 Cache 1111111111222222222233 01234567890123456789012345678901 AM LaCASA Memory 11 Direct Mapped Cache cont d Memory Memory 0 Address 1 2 3 4 5 6 7 8 9 A B C AM D E F LaCASA Cache Index 0 1 2 3 Cache 4 byte 12 Direct Mapped Cache cont d Since multiple memory addresses map to same cache index how do we tell which one is in there What if we have a block size 1 byte Result divide memory address into three fields Block Address tttttttttttttttttt iiiiiiiiii oooo AM LaCASA TAG to check if have the correct block INDEX to select block OFFSET to select byte within the block 13 Direct Mapped Cache Terminology INDEX specifies the cache index which row of the cache we should look in OFFSET once we have found correct block specifies which byte within the block we want TAG the remaining bits after offset and index are determined these are used to distinguish between all the memory addresses that map to the same location BLOCK ADDRESS TAG INDEX AM LaCASA 14 Direct Mapped Cache Example AM LaCASA Conditions 32 bit architecture word 32bits address unit is byte 8KB direct mapped cache with 4 words blocks Determine the size of the Tag Index and Offset fields OFFSET specifies correct byte within block cache block contains 4 words 16 24 bytes 4 bits INDEX specifies correct row in the cache cache size is 8KB 213 bytes cache block is 24 bytes Rows in cache 1 block 1 row 213 24 29 9 bits TAG Memory address length offset index 32 4 9 19 tag is leftmost 19 bits 15 1 KB Direct Mapped Cache 32B blocks For a 2 N byte cache The uppermost 32 N bits are always the Cache Tag The lowest M bits are the Byte Select Block Size 2 M 31 Cache Tag Example 0x50 Stored as part of the cache state Valid Bit Cache Tag 9 Cache Index 4 0 Byte Select Ex 0x01 Ex 0x00 Cache Data Byte 31 0x50 Byte 63 AM LaCASA Byte 1 Byte 0 0 Byte 33 Byte 32 1 2 3 Byte 1023 Byte 992 31 16 Two way Set Associative Cache N way set associative N entries for each Cache Index N direct mapped caches operates in parallel N typically 2 to 4 Example Two way set associative cache Cache Index selects a set from the cache The two tags in the set are compared in parallel Data is selected based on the tag result Valid Cache Tag AMAdr Tag Compare LaCASA Cache Index Cache Data Cache Data Cache Block 0 Cache Block 0 Sel1 1 Mux 0 Sel0 Cache Tag Valid Compare OR Hit Cache Block 17 Disadvantage of Set Associative Cache N way Set Associative Cache v Direct Mapped Cache N comparators vs 1 Extra MUX delay for the data Data comes AFTER Hit Miss In a direct mapped cache Cache Block is available BEFORE Hit Miss Possible to assume a hit and continue Recover later if miss Valid …


View Full Document

UAH CPE 631 - Memory Hierarchy

Loading Unlocking...
Login

Join to view Memory Hierarchy and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Memory Hierarchy and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?