CPE 631 Lecture 04: CPU Caches
Electrical and Computer Engineering
University of Alabama in Huntsville
CPE 631 AM
13 01 19, UAH CPE631

Outline
  - Memory Hierarchy
  - Four Questions for Memory Hierarchy
  - Cache Performance

Processor-DRAM Latency Gap
  [Figure: performance (1 to 1000) versus time (1980 to 2000) for CPU and DRAM]
  - Processor (CPU): 2x every 1.5 years
  - Memory (DRAM): 2x every 10 years
  - Processor-memory performance gap grows 50% per year

Solution: The Memory Hierarchy (MH)
  - The user sees as much memory as is available in the cheapest technology and accesses it at the speed offered by the fastest technology.
  - Levels in the memory hierarchy, from the processor (control, datapath) outward:

    Level    Speed     Capacity   Cost/bit
    Upper    Fastest   Smallest   Highest
    Lower    Slowest   Biggest    Lowest

Generations of Microprocessors
  Time of a full cache miss in instructions executed:

    1st Alpha:  340 ns / 5.0 ns =  68 clks x 2, or 136
    2nd Alpha:  266 ns / 3.3 ns =  80 clks x 4, or 320
    3rd Alpha:  180 ns / 1.7 ns = 108 clks x 6, or 648

  1/2X latency x 3X clock rate x 3X instr/clock => 5X

Why hierarchy works: Principle of locality
  [Figure: probability of reference versus address space]
  - Rule of thumb: programs spend 90% of their execution time in only 10% of the code.
  - Temporal locality: recently accessed items are likely to be accessed in the near future => keep them close to the processor.
  - Spatial locality: items whose addresses are near one another tend to be referenced close together in time => move blocks consisting of contiguous words to the upper level.

Cache Measures
  [Figure: upper-level memory holding Blk X, to and from the processor; lower-level memory holding Blk Y]
  - Hit: data appears in some block in the upper level (Blk X).
      - Hit rate: the fraction of memory accesses found in the upper level.
      - Hit time: time to access the upper level (RAM access time + time to determine hit/miss).
  - Miss: data needs to be retrieved from the lower level (Blk Y).
      - Miss rate = 1 - (hit rate)
      - Miss penalty: time to replace a block in the upper level + time to retrieve the block from the lower level.
  - Average memory access time = Hit time + Miss rate x Miss penalty (ns or clocks)

Levels of the Memory Hierarchy
  Upper levels are faster; lower levels are larger.

    Level           Capacity          Access time              Cost
    CPU registers   100s Bytes        1s ns
    Cache           10s-100s KBytes   1-10 ns                  $10/MByte
    Main memory     MBytes            100 ns-300 ns            $1/MByte
    Disk            10s GBytes        10 ms (10,000,000 ns)    $0.0031/MByte
    Tape            infinite          sec-min                  $0.0014/MByte

  Staging/transfer unit between adjacent levels:

    Between               Unit              Managed by       Size
    Registers and cache   Instr. operands   prog./compiler   1-8 bytes
    Cache and memory      Blocks            cache cntl       8-128 bytes
    Memory and disk       Pages             OS               512-4K bytes
    Disk and tape         Files             user/operator    MBytes

Four Questions for Memory Hierarchy
  - Q1: Where can a block be placed in the upper level? => Block placement
      - direct mapped, fully associative, set associative
  - Q2: How is a block found if it is in the upper level? => Block identification
  - Q3: Which block should be replaced on a miss? => Block replacement
      - Random, LRU (Least Recently Used)
  - Q4: What happens on a write? => Write strategy
      - Write-through vs. write-back
      - Write allocate vs. no-write allocate

Direct Mapped Cache
  - In a direct mapped cache, each memory address is associated with one possible block within the cache.
  - Therefore, we only need to look in a single location in the cache for the data, if it exists in the cache.
  - The block is the unit of transfer between cache and memory.

Q1: Where can a block be placed in the upper level?
  Block 12 placed in an 8-block cache: fully associative, direct mapped, 2-way set associative.
  Set-associative mapping = Block Number modulo Number of Sets.
  - Fully associative: block 12 can go in any of cache blocks 0-7.
  - Direct mapped: (12 mod 8) = 4
  - 2-way set associative: (12 mod 4) = 0
  [Figure: cache blocks 0-7, grouped as sets 0 0 1 1 2 2 3 3 in the 2-way case; memory blocks 0-31]

Direct Mapped Cache (cont'd)
  [Figure: memory addresses 0 through F mapping onto a 4-byte direct mapped cache with cache indexes 0 through 3]
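The Q1 placement rule (set = block number modulo number of sets) can be sketched in a few lines of Python. This is only an illustration of the slide's arithmetic, not code from the lecture; the function name `candidate_frames` and the convention of numbering the frames of a set contiguously are assumptions made here.

```python
def candidate_frames(block_number, num_frames, ways):
    """Return (set_index, frames) for a memory block.

    Hypothetical helper: num_frames is the number of blocks the cache
    holds and ways is the associativity (1 = direct mapped,
    num_frames = fully associative). Frames of a set are assumed to be
    numbered contiguously, as in the slide's 0 0 1 1 2 2 3 3 layout.
    """
    num_sets = num_frames // ways
    set_index = block_number % num_sets       # the slide's mapping rule
    first = set_index * ways                  # first frame of that set
    return set_index, list(range(first, first + ways))


if __name__ == "__main__":
    # Block 12 in an 8-block cache, for the three organizations on the slide.
    for label, ways in (("direct mapped", 1), ("2-way", 2), ("fully associative", 8)):
        print(label, candidate_frames(12, 8, ways))
```

With one way the rule reduces to 12 mod 8 = 4, with two ways to 12 mod 4 = 0, and with eight ways there is a single set, so every frame is a candidate.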
Direct Mapped Cache (cont'd)
  - Since multiple memory addresses map to the same cache index, how do we tell which one is in there?
  - What if we have a block size > 1 byte?
  - Result: divide the memory address into three fields:

      tttttttttttttttttt iiiiiiiiii oooo

      TAG: to check if we have the correct block
      INDEX: to select the block
      OFFSET: to select the byte within the block
      Block address = TAG and INDEX together

Direct Mapped Cache Terminology
  - INDEX: specifies the cache index (which "row" of the cache we should look in).
  - OFFSET: once we have found the correct block, specifies which byte within the block we want.
  - TAG: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location.
  - BLOCK ADDRESS: TAG and INDEX together.

Direct Mapped Cache Example
  Conditions:
  - 32-bit architecture (word = 32 bits), address unit is the byte
  - 8 KB direct mapped cache with 4-word blocks
  Determine the size of the Tag, Index, and Offset fields.
  - OFFSET (specifies the correct byte within the block): a cache block contains 4 words = 16 (2^4) bytes => 4 bits
  - INDEX (specifies the correct row in the cache): cache size is 8 KB = 2^13 bytes, cache block is 2^4 bytes; rows in cache (1 block = 1 row) = 2^13 / 2^4 = 2^9 => 9 bits
  - TAG: memory address length - offset - index = 32 - 4 - 9 = 19 => the tag is the leftmost 19 bits

1 KB Direct Mapped Cache, 32 B blocks
  For a 2^N byte cache:
  - The uppermost (32 - N) bits are always the Cache Tag.
  - The lowest M bits are the Byte Select (Block Size = 2^M).
  [Figure: 32-bit address split into Cache Tag (bits 31-10, example 0x50), Cache Index (bits 9-5, ex. 0x01), and Byte Select (bits 4-0, ex. 0x00). The Valid Bit and Cache Tag are stored as part of the cache state. Cache Data rows 0, 1, 2, 3, ..., 31 hold Byte 0-Byte 31, Byte 32-Byte 63, ..., Byte 992-Byte 1023; row 1 holds tag 0x50.]

Two-way Set Associative Cache
  - N-way set associative: N entries for each Cache Index
      - N direct mapped caches operate in parallel (N typically 2 to 4)
  - Example: two-way set associative cache
      - The Cache Index selects a "set" from the cache.
      - The two tags in the set are compared in parallel.
      - Data is selected based on the tag result.
  [Figure: two banks of Valid / Cache Tag / Cache Data (Cache Block 0), both selected by the Cache Index; the Adr Tag is compared against each stored tag, and the Compare outputs drive Sel1 and Sel0 of a Mux … ]
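The tag/index/offset arithmetic from the direct mapped examples can be checked with a short Python sketch. It assumes power-of-two cache and block sizes and a 32-bit byte address, as in the slides; the helper names `field_widths` and `split_address` are hypothetical, and the `ways` parameter is an added assumption that extends the same calculation to an N-way set associative cache (the index then selects a set rather than a single block).

```python
def field_widths(cache_bytes, block_bytes, ways=1, address_bits=32):
    """Return (tag_bits, index_bits, offset_bits).

    Assumes cache_bytes, block_bytes and ways are powers of two.
    """
    offset_bits = (block_bytes - 1).bit_length()      # log2(block size)
    num_sets = cache_bytes // (block_bytes * ways)    # rows (sets) in the cache
    index_bits = (num_sets - 1).bit_length()          # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def split_address(address, index_bits, offset_bits):
    """Split a byte address into (tag, index, offset)."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    # 8 KB direct mapped cache, 4-word (16-byte) blocks.
    print(field_widths(8 * 1024, 16))
    # 1 KB direct mapped cache, 32-byte blocks.
    tag_bits, index_bits, offset_bits = field_widths(1024, 32)
    # Address built from tag 0x50, index 0x01, byte select 0x00.
    print(split_address((0x50 << 10) | (0x01 << 5), index_bits, offset_bits))
```

For the 8 KB example this reproduces the 19/9/4 split worked out on the slide. Passing `ways=2` for the same cache halves the number of sets, which moves one bit from the index to the tag.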



UAH CPE 631 - CPU Caches
