Administrivia Computer Systems Architecture CMSC 411 Unit 5 Memory Hierarchy HW 3 due today questions Quiz 2 Tuesday Oct 12 on Unit 3 basic pipelining practice quiz posted answers posted later today questions Alan Sussman October 7 2004 Read Chapter 5 except 5 11 5 15 CMSC 411 Alan Sussman 2 Last time Long instructions can cause structural hazards and WAW hazards why detect hazards early to allow precise exceptions in ID pipeline stage and delay EX cycle if problem detected can delay WB use history or future file let OS deal with it to enable precise exceptions Cache Memory MIPS R4000 pipeline design 8 stage pipeline superpipelining extra stages come from multi cycle cache accesses 2 cycle load delay and 3 cycle branch delay 1 delay slot 2 cycle stall for taken branches complex FP pipeline 8 stages used in different combinations for different operations CMSC 411 Alan Sussman 3 Issues to consider First there is main memory How big should the fastest memory cache memory be How do we decide what to put in cache memory If the cache is full how do we decide what to remove How do we find something in cache How do we handle writes CMSC 411 Alan Sussman Jargon frame address which page block number which cache block contents the data 5 CMSC 411 A Sussman from D O Leary CMSC 411 Alan Sussman 6 1 Then add a cache How does cache memory work Jargon Each address of a memory location is partitioned into The following slides discuss what cache memory is three organizations for cache memory block address tag index direct mapped set associative fully associative block offset how the bookkeeping is done Important note All addresses shown are in octal Addresses in the book are usually decimal Fig 5 5 CMSC 411 Alan Sussman 7 CMSC 411 Alan Sussman What is cache memory Main memory first Main memory Blocks are grouped into frames pages 3 frames in this picture Main memory is divided into cache blocks Each block contains many words 16 64 common now CMSC 411 Alan Sussman 9 CMSC 411 Alan Sussman Blocks are addressed by their frame number and their block number within the frame 0 1 0 2 0 3 0 0 4 5 0 6 0 7 1 0 1 1 1 2 1 3 1 4 1 1 5 6 10 Cache memory Main memory cont 0 0 8 1 7 2 0 2 1 2 2 Cache has many MANY fewer blocks than main memory each with a block number 0 1 2 3 4 5 6 7 a memory address 10 21 42 53 74 25 16 77 data CMSC 411 Alan Sussman 11 CMSC 411 A Sussman from D O Leary a valid bit 0 0 0 0 0 0 0 0 a dirty bit 0 0 0 0 0 0 0 0 CMSC 411 Alan Sussman 12 2 Cache memory cont Initially all the valid bits set to zero 0 1 2 3 4 Cache memory cont 5 6 7 10 21 42 53 74 25 16 77 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 2 3 Three ways to organize cache direct mapped set associative fully associative 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Direct mapped cache 0 1 10 21 42 53 74 25 16 77 0 CMSC 411 Alan Sussman In direct mapped cache block 14 can only be put in the cache block with address 4 Suppose want to load block 14 octal from memory into cache 0 1 2 3 4 5 6 7 CMSC 411 Alan Sussman 14 Direct mapped cache cont 4 5 6 7 After the load the contents look like this 0 1 2 3 4 5 6 7 10 21 42 53 74 25 16 77 10 21 42 53 14 25 16 77 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 So the cache will no longer hold the block with memory address 74 CMSC 411 Alan Sussman 15 Set 0 For example if divide cache into 4 sets block 14 can be put in any block in Set 0 since last two bits of 14 octal are zero 0 1 Set 1 2 3 Set 2 4 5 Set 3 6 7 10 24 41 55 72 26 13 77 0 0 0 0 CMSC 411 Alan Sussman 0 0 0 0 0 0 16 Set associative cache cont Set associative cache In set associative cache each memory block can be put in any of a set of possible blocks in cache CMSC 411 Alan Sussman 0 0 0 0 0 0 17 CMSC 411 A Sussman from D O Leary So after loading the block cache memory might look like this Set 0 Set 1 Set 2 Set 3 0 1 2 4 5 6 3 7 14 24 41 55 72 26 13 77 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 CMSC 411 Alan Sussman 18 3 Set associative cache cont Note that the last two bits of the memory block s address always match the set number so do not need to be stored This part of the address is called the index The higher order bits are stored and are called the tag In these pictures both index and tag shown Set associative cache replacement Set 0 Set 1 Set 2 Set 3 0 2 3 4 6 1 5 Which entry in the set to replace Three common choices 7 14 24 41 55 72 26 13 77 Replace an eligible random block Replace the least recently used LRU block can be hard to keep track of so often only approximated 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 CMSC 411 Alan Sussman Replace the oldest eligible block First In First Out or FIFO 19 CMSC 411 Alan Sussman 20 Data cache replacement Fig 5 6 Computer Systems Architecture CMSC 411 Unit 5 Memory Hierarchy SPEC2000 in misses per 1000 instructions Set associativity Two way Four way Size LRU Random FIFO 16KB 114 1 117 3 64KB 103 4 256KB 92 2 Eight Way Random FIFO LRU 115 5 111 7 115 1 113 3 109 0 111 8 110 4 104 3 103 9 102 4 102 3 103 1 99 7 100 5 100 3 92 1 92 5 92 1 92 5 92 1 92 1 92 5 LRU 92 1 CMSC 411 Alan Sussman Random FIFO Alan Sussman October 12 2004 21 Last time Administrivia Main memory frame address page number block number cache block within page contents the data Quiz 2 today questions HW for Unit 5 out soon Cache memory block address tag high order bits for matching index which set for set associative caches block offset which byte within the block contains way fewer blocks than …
View Full Document