CS152 Computer Architecture and Engineering Assigned February 16 Caches and the Memory Hierarchy Problem Set 2 February 16 2008 Due February 26 http inst eecs berkeley edu cs152 sp08 The problem sets are intended to help you learn the material and we encourage you to collaborate with other students and to ask questions in discussion sections and office hours to understand the problems However each student must turn in their own solutions to the problems The problem sets also provide essential background material for the quizzes The problem sets will be graded primarily on an effort basis but if you do not work through the problem sets you are unlikely to succeed at the quizzes We will distribute solutions to the problem sets on the day the problem sets are due to give you feedback Homework assignments are due at the beginning of class on the due date Homework will not be accepted once solutions are handed out This material will be on Quiz 2 not Quiz 1 Problem 2 1 Cache Access Time Performance This problem requires the knowledge of Handout 2 Cache Implementations and Lecture 7 Please read these materials before answering the following questions Ben is trying to determine the best cache configuration for a new processor He knows how to build two kinds of caches direct mapped caches and 4 way set associative caches The goal is to find the better cache configuration with the given building blocks He wants to know how these two different configurations affect the clock speed and the cache miss rate and choose the one that provides better performance in terms of average latency for a load Problem 2 1 A Access Time Direct Mapped Now we want to compute the access time of a direct mapped cache We use the implementation shown in Figure H2 A in Handout 2 Assume a 128 KB cache with 8word 32 byte cache lines The address is 32 bits and the two least significant bits of the address are ignored since a cache access is word aligned The data output is also 32 bits and the MUX selects one word out of the eight words in a cache line Using the delay equations given in Table 2 1 1 fill in the column for the direct mapped DM cache in the table In the equation for the data output driver associativity refers to the associativity of the cache 1 for direct mapped caches A for A way set associative caches Component Decoder Memory array Delay equation ps 200 of index bits 1000 200 log2 of rows 200 log2 of bits in a row 1000 Comparator 200 of tag bits 1000 N to 1 MUX 500 log2 N 1000 Buffer driver 2000 Data output driver 500 associativity 1000 Valid output driver 1000 DM ps SA ps Tag Data Tag Data Table 2 1 1 Delay of each Cache Component What is the critical path of this direct mapped cache for a cache read What is the access time of the cache the delay of the critical path To compute the access time assume that a 2 input gate AND OR delay is 500 ps If the CPU clock is 150 MHz how many CPU cycles does a cache access take Problem 2 1 B Access Time Set Associative We also want to investigate the access time of a set associative cache using the 4 way setassociative cache in Figure H2 B in Handout 2 Assume the total cache size is still 128KB each way is 32 KB a 4 input gate delay is 1000 ps and all other parameters such as the input address cache line etc are the same as part 2 1 A Compute the delay of each component and fill in the column for a 4 way set associative cache in Table M2 1 1 What is the critical path of the 4 way set associative cache What is the access time of the cache the delay of the critical path What is the main reason that the 4 way set associative cache is slower than the direct mapped cache If the CPU clock is 150 MHz how many CPU cycles does a cache access take Input Address Tag Index T S T S T S T S 4 2b 2 data words Tag Decoder Data Decoder MUX MUX MUX MUX Valid Bit Comparator Buffer Driver Valid Output Driver Problem 2 1 C Miss rate analysis Now Ben is studying the effect of set associativity on the cache performance Since he now knows the access time of each configuration he wants to know the miss rate of each one For the miss rate analysis Ben is considering two small caches a direct mapped cache with 8 lines with 16 bytes line and a 4 way set associative cache of the same size For the set associative cache Ben tries out two replacement policies least recently used LRU and round robin FIFO Ben tests the cache by accessing the following sequence of hexadecimal byte addresses starting with empty caches For simplicity assume that the addresses are only 12 bits Complete the following tables for the direct mapped cache and both types of 4 way setassociative caches showing the progression of cache contents as accesses occur in the tables inv invalid and the column of a particular cache line contains the tag index contents of that line You only need to fill in elements in the table when a value changes D map Address 110 136 202 1A3 102 361 204 114 1A4 177 301 206 135 L0 L1 inv 11 line in cache L2 L3 L4 L5 L6 L7 inv inv inv inv inv inv 13 20 D map Total Misses Total Accesses hit no no no 4 way LRU hit line in cache Address 110 136 202 1A3 102 361 204 114 1A4 177 301 206 135 Set 0 Set 1 way0 way1 Way2 way3 way0 way1 way2 way3 inv Inv Inv inv 11 11 inv 13 inv inv 20 no no no 4 way LRU Total Misses Total Accesses 4 way FIFO hit line in cache Address 110 136 202 1A3 102 361 204 114 1A4 177 301 206 135 Set 0 Set 1 way0 way1 way2 way3 way0 way1 way2 way3 inv Inv Inv inv 11 inv 13 inv inv 20 4 way FIFO Total Misses no no no Total Accesses Problem 2 1 D Average Latency Assume that the results of the above analysis can represent the average miss rates of the direct mapped and the 4 way LRU 128 KB caches studied in 2 1 A and 2 1 B What would be the average memory access latency in CPU cycles for each cache assume that a cache miss takes 20 cycles Which one is better For the different replacement policies for the set associative cache which one has a smaller cache miss rate for the address stream in 2 1 C Explain why Is that replacement policy always going to yield better miss rates If not give a counter example using an address stream Problem 2 2 Pipelined Cache Access This problem requires the knowledge of Lecture …
View Full Document
Unlocking...