CS152 Computer Architecture and Engineering February 16 2011 Assigned February 16 Caches and the Memory Hierarchy Problem Set 2 Due March 2 http inst eecs berkeley edu cs152 sp11 The problem sets are intended to help you learn the material and we encourage you to collaborate with other students and to ask questions in discussion sections and office hours to understand the problems However each student must turn in his own solution to the problems The problem sets also provide essential background material for the quizzes The problem sets will be graded primarily on an effort basis but if you do not work through the problem sets you are unlikely to succeed at the quizzes We will distribute solutions to the problem sets on the day the problem sets are due to give you feedback Homework assignments are due at the beginning of class on the due date Late homework will not be accepted Problem 2 1 Cache Access Time Performance This problem requires the knowledge of Handout 2 Cache Implementations and Lectures 6 7 Please read these materials before answering the following questions Ben is trying to determine the best cache configuration for a new processor He knows how to build two kinds of caches direct mapped caches and 4 way set associative caches The goal is to find the better cache configuration with the given building blocks He wants to know how these two different configurations affect the clock speed and the cache miss rate and choose the one that provides better performance in terms of average latency for a load Problem 2 1 A Access Time Direct Mapped Now we want to compute the access time of a direct mapped cache We use the implementation shown in Figure H2 A in Handout 2 Assume a 128 KB cache with 8word 32 byte cache lines The address is 32 bits and byte addressed so the two least significant bits of the address are ignored since a cache access is word aligned The data output is also 32 bits 1 word and the MUX selects one word out of the eight words in a cache line Using the delay equations given in Table 2 1 1 fill in the column for the direct mapped DM cache in the table In the equation for the data output driver associativity refers to the associativity of the cache 1 for direct mapped caches A for A way set associative caches Component Decoder Delay equation ps 200 of index bits 1000 Memory array 200 log2 of rows 200 log2 of bits in a row 1000 Comparator N to 1 MUX Buffer driver Data output driver Valid output driver 200 of tag bits 1000 500 log2 N 1000 2000 500 associativity 1000 1000 DM ps SA ps Tag Data Tag Data Table 2 1 1 Delay of each Cache Component What is the critical path of this direct mapped cache for a cache read What is the access time of the cache the delay of the critical path To compute the access time assume that a 2 input gate AND OR delay is 500 ps If the CPU clock is 150 MHz how many CPU cycles does a cache access take Problem 2 1 B Access Time Set Associative We also want to investigate the access time of a set associative cache using the 4 way setassociative cache in Figure H2 B in Handout 2 Assume the total cache size is still 128KB each way is 32 KB a 4 input gate delay is 1000 ps and all other parameters such as the input address cache line etc are the same as part 2 1 A Compute the delay of each component and fill in the column for a 4 way set associative cache in Table 2 1 1 Figure 2 1 4 way set associative cache What is the critical path of the 4 way set associative cache What is the access time of the cache the delay of the critical path What is the main reason that the 4 way set associative cache is slower than the direct mapped cache If the CPU clock is 150 MHz how many CPU cycles does a cache access take Problem 2 1 C Miss rate analysis Now Ben is studying the effect of set associativity on the cache performance Since he now knows the access time of each configuration he wants to know the miss rate of each one For the miss rate analysis Ben is considering two small caches a direct mapped cache with 8 lines with 16 bytes line and a 4 way set associative cache of the same size i e both caches are 128 bytes For the set associative cache Ben tries out two replacement policies least recently used LRU and round robin FIFO Ben tests the cache by accessing the following sequence of hexadecimal byte addresses starting with empty caches For simplicity assume that the addresses are only 12 bits Complete the following tables for the direct mapped cache and both types of 4 way setassociative caches showing the progression of cache contents as accesses occur in the tables inv invalid and the column of a particular cache line contains the tag index contents of that line Also for each address calculate the tag and index which should help in filling out the table You only need to fill in elements in the table when a value changes D map Address 110 136 202 1A3 102 361 204 114 1A4 177 301 206 135 line in cache hit tag index L0 L1 L2 L3 L4 L5 L6 L7 inv 11 inv inv inv inv inv inv no 13 no 20 no D map Total Misses Total Accesses 4 way line in cache Address Set 0 tag 110 136 202 1A3 102 361 204 114 1A4 177 301 206 135 index way0 inv way1 Inv Set 1 Way2 Inv way3 inv 20 4 way LRU Total Misses Total Accesses LRU hit way0 11 11 way1 inv 13 way2 inv way3 inv no no no 4 way FIFO line in cache Address hit Set 0 tag 110 136 202 1A3 102 361 204 114 1A4 177 301 206 135 index way0 inv way1 Inv way2 Inv Set 1 way3 inv way0 11 way1 inv 13 20 way2 inv way3 inv no no no 4 way FIFO Total Misses Total Accesses Problem 2 1 D Average Latency Assume that the results of the above analysis can represent the average miss rates of the direct mapped and the 4 way LRU 128 KB caches studied in 2 1 A and 2 1 B What would be the average memory access latency in CPU cycles for each cache assume that the cache miss penalty is 20 cycles Which one is better For the different replacement policies for the set associative cache which one has a smaller cache miss rate for the address stream in 2 1 C Explain why Is that replacement policy always going to yield better miss rates If not give a counter example using an address stream Problem 2 2 Pipelined Cache Access This problem requires the knowledge of Lecture 6 and …
View Full Document
Unlocking...