Andrew login ID:

Full Name:

CS 15-213, Fall 2002

Exam 2

November 12, 2002

Instructions:

- Make sure that your exam is not missing any sheets, then write your full name and Andrew login ID on the front.
- Write your answers in the space provided below the problem. If you make a mess, clearly indicate your final answer.
- The exam has a maximum score of 66 points.
- The problems are of varying difficulty. The point value of each problem is indicated. Pile up the easy points quickly and then come back to the harder problems.
- This exam is OPEN BOOK. You may use any books or notes you like. You may use a calculator, but no laptops or other wireless devices. Good luck!

1 (09):
2 (08):
3 (08):
4 (12):
5 (10):
6 (10):
7 (09):
TOTAL (66):

Page 1 of 12

Problem 1. (9 points):

This problem tests your understanding of code optimization. Consider the following function for computing the product of an array of n integers. We have unrolled the loop by a factor of 4.

    int aprod(int a[], int n)
    {
        int i, w, x, y, z, r = 1;
        for (i = 0; i < n-3; i += 4) {
            w = a[i]; x = a[i+1]; y = a[i+2]; z = a[i+3];
            r = r * w * x * y * z; // Product computation
        }
        for (; i < n; i++)
            r *= a[i];
        return r;
    }

For the line labeled Product computation, we can use parentheses to create 3 different associations of the computation, as follows:

    r = (((r * w) * x) * y) * z; // A1
    r = (r * w) * ((x * y) * z); // A2
    r = r * (w * (x * (y * z))); // A3

Complete the following table with the theoretical CPE (cycles per element) of each of these associations. Assume that this machine has an infinite number of integer multipliers, all capable of operating in parallel with each other.
Also, assume that integer multiplication on this machine has a latency of 4 cycles and an issue time of 1 cycle.

    Version    Theoretical CPE
    A1
    A2
    A3

Here are some hints:

- Recall that the CPE measure assumes that the run time, measured in clock cycles, for an array of length n is a function of the form Cn + K, where C is the CPE.
- “Theoretical CPE” means the performance that would be achieved if the only limiting factors were the data dependences of computation and the latency and issue time of the integer multiplier.

Problem 2. (8 points):

The following problem concerns basic cache lookups.

- The memory is byte addressable.
- Memory accesses are to 1-byte words (not 4-byte words).
- Physical addresses are 14 bits wide.
- The cache is 4-way set associative, with a 4-byte block size and 64 total lines.

In the following tables, all numbers are given in hexadecimal. The Index column contains the set index for each set of 4 lines. The Tag columns contain the tag value for each line. The V column contains the valid bit for each line.
The Bytes 0–3 columns contain the data for each line, numbered left-to-right starting with byte 0 on the left.

The contents of the cache are as follows:

4-way Set Associative Cache

    Index  Tag V Bytes 0–3     Tag V Bytes 0–3     Tag V Bytes 0–3     Tag V Bytes 0–3
    0      0C  0 03 3E CD 38   A0  0 16 7B ED 5A   40  0 8E 4C DF 18   58  0 FB B7 12 02
    1      3A  1 A9 76 2B EE   54  0 BC 91 D5 92   98  1 80 BA 9B F6   84  1 48 16 81 0A
    2      26  0 75 F7 3F C6   78  1 9E 3A 0F DA   26  1 00 4C B6 A8   5E  1 92 04 E5 2E
    3      B8  1 E0 22 19 3A   D2  0 02 B3 8F B6   D4  1 25 31 E1 02   C2  0 18 09 73 02
    4      54  1 86 B8 F0 C6   4C  1 AA 29 AE 16   56  1 76 46 80 6E   1C  1 13 EA A8 66
    5      F6  0 04 2A 32 6A   9E  0 B1 86 56 0E   CC  0 96 30 47 F2   06  1 F8 1D 42 30
    6      BE  0 2F 7E 3D A8   C0  0 27 95 A4 74   C4  1 07 11 6B D8   8A  1 C7 B7 AF C2
    7      A0  0 D6 A4 89 92   10  0 FD FE D6 DA   76  0 DE D5 CD 4A   E2  0 7C 68 3A 1A
    8      F0  1 ED 32 0A A2   E4  1 BF 80 1D FC   14  1 EF 09 86 2A   BC  1 25 44 6F 1A
    9      30  1 1E C2 AE 60   08  0 5C 3E DF F2   CA  0 25 CF 84 DA   5C  1 F1 6B DC DE
    A      38  1 5D 4D F7 DA   82  1 69 C2 8C 74   9C  1 A8 CE 7F DA   3E  1 FA 93 EB 48
    B      3A  1 61 C6 5E 74   64  0 03 97 BA 62   80  1 F8 11 72 12   E0  1 C5 EC 76 4E
    C      D4  0 17 52 75 2C   AE  0 62 89 EF 18   8E  0 BB 7D 8C 7C   68  0 26 57 7F C2
    D      DC  1 54 9E 1E FA   B6  1 DC 81 B2 14   00  0 B6 1F 7B 44   74  0 10 F5 B8 2E
    E      D6  0 14 9A 0D 4A   EA  1 C8 1D E6 6E   38  1 F3 38 F3 5C   64  0 6C 8F BD A8
    F      7E  1 32 21 1C 2C   FA  1 22 C2 DC 34   BE  1 BA DD 37 D8   B8  0 E7 A2 39 BA

Part 1

The box below shows the format of a physical address. Indicate (by labeling the diagram) the fields that would be used to determine the following:

    CO  The block offset within the cache line
    CI  The cache index
    CT  The cache tag

    13 12 11 10 9 8 7 6 5 4 3 2 1 0

Part 2

For the given physical address, indicate the cache entry accessed and the cache byte value returned in hex. Indicate whether a cache miss occurs. If there is a cache miss, enter “-” for “Cache Byte returned”.

Physical address: 2BB2

A. Physical address format (one bit per box)

    13 12 11 10 9 8 7 6 5 4 3 2 1 0

B. Physical memory reference

    Parameter             Value
    Cache Offset (CO)     0x
    Cache Index (CI)      0x
    Cache Tag (CT)        0x
    Cache Hit? (Y/N)
    Cache Byte returned   0x

Physical address: 098B

A. Physical address format (one bit per box)

    13 12 11 10 9 8 7 6 5 4 3 2 1 0

B. Physical memory reference

    Parameter             Value
    Cache Offset (CO)     0x
    Cache Index (CI)      0x
    Cache Tag (CT)        0x
    Cache Hit? (Y/N)
    Cache Byte returned   0x

Problem 3. (8 points):

This problem tests your understanding of cache conflict misses. Consider the following matrix transpose routine

    typedef int array[2][2];

    void transpose(array dst, array src) {
        int i, j;
        for (j = 0; j < 2; j++) {
            for (i = 0; i < 2; i++) {
                dst[i][j] = src[j][i];
            }
        }
    }

running on a hypothetical machine with the following properties:

- sizeof(int) == 4.
- The src array starts at address 0 and the dst array starts at address 16 (decimal).
- There is a single L1 cache that is direct mapped and write-allocate, with a block size of 8 bytes.
- Accesses to the src and dst arrays are the only sources of read and write misses, respectively.

A. Suppose the cache has a total size of 16 data bytes (i.e., the block size times the number of sets is 16 bytes) and that the cache is initially empty. Then for each row and col, indicate whether each access to src[row][col] and dst[row][col] is a hit (h) or a …
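Note on the address layout in Problem 2: the field arithmetic can be sketched in C. This is a minimal sketch and not part of the original exam. The names addr_fields and split_address are hypothetical, and the bit widths are derived from the stated geometry: 4-byte blocks give 2 offset bits, 64 lines in 4-way sets give 16 sets and hence 4 index bits, and the remaining 8 of the 14 address bits form the tag. The address used in the usage note is arbitrary and is not one of the exam's addresses.

```c
#include <assert.h>

/* Cache geometry from the Problem 2 statement. */
enum {
    ADDR_BITS   = 14,  /* physical addresses are 14 bits wide   */
    OFFSET_BITS = 2,   /* log2(4-byte block)                    */
    INDEX_BITS  = 4    /* log2(64 lines / 4 ways) = log2(16)    */
};

/* Hypothetical helper type holding the three address fields. */
struct addr_fields {
    unsigned co;  /* block offset within the cache line */
    unsigned ci;  /* set index                          */
    unsigned ct;  /* tag                                */
};

/* Split a 14-bit physical address into offset, index, and tag. */
static struct addr_fields split_address(unsigned addr)
{
    struct addr_fields f;

    assert(addr < (1u << ADDR_BITS));  /* reject addresses wider than 14 bits */

    f.co = addr & ((1u << OFFSET_BITS) - 1);                  /* bits 1..0  */
    f.ci = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);  /* bits 5..2  */
    f.ct = addr >> (OFFSET_BITS + INDEX_BITS);                /* bits 13..6 */
    return f;
}
```

For instance, split_address(0x1234) yields CO = 0x0, CI = 0xD, and CT = 0x48. A lookup would then compare CT against the four tags in set CI and check the valid bit of any matching line.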