Andrew login ID:
Full Name:

CS 15-213, Fall 2005
Final Exam
Friday Dec 16, 2005

• Make sure that your exam is not missing any sheets, then write your full name and Andrew login ID on the front.
• Write your answers in the space provided below the problem. If you make a mess, clearly indicate your final answer.
• The exam has a maximum score of 92 points.
• This exam is OPEN BOOK. You may use any books or notes you like. You may use a calculator, but no other electronic devices are allowed. Good luck!

01 (06):
02 (12):
03 (08):
04 (10):
05 (12):
06 (12):
07 (12):
08 (08):
09 (08):
10 (04):
TOTAL (92):

Page 1 of 17

Problem 1. (6 points):

Address spaces. Suppose you have a computer system with:

• A 1 GB byte-addressable virtual address space,
• A 256 MB byte-addressable physical address space, and
• A virtual memory page size of 4 KB.

A. What is the minimum number of address bits needed to represent the virtual address space? __________

B. What is the minimum number of bits needed to represent the physical address space? __________

C. What is the total number of page table entries? __________
(Express your answer in the form 2^x.)

Problem 2. (12 points):

Data representation. Consider the following two 9-bit floating point representations based on the IEEE floating point format.

1. Format A
   • There is one sign bit.
   • There are k = 5 exponent bits. The exponent bias is 15.
   • There are n = 3 fraction bits.

2. Format B
   • There is one sign bit.
   • There are k = 4 exponent bits. The exponent bias is 7.
   • There are n = 4 fraction bits.

Numeric values are encoded in both of these formats as a value of the form V = (−1)^S × M × 2^E, where S is the sign bit, E is the exponent after biasing, and M is the significand value. The fraction bits encode the significand value M using either a denormalized (exponent field 0) or a normalized representation (exponent field nonzero).

Below, you are given some bit patterns in Format A, and your task is to convert them to the closest value in Format B.
If rounding is necessary you should round toward +∞. In addition, give the values of numbers given by the Format A and Format B bit patterns. Give these as whole numbers (e.g., 17) or as fractions (e.g., 17/64 or 17/2^6).

        Format A                    Format B
Bits            Value       Bits            Value
1 01111 001     −9/8        1 0111 0010     −9/8
0 10110 011
1 00111 010
1 11100 000
0 10111 100

Problem 3. (8 points):

Array indexing. Consider the source code below, where M and N are constants declared with #define.

    int array1[M][N];
    int array2[N][M];

    int copy(int i, int j)
    {
        array1[i][j] = array2[j][i];
    }

The above code generates the following assembly code on a 64-bit Pentium:

    Arguments: i is in %edi, j is in %esi

    copy:
        movslq %esi,%rsi
        movslq %edi,%rdi
        leaq   0(,%rsi,8), %rax
        leaq   (%rdi,%rdi,4), %rdx
        subq   %rsi, %rax
        addq   %rsi, %rdx
        addq   %rdi, %rax
        movl   array2(,%rax,4), %eax
        movl   %eax, array1(,%rdx,4)
        ret

Assuming that sizeof(int) == 4, what are the values of M and N?

M =

N =

Problem 4. (10 points):

Machine-level code. Consider the following function's assembly code:

    00000000004004f8 <foo>:
      4004f8: 53                push   %rbx
      4004f9: 89 f8             mov    %edi,%eax
      4004fb: 83 ff 01          cmp    $0x1,%edi
      4004fe: 76 21             jbe    400521 <foo+0x29>
      400500: b8 01 00 00 00    mov    $0x1,%eax
      400505: b9 00 00 00 00    mov    $0x0,%ecx
      40050a: ba 02 00 00 00    mov    $0x2,%edx
      40050f: 39 fa             cmp    %edi,%edx
      400511: 77 0e             ja     400521 <foo+0x29>
      400513: 01 c8             add    %ecx,%eax
      400515: 89 c3             mov    %eax,%ebx
      400517: 29 cb             sub    %ecx,%ebx
      400519: 89 d9             mov    %ebx,%ecx
      40051b: ff c2             inc    %edx
      40051d: 39 fa             cmp    %edi,%edx
      40051f: 76 f2             jbe    400513 <foo+0x1b>
      400521: 5b                pop    %rbx
      400522: c3                retq

Please fill in the corresponding C code:

    int foo (unsigned int x)
    {
        int a, b, i;

        if (__________)
            _____________;

        a = 1;
        b = 0;
        for (_______ ; ________ ; _______) {
            a = ____________;
            b = ____________;
        }
        return _____;
    }

Problem 5. (12 points):

Performance Evaluation. We saw in class how loop unrolling can be used to improve the performance of a piece of code.
This problem will test your ability to analyze the performance improvements offered by this technique.

Assume that multiplication has a latency of 7 cycles and addition has a latency of 5 cycles.

A. Alice has written the code below to compute the dot product of two vectors, computing one element per iteration.

    data_t dot_prod(data_t A[], data_t B[], int size)
    {
        data_t result = 0;
        int i;

        for (i = size-1; i >= 0; i--) {
            result = result + (A[i] * B[i]);
        }
        return result;
    }

What is the optimal CPE achieved by the code above? Assume that there are an unlimited number of functional units.

CPE = _____________

B. Now suppose Alice unrolls the loop, computing two elements per iteration. What is the resulting optimal CPE? Once again, assume that there are an unlimited number of functional units.

    /* Unroll 2x */
    data_t dot_product2(data_t A[], data_t B[], int size)
    {
        data_t result = 0;
        int i;

        /* Unroll by 2X */
        for (i = size-1; i >= 1; i -= 2) {
            data_t t1 = A[i] * B[i];
            data_t t2 = A[i-1] * B[i-1];
            result = result + (t1 + t2);
        }

        /* Finish off remaining element(s) */
        for (; i >= 0; i -= 1) {
            result = result + (A[i] * B[i]);
        }
        return result;
    }

CPE = _____________

C. By what factor would Alice need to unroll the loop to get an optimal CPE of 0.5? Once again, assume that there are an unlimited number of functional units.

Unroll by _____________

D. One of the reasons why we get diminishing returns from unrolling on a real CPU is that there are a limited number of functional units, and therefore, a limit to how many operations we can perform in parallel. So Alice is very excited to learn that the CS department is getting a new machine with one million floating point units. In anticipation, she unrolls her dot product code by 10,000.
Give one reason why her new code may actually perform worse than her old code (which unrolled by 2).

You may assume that memory latencies and cache sizes on the new machine are the same as on the old one.

_________________________________________________________________

Problem 6. (12 points):

Cache memories. This problem requires you to analyze both high-level and low-level aspects of caches. You will be required to perform part of a cache translation, determine individual hits and misses, and analyze overall cache performance.

• Memory is byte addressable
• Physical addresses are 14 bits wide
• The cache is direct-mapped with a 16 byte block-size and 4 sets
• sizeof(int) = 4 bytes

A. The following question will deal with a 5 × 5 int matrix arr[5][5]. Assume that the array has already been initialized.

(a) The box below shows the format of a physical address.
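The address format in part (a) follows from the cache parameters listed for Problem 6. As a study aid, the sketch below assumes the usual tag / set index / block offset layout for a direct-mapped cache. The function names (log2_pow2, block_offset, set_index, tag_of) are hypothetical and do not come from the exam.

```c
/* Cache parameters taken from the Problem 6 statement. */
#define ADDR_BITS   14u   /* physical address width in bits */
#define BLOCK_SIZE  16u   /* bytes per cache block */
#define NUM_SETS    4u    /* direct-mapped: one line per set */

/* Hypothetical helper: number of bits needed to index n items,
 * where n is assumed to be a power of two. */
unsigned log2_pow2(unsigned n)
{
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}

/* Low-order bits select a byte within the block. */
unsigned block_offset(unsigned addr)
{
    return addr & (BLOCK_SIZE - 1);
}

/* The next bits above the block offset select the set. */
unsigned set_index(unsigned addr)
{
    return (addr >> log2_pow2(BLOCK_SIZE)) & (NUM_SETS - 1);
}

/* Whatever remains of the 14-bit address is the tag. */
unsigned tag_of(unsigned addr)
{
    unsigned low_bits = log2_pow2(BLOCK_SIZE) + log2_pow2(NUM_SETS);
    return (addr & ((1u << ADDR_BITS) - 1)) >> low_bits;
}
```

Under these assumptions, address 0x1234 splits into block offset 0x4, set index 3, and tag 0x48.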