Andrew login ID Full Name CS 15 213 Fall 2005 Final Exam Friday Dec 16 2005 Make sure that your exam is not missing any sheets then write your full name and Andrew login ID on the front Write your answers in the space provided below the problem If you make a mess clearly indicate your final answer The exam has a maximum score of 92 points This exam is OPEN BOOK You may use any books or notes you like You may use a calculator but no other electronic devices are allowed Good luck 01 06 02 12 03 08 04 10 05 12 06 12 07 12 08 08 09 08 10 04 TOTAL 92 Page 1 of 17 Problem 1 6 points Address spaces Suppose you have a computer system with A 1 GB byte addressable virtual address space A 256 MB byte addressable physical address space and A virtual memory page size of 4 KB A What is the minimum number of address bits needed to represent the virtual address space B What is the minumum number of bits needed to represent the physical address space C What is the total number of page table entries Express your answer in the form 2x Page 2 of 17 Problem 2 12 points Data representation Consider the following two 9 bit floating point representations based on the IEEE floating point format 1 Format A There is one sign bit There are k 5 exponent bits The exponent bias is 15 There are n 3 fraction bits 2 Format B There is one sign bit There are k 4 exponent bits The exponent bias is 7 There are n 4 fraction bits Numeric values are encoded in both of these formats as a value of the form V 1 S M 2E where S is the sign bit E is exponent after biasing and M is the significand value The fraction bits encode the significand value M using either a denormalized exponent field 0 or a normalized representation exponent field nonzero Below you are given some bit patterns in Format A and your task is to convert them to the closest value in Format B If rounding is necessary you should round toward In addition give the values of numbers given by the Format A and Format B bit patterns Give these as whole numbers e g 17 or as fractions e g 17 64 or 17 26 Format A Format B Bits Value Bits Value 1 01111 001 9 8 1 0111 0010 9 8 0 10110 011 1 00111 010 1 11100 000 0 10111 100 Page 3 of 17 Problem 3 8 points Array indexing Consider the source code below where M and N are constants declared with define int array1 M N int array2 N M int copy int i int j array1 i j array2 j i The above code generates the following assembly code on a 64 bit Pentium Arguments i is in edi j is in esi copy movslq movslq leaq leaq subq addq addq movl movl ret esi rsi edi rdi 0 rsi 8 rax rdi rdi 4 rdx rsi rax rsi rdx rdi rax array2 rax 4 eax eax array1 rdx 4 Assuming that sizeof int 4 what are the values of M and N M N Page 4 of 17 Problem 4 10 points Machine level code Consider the following function s assembly code 00000000004004f8 foo 4004f8 53 4004f9 89 f8 4004fb 83 ff 01 4004fe 76 21 400500 b8 01 00 00 00 400505 b9 00 00 00 00 40050a ba 02 00 00 00 40050f 39 fa 400511 77 0e 400513 01 c8 400515 89 c3 400517 29 cb 400519 89 d9 40051b ff c2 40051d 39 fa 40051f 76 f2 400521 5b 400522 c3 push mov cmp jbe mov mov mov cmp ja add mov sub mov inc cmp jbe pop retq rbx edi eax 0x1 edi 400521 foo 0x29 0x1 eax 0x0 ecx 0x2 edx edi edx 400521 foo 0x29 ecx eax eax ebx ecx ebx ebx ecx edx edi edx 400513 foo 0x1b rbx Please fill in the corresponding C code int foo unsigned int x int a b i if a 1 b 0 for a b return Page 5 of 17 Problem 5 12 points Performance Evaluation We saw in class how loop unrolling can be used to improve the performance of a piece of code This problem will test your ability to analyze the performance improvements offered by this technique Assume that multiplication has a latency of 7 cycles and addition has a latency of 5 cycles A Alice has written the code below to compute the dot product of two vectors computing one element per iteration data t dot prod data t A data t B int size data t result 0 int i for i size 1 i 0 i result result A i B i return result What is the optimal CPE achieved by the code above Assume that there are an unlimited number of functional units CPE Page 6 of 17 B Now suppose Alice unrolls the loop computing two elements per iteration What is the resulting optimal CPE Once again assume that there are an unlimited number of functional units Unroll 2x data t dot product2 data t A data t B int size data t result 0 int i Unroll by 2X for i size 1 i 1 i 2 data t t1 A i B i data t t2 A i 1 B i 1 result result t1 t2 Finish off remaining element s for i 0 i 1 result result A i B i return result CPE C By what factor would Alice need to unroll the loop to get an optimal CPE of 0 5 Once again assume that there are an unlimited number of functional units Unroll by D One of the reasons why we get diminishing returns from unrolling on a real CPU is that there are a limited number of functional units and therefore a limit to how many operations we can perform in parallel So Alice is very excited to learn that the CS department is getting a new machine with one million floating point units In anticipation she unrolls her dot product code by 10 000 Give one reason why her new code may actually perform worse than her old code which unrolled by 2 You may assume that memory latencies and cache sizes on the new machine are the same as on the old one Page 7 of 17 Problem 6 12 points Cache memories This problem requires you to analyze both high level and low level aspects of caches You will be required to perform part of a a cache translation determine individual hits and misses and analyze overall cache performance Memory is byte addressable Physical addesses are 14 bits wide The cache is direct mapped with a 16 byte block size and 4 sets sizeof int 4 bytes A The following question will deal with a 5 5 int matrix arr 5 5 Assume that the array has already been initialized a The box …
View Full Document