CMSC 411 Practice Exam 2 General instructions Be complete yet concise You may leave arithmetic expressions in any form that a calculator could evaluate 1 Cache a Suppose we have a byte addressable memory of size 4GB 2 32 bytes with a cache of size 256KB 218 bytes not including tag bits Also suppose the cache block size is 32 bytes For each of the following cache organizations compute the length in number of bits for the tag index and offset fields of the 32 bit memory address show your calculations i ii iii iv direct mapped 2 way set associative 8 way set associative fully associative b Given the following parameters which performs better in terms of average memory access time the direct mapped or the 2 way set associative cache Assume a cache hit takes 1 cycle for either organization and the cache miss penalty is 20 nanoseconds both use the same memory system Justify your answer Direct mapped miss rate 10 1GHz clock 2 way set associative miss rate 5 800MHz clock c Consider adding a fast second level cache to a computer with only a single level cache Suppose data can be accessed from the second level cache 12 times faster than it can be accessed from the original cache and that the second level cache can be used 30 of the time How much speedup can we gain by adding the second level cache Assume all data accesses hit in the original cache 2 Branch prediction Consider the following code fragment if d if d if d d d d 1 2 2 3 3 4 A typical code sequence generated for this fragment assuming that d is assigned to R1 looks like the following L1 L2 DSUBUI R2 R1 1 BNEZ R2 L1 Branch B1 DADDIU R1 R0 2 DSUBUI R3 R1 2 BNEZ R3 L2 Branch B2 DADDIU R1 R0 3 DSUBUI R4 R1 3 BNEZ R4 L3 Branch B3 DADDIU R1 R0 4 L3 For the given program fragment construct a 1 1 predictor table given that the code executes inside a loop with the value of d alternating between 2 and 0 Fill in the table as given below Circle the prediction bit used for each branch executed Assume that all predictors are initialized to not taken and that the correlation bit is initially set to not taken Finally calculate the total number of mispredicted branches d 2 0 2 0 B1 prediction B1 action New B1 prediction B2 prediction B2 action New B2 prediction B3 prediction B3 action New B3 prediction 3 Dynamic scheduling 25 points The following code implements the vector operation Y aX Y for a vector of length 100 and assumes that initially R1 0 and F0 contains the constant a foo L D MUL D L D ADD D S D DADDUI DADDUI DSGTUI BEQZ F2 0 R1 F4 F2 F0 F6 0 R2 F6 F4 F6 F6 0 R2 R1 R1 8 R2 R2 8 R3 R1 800 R3 foo load X i multiply a X i load Y i add a X i Y i store Y i increment X index increment Y index test if done loop if not done The DSGTUI instruction is an integer instruction that sets the result register R3 to a non zero value if the value in R1 is greater than 800 You are going to describe the execution of the loop for a single issue Tomasulo style MIPS pipeline with functional units as described in the following table FU type Integer FP adder FP multiplier Cycles in EX 1 4 15 of FUs 1 1 1 of reservation stations 5 3 2 Also assume the following Functional units are not pipelined No forwarding between functional units results can only come from the CDB The EX stage does both the effective address calculation and the memory access for loads and stores So the pipeline is IF ID IS EX WB Loads take 1 clock cycle The issue IS and write result WB stages each take 1 clock cycle There are 5 load buffer slots and 5 store buffer slots The BEQZ instruction takes 0 clock cycles Show the number of stall cycles for each instruction and what clock cycle each instruction begins execution i e enters its first EX cycle for two iterations of the loop Also show when each instruction writes the CDB store and branch instructions don t write the CDB Use the following table as a template Iteration number Instructions Issues at cycle Executes at cycle How many clock cycles does each iteration take Write CDB at Comment stalls
View Full Document