UMD CMSC 411 - Practice Exam 2 - D2346585

Home> Schools> University of Maryland, College Park> Computer Science (CMSC) > CMSC 411> Practice Exam 2

DOC PREVIEW

UMD CMSC 411 - Practice Exam 2

School name University of Maryland, College Park

Course Cmsc 411- Computer Systems Architecture

Pages 4

This preview shows page 1 out of 4 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 4 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 4 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

CMSC 411 Practice Exam 2General instructions.Be complete, yet concise. You may leave arithmetic expressions in any form that a calculator could evaluate.1. Cache(a) Suppose we have a byte addressable memory of size 4GB(232bytes) with a cache of size 256KB(218bytes), not including tag bits. Also suppose the cache block size is 32 bytes. For each ofthe following cache organizations, compute the length in number of bits for the tag, index andoffset fields of the 32-bit memory address (show your calculations):i. direct mappedii. 2-way set associativeiii. 8-way set associativeiv. fully associative(b) Given the following parameters, which performs better in terms of average memory access time,the direct mapped or the 2-way set associative cache? Assume a cache hit takes 1 cycle for eitherorganization, and the cache miss penalty is 20 nanoseconds (both use the same memory system).Justify your answer.Direct mapped - miss rate 10%, 1GHz clock2-way set associative - miss rate 5%, 800MHz clock(c) Consider adding a fast second-level cache to a computer with only a single level cache. Supposedata can be accessed from the second level cache 12 times faster than it can be accessed fromthe original cache, and that the second level cache can be used 30% of the time. How muchspeedup can we gain by adding the second-level cache? Assume all data accesses hit in theoriginal cache.2. Branch predictionConsider the following code fragment:if( d == 1 )d = 2;if( d == 2 )d = 3;if( d == 3)d = 4;A typical code sequence generated for this fragment, assuming that d is assigned to R1, looks like thefollowing:DSUBUI R2, R1, #1BNEZ R2, L1 ; Branch B1DADDIU R1, R0, #2L1: DSUBUI R3, R1, #2BNEZ R3, L2 ; Branch B2DADDIU R1, R0, #3L2: DSUBUI R4, R1, #3BNEZ R4, L3 ; Branch B3DADDIU R1, R0, #4. . .L3:For the given program fragment, construct a (1, 1) predictor table, given that the code executes insidea loop, with the value of d alternating between 2 and 0. Fill in the table as given below. Circle theprediction bit used for each branch executed. Assume that all predictors are initialized to not taken,and that the correlation bit is initially set to not taken.Finally, calculate the total number of mispredicted branches.d=? B1 B1 New B1 B2 B2 New B2 B3 B3 New B3prediction action prediction prediction action prediction prediction action prediction20203. Dynamic scheduling (25 points)The following code implements the vector operation Y = aX + Y for a vector of length 100, andassumes that initially R1=0 and F0 contains the constant a:foo: L.D F2,0(R1) ;load X(i)MUL.D F4,F2,F0 ;multiply a*X(i)L.D F6,0(R2) ;load Y(i)ADD.D F6,F4,F6 ;add a*X(i)+Y(i)S.D F6,0(R2) ;store Y(i)DADDUI R1,R1,#8 ;increment X indexDADDUI R2,R2,#8 ;increment Y indexDSGTUI R3,R1,#800 ;test if doneBEQZ R3,foo ;loop if not doneThe DSGTUI instruction is an integer instruction that sets the result register, R3, to a non-zero valueif the value in R1 is greater than 800.You are going to describe the execution of the loop for a single-issue Tomasulo-style MIPS pipeline,with functional units as described in the following table:FU type Cycles in EX # of FUs # of reservation stationsInteger 1 1 5FP adder 4 1 3FP multiplier 15 1 2Also assume the following:• Functional units are not pipelined.• No forwarding between functional units; results can only come from the CDB.• The EX stage does both the effective address calculation and the memory access for loads andstores. So the pipeline is IF/ID/IS/EX/WB.• Loads take 1 clock cycle.• The issue(IS) and write result (WB) stages each take 1 clock cycle.• There are 5 load buffer slots and 5 store buffer slots.• The BEQZ instruction takes 0 clock cycles.Show the number of stall cycles for each instruction and what clock cycle each instruction beginsexecution (i.e., enters its first EX cycle), for two iterations of the loop. Also show when each instruc-tion writes the CDB (store and branch instructions don’t write the CDB). Use the following table as atemplate:Iteration Instructions Issues at Executes at Write CDB Commentnumber cycle # cycle at (stalls)How many clock cycles does each iteration

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1 out of 4 pages.

UMD CMSC 411 - Practice Exam 2

Sign up for free to view:

Please select your school