DOC PREVIEW
Berkeley COMPSCI 152 - Complex Pipelines

This preview shows page 1-2-3-4-5 out of 15 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1CS152Computer Architecture and EngineeringComplex PipelinesAssigned March 7Problem Set #3Due March 16http://inst.eecs.berkeley.edu/~cs152/sp11The problem sets are intended to help you learn the material, and we encourage you to collaborate with other students and to ask questions in discussion sections and office hours to understand the problems. However, each student must turn in his own solution to the problems.The problem sets also provide essential background material for the quizzes. The problem sets will be graded primarily on an effort basis, but if you do not work through the problem sets you are unlikely to succeed at the quizzes! We will distribute solutions to the problem sets on the day the problem sets are due to give you feedback. Homework assignments are due at the beginning of class on the due date. Late homework will not be accepted.2Problem 3.1: Superscalar Processor Consider the out-of-order, superscalar CPU shown in the diagram. It has the following features:oFour fully-pipelined functional units: ALU, MEM, FADD, FMULoInstruction Fetch and Decode Unit that renames and sends 2 instructions per cycle to the ROB (assume perfect branch prediction and no cache misses)oAn unbounded length Reorder Buffer that can perform the following operations on every cycle:oAccept two instructions from the Instruction Fetch and Decode UnitoDispatch an instruction to each functional unit including Data MemoryoLet Writeback update an unlimited number of entriesoCommit up to 2 instructions in-orderoThere is no bypassing or short circuiting. For example, data entering the ROB cannot be passed on to the functional units or committed in the same cycle..3Now consider the execution of the following program on this machine using:I1loop:LD F2, 0(R2)I2LD F3, 0(R3)I3FMUL F4, F2, F3I4LD F2, 4(R2)I5LD F3, 4(R3)I6FMUL F5, F2, F3I7FMUL F6, F4, F5I8FADD F4, F4, F5I9FMUL F6, F4, F5I10FADD F1, F1, F6I11ADD R2, R2, 8I12ADD R3, R3, 8I13ADD R4, R4, -1I14BNEZ R4, loopProblem 3.1.AFill in the renaming tags in the following two tables for the execution of instructions I1 to I10. Tags should not be reused.Instr #InstructionDestSrc1Src2I1LD F2, 0(R2)T1R20I2LD F3, 0(R3)T2R30I3FMUL F4, F2, F3I4LD F2, 4(R2)R24I5LD F3, 4(R3)R34I6FMUL F5, F2, F3I7FMUL F6, F4, F5I8FADD F4, F4, F5I9FMUL F6, F4, F5I10FADD F1, F1, F6F1Renaming tableI1I2I3I4I5I6I7I8I9I10R2R3F1F2T1F3T2F4F5F64Problem 3.1.BConsider the execution of one iteration of the loop (I1 to I14). In the following diagram draw the data dependencies between the instructions after register renamingProblem 4.1.CThe attached table is a data structure to record the times when some activity takes place in the ROB. For example, one column records the time when an instruction enters ROB, while the last two columns record, respectively, the time when an instruction is dispatched to the FU’s and the time when results are written back to the ROB. This data structure has been designed to test your understanding of how a Superscalar machine functions. Fill in the blanks in last two columns up to slot T13 (You may use the source columns for book keeping – no credit will be taken off for the wrong entries there).5SlotInstructionCycle instruction entered ROBArgument 1Argument 1Argument 2Argument 2dstCycle dispatchedCycle written back to ROBSlotInstructionCycle instruction entered ROBsrc1cycle availableSrc2cycle availabledst regCycle dispatchedCycle written back to ROBT1LD F2, 0(R2)1C1R21F226T2LD F3, 0(R3)1C1R31F337T3FMUL F4, F2, F32F37F4T4LD F2, 4(R2)2C2R2F2T5LD F3, 4(R3)3C3R3F3T6FMUL F5, F2, F33F5T7FMUL F6, F4, F54F6T8FADD F4, F4, F54F4T9FMUL F6, F4, F55F6T10FADD F1, F1, F65F1T11ADD R2, R2, 86R26C6R2T12ADD R3, R3, 86R36C6R3T13ADD R4, R4, -17R47C7R4T14BNEZ R4, loop7CLoopT15LD F2, 0(R2)8C8F21014T16LD F3, 0(R3)8C8F31115T17FMUL F4, F2, F39F4T18LD F2, 4(R2)9C9F2T19LD F3, 4(R3)10C10F3T20FMUL F5, F2, F310F5T21FMUL F6, F4, F511F6T22FADD F4, F4, F511F4T23FMUL F6, F4, F512F6T24FADD F1, F1, F612F1T25ADD R2, R2, 813C13R2T26ADD R3, R3, 813C13R3T27ADD R4, R4, -114C14R4T28BNEZ R4, loop14C Loop6Problem 3.1.DIdentify the instructions along the longest latency path in completing this iteration of the loop (up to instruction 13). Suppose we consider an instruction to have executed when its result is available in the ROB. How many cycles does this iteration take to execute? Problem 3.1.EDo you expect the same behavior, i.e., the same dependencies and the same number of cycles, for the next iteration? (You may use the slots from T15 onwards in the attached diagram for bookkeeping to answer this question). Please give a simple reason why the behavior may repeat, or identify a resource bottleneck or dependency that may preclude the repetition of the behavior.Problem 3.1.FCan you improve the performance by adding at most one additional memory port and a FP Multiplier? Explain briefly.Yes / No Problem 3.1.GWhat is the minimum number of cycles needed to execute a typical iteration of this loop if we keep the same latencies for all the units but are allowed to use as many FUs and memory ports and are allowed to fetch and commit as many instructions as we want.7 Problem 3.2: Register Renaming and Static vs. Dynamic Scheduling The following MIPS code calculates the floating-point expression E = A * B + C * D, where the addresses of A, B, C, D, and E are stored in R1, R2, R3, R4, and R5, respectively:L.S F0, 0(R1)L.S F1, 0(R2)MUL.S F0, F0, F1L.S F2, 0(R3)L.S F3, 0(R4)MUL.S F2, F2, F3ADD.S F0, F0, F2S.S F0, 0(R5)Problem 3.2.ASimple PipelineCalculate the number of cycles this code sequence would take to execute (i.e., the number of cycles between the issue of the first load instruction and the issue of the final store, inclusive) on a simple in-order pipelined machine that has no bypassing. The datapath includes a load/store unit, a floating-point adder, and a floating-point multiplier. Assume that loads have a two-cycle latency, floating-point multiplication has a four-cycle latency and floating-point addition has a two-cycle latency. Write-back for floating-point registers takes one cycle. Also assume that all functional units are fully pipelined and ignore any write back conflicts. Give the number of cycles between the issue of the first load instruction and the issue of the final store, inclusive.Problem 3.2.BStatic SchedulingReorder the


View Full Document

Berkeley COMPSCI 152 - Complex Pipelines

Documents in this Course
Quiz 5

Quiz 5

9 pages

Memory

Memory

29 pages

Quiz 5

Quiz 5

15 pages

Memory

Memory

29 pages

Memory

Memory

35 pages

Memory

Memory

15 pages

Quiz

Quiz

6 pages

Midterm 1

Midterm 1

20 pages

Quiz

Quiz

12 pages

Memory

Memory

33 pages

Quiz

Quiz

6 pages

Homework

Homework

19 pages

Quiz

Quiz

5 pages

Memory

Memory

15 pages

Load more
Download Complex Pipelines
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Complex Pipelines and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Complex Pipelines 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?