DOC PREVIEW
Berkeley COMPSCI 152 - Complex Pipelines Problem Set 3

This preview shows page 1-2-3-4-5 out of 16 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Problem 3.1: Superscalar ProcessorCalculate the effect of running the original code on a single-issue machine with register renaming and out-of-order issue. Ignore structural hazards apart from the single instruction decode per cycle. Show how the code is executed and give the number of cycles required. Compare it with results from optimized execution in 3.2.B.Now calculate the effect of running code you wrote in 3.2.C on the single-issue machine with register renaming and out-of-order issue from 3.3.D. Compare the number of cycles required to execute the program. What are the differences in the program and/or architecture that change the number of cycles required to execute the program? You should assume that all load instructions before a store must issue before the store is issued, and load instructions after a store must wait for the store to issue.Problem 3.3: Branch PredictionCS152Computer Architecture and EngineeringComplex PipelinesAssigned March 9Problem Set #3Due March 18http://inst.eecs.berkeley.edu/~cs152/sp10The problem sets are intended to help you learn the material, and we encourage you tocollaborate with other students and to ask questions in discussion sections and office hours tounderstand the problems. However, each student must turn in his own solution to the problems.The problem sets also provide essential background material for the quizzes. The problem setswill be graded primarily on an effort basis, but if you do not work through the problem sets youare unlikely to succeed at the quizzes! We will distribute solutions to the problem sets on the daythe problem sets are due to give you feedback. Homework assignments are due at the beginningof class on the due date. Late homework will not be accepted.1Problem 3.1: Superscalar Processor Consider the out-of-order, superscalar CPU shown in the diagram. It has the following features:o Four fully-pipelined functional units: ALU, MEM, FADD, FMULo Instruction Fetch and Decode Unit that renames and sends 2 instructions per cycle to theROB (assume perfect branch prediction and no cache misses)o An unbounded length Reorder Buffer that can perform the following operations on everycycle:o Accept two instructions from the Instruction Fetch and Decode Unito Dispatch an instruction to each functional unit including Data Memoryo Let Writeback update an unlimited number of entrieso Commit up to 2 instructions in-ordero There is no bypassing or short circuiting. For example, data entering the ROB cannot bepassed on to the functional units or committed in the same cycle..InstructionQueueROB(infinite )2 Instr per cycleALUFADD+DataMemFMULRegfileIssue as many as possibleWriteback as many as possibleCommit at most 2 instr per cycle2Now consider the execution of the following program on this machine using:I1 loop: LD F2, 0(R2)I2 LD F3, 0(R3)I3 FMUL F4, F2, F3I4 LD F2, 4(R2)I5 LD F3, 4(R3)I6 FMUL F5, F2, F3I7 FMUL F6, F4, F5I8 FADD F4, F4, F5I9 FMUL F6, F4, F5I10 FADD F1, F1, F6I11 ADD R2, R2, 8I12 ADD R3, R3, 8I13 ADD R4, R4, -1I14 BNEZ R4, loopProblem 3.1.AFill in the renaming tags in the following two tables for the execution of instructions I1 to I10.Tags should not be reused.Instr # Instruction Dest Src1 Src2I1 LD F2, 0(R2) T1 R2 0I2 LD F3, 0(R3) T2 R3 0I3 FMUL F4, F2, F3I4 LD F2, 4(R2) R2 4I5 LD F3, 4(R3) R3 4I6 FMUL F5, F2, F3I7 FMUL F6, F4, F5I8 FADD F4, F4, F5I9 FMUL F6, F4, F5I10 FADD F1, F1, F6 F1Renaming tableI1 I2 I3 I4 I5 I6 I7 I8 I9 I10R2R3F1F2 T1F3 T2F4F5F63Problem 3.1.BConsider the execution of one iteration of the loop (I1 to I14). In the following diagram draw thedata dependencies between the instructions after register renamingProblem 4.1.CThe attached table is a data structure to record the times when some activity takes place in theROB. For example, one column records the time when an instruction enters ROB, while the lasttwo columns record, respectively, the time when an instruction is dispatched to the FU’s and thetime when results are written back to the ROB. This data structure has been designed to test yourunderstanding of how a Superscalar machine functions. Fill in the blanks in last two columns up to slot T13 (You may use the source columns for bookkeeping – no credit will be taken off for the wrong entries there). 4Slot InstructionCycleinstructionenteredROBArgument 1 Argument 2 dst CycledispatchedCyclewrittenback toROBsrc1 cycleavailableSrc2 cycleavailabledst regT1LD F2, 0(R2)1 C 1 R2 1 F2 2 6T2LD F3, 0(R3)1 C 1 R3 1 F3 3 7T3FMUL F4, F2, F32 F3 7 F4T4LD F2, 4(R2)2 C 2 R2 F2T5LD F3, 4(R3)3 C 3 R3 F3T6FMUL F5, F2, F33 F5T7FMUL F6, F4, F54 F6T8FADD F4, F4, F54 F4T9FMUL F6, F4, F55 F6T10FADD F1, F1, F65 F1T11ADD R2, R2, 86 R2 6 C 6 R2T12ADD R3, R3, 86 R3 6 C 6 R3T13ADD R4, R4, -17 R4 7 C 7 R4T14BNEZ R4, loop7 C LoopT15LD F2, 0(R2)8 C 8 F2 10 14T16LD F3, 0(R3)8 C 8 F3 11 15T17FMUL F4, F2, F39 F4T18LD F2, 4(R2)9 C 9 F2T19LD F3, 4(R3)10 C 10 F3T20FMUL F5, F2, F310 F5T21FMUL F6, F4, F511 F6T22FADD F4, F4, F511 F4T23FMUL F6, F4, F512 F6T24FADD F1, F1, F612 F1T25ADD R2, R2, 813 C 13 R25T26ADD R3, R3, 813 C 13 R3T27ADD R4, R4, -114 C 14 R4T28BNEZ R4, loop14 C Loop6Problem 3.1.DIdentify the instructions along the longest latency path in completing this iteration of theloop (up to instruction 13). Suppose we consider an instruction to have executed when itsresult is available in the ROB. How many cycles does this iteration take to execute? Problem 3.1.EDo you expect the same behavior, i.e., the same dependencies and the same number ofcycles, for the next iteration? (You may use the slots from T15 onwards in the attacheddiagram for bookkeeping to answer this question). Please give a simple reason why thebehavior may repeat, or identify a resource bottleneck or dependency that may precludethe repetition of the behavior.Problem 3.1.FCan you improve the performance by adding at most one additional memory port and aFP Multiplier? Explain briefly.Yes / No Problem 3.1.GWhat is the minimum number of cycles needed to execute a typical iteration of this loopif we keep the same latencies for all the units but are allowed to use as many FUs andmemory ports and are allowed to fetch and commit as many instructions as we want.7Problem 3.2: Register Renaming and Static vs. Dynamic Scheduling The following MIPS code calculates the floating-point expression E = A * B + C * D,where the addresses of A, B, C, D, and E are stored in R1, R2, R3, R4, and


View Full Document

Berkeley COMPSCI 152 - Complex Pipelines Problem Set 3

Documents in this Course
Quiz 5

Quiz 5

9 pages

Memory

Memory

29 pages

Quiz 5

Quiz 5

15 pages

Memory

Memory

29 pages

Memory

Memory

35 pages

Memory

Memory

15 pages

Quiz

Quiz

6 pages

Midterm 1

Midterm 1

20 pages

Quiz

Quiz

12 pages

Memory

Memory

33 pages

Quiz

Quiz

6 pages

Homework

Homework

19 pages

Quiz

Quiz

5 pages

Memory

Memory

15 pages

Load more
Download Complex Pipelines Problem Set 3
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Complex Pipelines Problem Set 3 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Complex Pipelines Problem Set 3 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?