DOC PREVIEW
Berkeley COMPSCI 152 - Complex Pipelines

This preview shows page 1-2-3-4-5 out of 15 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1 CS152 Computer Architecture and Engineering Complex Pipelines Assigned March 9 Problem Set #3 Due March 18 http://inst.eecs.berkeley.edu/~cs152/sp10 The problem sets are intended to help you learn the material, and we encourage you to collaborate with other students and to ask questions in discussion sections and office hours to understand the problems. However, each student must turn in his own solution to the problems. The problem sets also provide essential background material for the quizzes. The problem sets will be graded primarily on an effort basis, but if you do not work through the problem sets you are unlikely to succeed at the quizzes! We will distribute solutions to the problem sets on the day the problem sets are due to give you feedback. Homework assignments are due at the beginning of class on the due date. Late homework will not be accepted.2 Problem 3.1: Superscalar Processor Consider the out-of-order, superscalar CPU shown in the diagram. It has the following features: o Four fully-pipelined functional units: ALU, MEM, FADD, FMUL o Instruction Fetch and Decode Unit that renames and sends 2 instructions per cycle to the ROB (assume perfect branch prediction and no cache misses) o An unbounded length Reorder Buffer that can perform the following operations on every cycle: o Accept two instructions from the Instruction Fetch and Decode Unit o Dispatch an instruction to each functional unit including Data Memory o Let Writeback update an unlimited number of entries o Commit up to 2 instructions in-order o There is no bypassing or short circuiting. For example, data entering the ROB cannot be passed on to the functional units or committed in the same cycle. . InstructionQueueROB(infinite )2 Instr per cycleALUFAD D+DataMemFMULRegfileIssue as many as possibleWriteback as many as possibleCommit at most 2 instr per cycle3 Now consider the execution of the following program on this machine using: I1 loop: LD F2, 0(R2) I2 LD F3, 0(R3) I3 FMUL F4, F2, F3 I4 LD F2, 4(R2) I5 LD F3, 4(R3) I6 FMUL F5, F2, F3 I7 FMUL F6, F4, F5 I8 FADD F4, F4, F5 I9 FMUL F6, F4, F5 I10 FADD F1, F1, F6 I11 ADD R2, R2, 8 I12 ADD R3, R3, 8 I13 ADD R4, R4, -1 I14 BNEZ R4, loop Problem 3.1.A Fill in the renaming tags in the following two tables for the execution of instructions I1 to I10. Tags should not be reused. Instr # Instruction Dest Src1 Src2 I1 LD F2, 0(R2) T1 R2 0 I2 LD F3, 0(R3) T2 R3 0 I3 FMUL F4, F2, F3 I4 LD F2, 4(R2) R2 4 I5 LD F3, 4(R3) R3 4 I6 FMUL F5, F2, F3 I7 FMUL F6, F4, F5 I8 FADD F4, F4, F5 I9 FMUL F6, F4, F5 I10 FADD F1, F1, F6 F1 Renaming table I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 R2 R3 F1 F2 T1 F3 T2 F4 F5 F64 Problem 3.1.B Consider the execution of one iteration of the loop (I1 to I14). In the following diagram draw the data dependencies between the instructions after register renaming Problem 4.1.C The attached table is a data structure to record the times when some activity takes place in the ROB. For example, one column records the time when an instruction enters ROB, while the last two columns record, respectively, the time when an instruction is dispatched to the FU’s and the time when results are written back to the ROB. This data structure has been designed to test your understanding of how a Superscalar machine functions. Fill in the blanks in last two columns up to slot T13 (You may use the source columns for book keeping – no credit will be taken off for the wrong entries there).5 Argument 1 Argument 2 dst Slot Instruction Cycle instruction entered ROB src1 cycle available Src2 cycle available dst reg Cycle dispatched Cycle written back to ROB T1 LD F2, 0(R2) 1 C 1 R2 1 F2 2 6 T2 LD F3, 0(R3) 1 C 1 R3 1 F3 3 7 T3 FMUL F4, F2, F3 2 F3 7 F4 T4 LD F2, 4(R2) 2 C 2 R2 F2 T5 LD F3, 4(R3) 3 C 3 R3 F3 T6 FMUL F5, F2, F3 3 F5 T7 FMUL F6, F4, F5 4 F6 T8 FADD F4, F4, F5 4 F4 T9 FMUL F6, F4, F5 5 F6 T10 FADD F1, F1, F6 5 F1 T11 ADD R2, R2, 8 6 R2 6 C 6 R2 T12 ADD R3, R3, 8 6 R3 6 C 6 R3 T13 ADD R4, R4, -1 7 R4 7 C 7 R4 T14 BNEZ R4, loop 7 C Loop T15 LD F2, 0(R2) 8 C 8 F2 10 14 T16 LD F3, 0(R3) 8 C 8 F3 11 15 T17 FMUL F4, F2, F3 9 F4 T18 LD F2, 4(R2) 9 C 9 F2 T19 LD F3, 4(R3) 10 C 10 F3 T20 FMUL F5, F2, F3 10 F5 T21 FMUL F6, F4, F5 11 F6 T22 FADD F4, F4, F5 11 F4 T23 FMUL F6, F4, F5 12 F6 T24 FADD F1, F1, F6 12 F1 T25 ADD R2, R2, 8 13 C 13 R2 T26 ADD R3, R3, 8 13 C 13 R3 T27 ADD R4, R4, -1 14 C 14 R4 T28 BNEZ R4, loop 14 C Loop6 Problem 3.1.D Identify the instructions along the longest latency path in completing this iteration of the loop (up to instruction 13). Suppose we consider an instruction to have executed when its result is available in the ROB. How many cycles does this iteration take to execute? Problem 3.1.E Do you expect the same behavior, i.e., the same dependencies and the same number of cycles, for the next iteration? (You may use the slots from T15 onwards in the attached diagram for bookkeeping to answer this question). Please give a simple reason why the behavior may repeat, or identify a resource bottleneck or dependency that may preclude the repetition of the behavior. Problem 3.1.F Can you improve the performance by adding at most one additional memory port and a FP Multiplier? Explain briefly. Yes / No Problem 3.1.G What is the minimum number of cycles needed to execute a typical iteration of this loop if we keep the same latencies for all the units but are allowed to use as many FUs and memory ports and are allowed to fetch and commit as many instructions as we want.7 Problem 3.2: Register Renaming and Static vs. Dynamic Scheduling The following MIPS code calculates the floating-point expression E = A * B + C * D, where the addresses of A, B, C, D, and E are stored in R1, R2, R3, R4, and R5, respectively: L.S F0, 0(R1) L.S F1, 0(R2) MUL.S F0, F0, F1 L.S F2, 0(R3) L.S F3, 0(R4) MUL.S F2, F2, F3 ADD.S F0, F0, F2 S.S F0, 0(R5) Problem 3.2.A Simple Pipeline Calculate the number of cycles this code sequence would take to execute (i.e., the number of cycles between the issue of the first load instruction and the issue of the final store, inclusive) on a simple in-order pipelined machine that


View Full Document

Berkeley COMPSCI 152 - Complex Pipelines

Documents in this Course
Quiz 5

Quiz 5

9 pages

Memory

Memory

29 pages

Quiz 5

Quiz 5

15 pages

Memory

Memory

29 pages

Memory

Memory

35 pages

Memory

Memory

15 pages

Quiz

Quiz

6 pages

Midterm 1

Midterm 1

20 pages

Quiz

Quiz

12 pages

Memory

Memory

33 pages

Quiz

Quiz

6 pages

Homework

Homework

19 pages

Quiz

Quiz

5 pages

Memory

Memory

15 pages

Load more
Download Complex Pipelines
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Complex Pipelines and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Complex Pipelines 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?