2/15/2006 136

G22.2243-001
High Performance Computer Architecture
Lecture 5
Control Hazards (cont'd)
Scoreboarding
February 15, 2006

Outline
• Announcements
  – Lab Assignment 1 due back today
  – HW Assignment 2 out today, due back in one week: February 22nd
  – Lab Assignment 2 out today, due back in two weeks: March 1st
• Reviewing schemes for dealing with control hazards
• Reducing the impact of data hazards
  – Scoreboarding
[Hennessy/Patterson CA:AQA (3rd Edition): Appendix A, Chapter 3]

Recap
• Keep the pipeline full; avoid bubbles
• Deal with data hazards
• Deal with control hazards

Control Hazard (cont'd)
• PREDICTION
  – Static
    • Branch taken
    • Branch not taken
    • Branch delay slot(s)
  – Dynamic
    • Local: N-bit predictors (2-bit predictors; a few K entries)
    • Global: Correlating predictors (n,m)
      Consider the past n branches (2^n possibilities); for each possibility use m bits for prediction
    • Adaptive: Tournament predictors adaptively choose between local and global predictors; use saturating counters as the selector
• LOOKUP
  – BTB (Branch Target Buffer); can be combined with the prediction cache
  – RAS (Return Address Stack) (8-16 entries)

Advanced Methods for Dealing with Data Hazards

Dealing with Data Hazards
• Data hazards result in pipeline stalls
  – Because instructions need to wait for the results they depend on to become available
  – Also affects subsequent instructions that do not need these results
• So far:
  – Register and result forwarding
  – Relied on in-order execution of instructions
  – Use static (compiler) scheduling to reduce the number of stalls
• Now: Dynamic Scheduling
  Hardware rearranges the instruction execution to reduce stalls, while maintaining data flow and exception behavior
• To see the impact of these schemes, let us look at a more advanced pipelined ISA with multi-cycle operations

Extending the RISC Pipeline to Handle Multicycle Operations
• Classic RISC pipeline assumes that all operations complete in 1 cycle
  – Impractical because of clock-period and logic considerations
• More practical is a model where the (original) EX stage is split across multiple functional units, each of which may or may not be pipelined
[Figure; surviving label: "Not pipelined"]

Instruction Flow through Longer Latency Pipelines
• Standard pipeline model extended with additional stages

  Instruction   Stages
  MUL.D         IF ID M1 M2 M3 M4 M5 M6 M7 MEM WB
  ADD.D         IF ID A1 A2 A3 A4 MEM WB
  L.D           IF ID EX MEM WB
  S.D           IF ID EX MEM WB

  Red: Data needed   Blue: Results available
• Increased stalls because of RAW hazards

  Clock cycles 1-17 (cycle numbers given where they can be recovered):
  L.D   F4, 0(R2)    IF ID EX MEM WB
  MUL.D F0, F4, F6   IF ID M1 M2 M3 M4 M5 M6 M7 MEM(12) WB(13)
  ADD.D F2, F0, F8   IF ID A1(12) A2(13) A3(14) A4(15) MEM(16)
  S.D   F2, 0(R2)    IF ID(12) EX(13) MEM(17)

• Also increases stalls due to other hazards
  – Structural: Initiation intervals > 1; more than one register write per cycle
  – WAW hazards because different instructions take different numbers of cycles; WAR does not happen

Dynamic Scheduling
• Hardware rearranges the instruction execution to reduce stalls, while maintaining data flow and exception behavior
• Idea: An instruction can execute as soon as its data dependencies are satisfied
  – Separation of instruction issue, execution, and commit stages
    • Classic pipeline: these correspond to IF and ID, EX and MEM, and WB
    • ID stage split into two
      – Issue: Decode instruction, check for structural hazards
      – Read Operands: Wait until no data hazards, then read operands
  – Variants
    • In-order issue, out-of-order execution, in-order commit
    • Out-of-order issue, out-of-order execution, in-order commit
    • Out-of-order issue, out-of-order execution, out-of-order commit

Major Dynamic Scheduling Implementations
Two implementations
• Scoreboarding (Appendix A)
• Tomasulo's algorithm (Chapter 3)
  – A variation of this is used in current-day microprocessors

Scoreboarding
• Named after the CDC 6600 scoreboard
  – Used to orchestrate parallel instruction execution among functional units
    • 4 floating-point, 5 memory-reference, 7 integer
• We will illustrate the technique using a simpler RISC machine
  – 2 FP multiply (10 EX cycles), 1 FP add (2 EX), 1 FP divide (40 EX)
  – 1 integer unit (1 EX)
• Scoreboard: Keeps track of instruction and machine status, permitting an instruction to execute as soon as its operands are available
  – Instructions can execute out-of-order
  – Have to deal with all three kinds of data hazards
    • RAW hazard: can happen as in the classic pipeline
    • WAW hazard: can happen as in pipelines with multi-cycle operations
    • WAR hazard: can happen because of out-of-order execution
  – Our scoreboard does not take advantage of forwarding, since it waits until results are written back to the register file; we focus on FP operations

Four Stages of Scoreboard Control
• Issue (ID1)
  – decode instructions
  – check for structural and WAW hazards; example:
      DIV.D F0, F2, F4
      ADD.D F10, F0, F8
      SUB.D F10, F8, F14
  – stall until structural and WAW hazards are resolved; no further instructions issue in the meantime
• Read operands (ID2)
  – wait until no RAW hazards (i.e., no earlier issued active instruction is going to write to the source operands of this instruction)
  – then read operands
• Execution (EX)
  – operate on operands
  – may take multiple cycles; notify scoreboard when done
• Write result (WB)
  – finish execution
  – stall if WAR hazard; example:
      DIV.D F0, F2, F4
      ADD.D F10, F0, F8
      SUB.D F8, F8, F14

Three Parts of the Scoreboard
• Instruction status
  – Indicates which of 4 steps the instruction is in: ID1, ID2, EX, or WB
• Functional unit status: Indicates the state of each functional unit (FU)
  – Busy: Indicates whether the unit is busy or not
  – Op: Operation to perform in the unit (e.g., + or –)
  – Fi: Destination register
  – Fj, Fk: Source-register numbers
  – Qj, Qk: Functional units producing source registers Fj, Fk
  – Rj, Rk: Flags indicating when Fj, Fk are ready
• Register result status
  – Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register

Scoreboard Example

  Instruction status
  Instruction          Issue  Read Ops.  Execute  Write
  L.D   F6, 34(R2)
  L.D   F2, 45(R3)
  MUL.D F0, F2, F4
  SUB.D F8, F6, F2
  DIV.D F10, F0, F6
  ADD.D F6, F8, F2

  Functional unit status
  Time  Name     Busy  Op  Fi  Fj  Fk  Qj  Qk  Rj  Rk
        Integer  No
        Mult1    No
        Mult2    No
        Add      No
        Divide   No

  Register result status
                   F0  F2  F4  F6  F8  F10  F12  F16
  Functional Unit

• No forwarding
• Loads (L.D) performed by "Integer"
• ADDs and SUBs performed by "Add"

Scoreboard Example: Cycle 1

  Instruction status
  Instruction          Issue  Read Ops.  Execute  Write
  L.D   F6, 34(R2)     1
  L.D   F2, 45(R3)
  MUL.D F0, F2, F4
  SUB.D F8, F6, F2
  DIV.D F10, F0, F6
  ADD.D F6, F8, F2
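
The hazard checks from "Four Stages of Scoreboard Control" and the tables from "Three Parts of the Scoreboard" can be sketched in a few lines of Python. This is a minimal illustration rather than the lecture's own code: the class and method names (FU, Scoreboard, can_issue, and so on) are hypothetical, execution latency and cycle counting are not modeled, and stores (which have no destination register) are left out. The unit names and the three instructions are taken from the scoreboard example slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FU:
    """Functional-unit status entry; field names follow the slide."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units producing Fj, Fk (None: no pending producer)
    qk: Optional[str] = None
    rj: bool = False           # True: operand is ready and has not been read yet
    rk: bool = False


class Scoreboard:
    """Hypothetical helper; models only the hazard bookkeeping, not timing."""

    def __init__(self, unit_names):
        self.units: Dict[str, FU] = {n: FU(n) for n in unit_names}
        self.result: Dict[str, str] = {}   # register result status: reg -> unit

    def can_issue(self, unit, dest):
        # Structural hazard: unit is busy. WAW hazard: dest has a pending writer.
        return not self.units[unit].busy and dest not in self.result

    def issue(self, unit, op, dest, src1, src2):
        if not self.can_issue(unit, dest):
            raise RuntimeError("issue must stall (structural or WAW hazard)")
        fu = self.units[unit]
        fu.busy, fu.op, fu.fi, fu.fj, fu.fk = True, op, dest, src1, src2
        fu.qj, fu.qk = self.result.get(src1), self.result.get(src2)
        fu.rj, fu.rk = fu.qj is None, fu.qk is None
        self.result[dest] = unit

    def can_read_operands(self, unit):
        # RAW hazard: wait until both source operands are ready.
        fu = self.units[unit]
        return fu.rj and fu.rk

    def read_operands(self, unit):
        fu = self.units[unit]
        fu.rj = fu.rk = False   # operands have been read

    def can_write_result(self, unit):
        # WAR hazard: stall while another active unit still has to read
        # the old value of our destination register.
        dest = self.units[unit].fi
        return all(
            (f.fj != dest or not f.rj) and (f.fk != dest or not f.rk)
            for f in self.units.values()
            if f.busy and f.name != unit
        )

    def write_result(self, unit):
        fu = self.units[unit]
        for other in self.units.values():   # wake up waiting consumers
            if other.qj == unit:
                other.qj, other.rj = None, True
            if other.qk == unit:
                other.qk, other.rk = None, True
        del self.result[fu.fi]
        self.units[unit] = FU(unit)          # unit becomes free again


sb = Scoreboard(["Integer", "Mult1", "Mult2", "Add", "Divide"])
sb.issue("Mult1", "MUL.D", "F0", "F2", "F4")
sb.issue("Divide", "DIV.D", "F10", "F0", "F6")
sb.issue("Add", "ADD.D", "F6", "F8", "F2")
print(sb.can_read_operands("Divide"))  # False: RAW hazard on F0
sb.read_operands("Add")
print(sb.can_write_result("Add"))      # False: WAR hazard on F6
```

In this sketch, ADD.D finishes reading its operands before DIV.D does, but it may not write F6 until DIV.D has read the old value of F6, which in turn has to wait for MUL.D to write F0.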



NYU CSCI-GA 2243 - Control Hazards
