Pipelining - II
Adapted from CS 152C (UC Berkeley) lecture notes of Spring 2002

Slide titles: Pipelining - II; Revisiting Pipelining Lessons; Revisiting Pipelining Hazards; Control Signals for existing Datapath; Place registers between each step; Example; Pipelining Load Instruction; Pipelining the R Instruction; Pipelining Both L and R type; Important Observations; Solution; Datapath (Without Pipeline); Structural Hazard and Solution; Control Hazard - #1 Stall; Control Hazard - #2 Predict; Control Hazard - #3 Delayed Branch; Data Hazards (RAW); Data Hazards [contd…]; Hazard Detection; Three Generic Data Hazards; Computing CPI; Summary

Revisiting Pipelining Lessons
• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously using different resources
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to “fill” the pipeline and time to “drain” it reduce speedup
• Stall for dependences
[Figure: tasks A, B, C, D in task order vs. time, 6 PM to 9 PM; stage times 30, 40, 40, 40, 40, 20]

Revisiting Pipelining Hazards
• Structural Hazards
  – Hardware design
• Control Hazard
  – Decision based on results
• Data Hazard
  – Data dependency

Control Signals for existing Datapath
The Right to Left Control can lead to hazards

Place registers between each step
[Figure: datapath with registers placed between steps]

Example
Program (address, instruction):
10   lw   r1, r2(35)
14   addI r2, r2, 3
20   sub  r3, r4, r5
24   beq  r6, r7, 100
30   ori  r8, r9, 17
34   add  r10, r11, r12
100  and  r13, r14, 15

[Figure, repeated for each step: pipelined datapath with PC, Next PC, Inst. Mem, IR, Decode, Reg. File, Exec, Mem Access, Data Mem, and Mem Ctrl / WB Ctrl]

Steps shown, with the values labelled on the datapath:
• Start: Fetch 10
  – PC = 10
• Fetch 14, Decode 10
  – PC = 14; ID: lw r1, r2(35)
• Fetch 20, Decode 14, Exec 10
  – PC = 20; ID: addI r2, r2, 3; EX: lw r1; datapath labels: r2, 35
• Fetch 24, Decode 20, Exec 14, Mem 10
  – PC = 24; ID: sub r3, r4, r5; EX: addI r2, r2, 3; Mem: lw r1; datapath labels: r2, r2+35
• Fetch 30, Decode 24, Exec 20, Mem 14, WB 10
  – PC = 30; ID: beq r6, r7, 100; EX: sub r3; Mem: addI r2; WB: lw r1; datapath labels: r4, r5, r2+3, M[r2+35]
• Fetch 100, Decode 30, Exec 24, Mem 20, WB 14
  – PC = 100; ID: ori r8, r9, 17; EX: beq; Mem: sub r3; WB: addI r2; datapath labels: r6, r7, r4-r5, r2+3, r1=M[r2+35], 100

Pipelining Load Instruction
• The five independent functional units in the pipeline datapath are:
  – Instruction Memory for the Ifetch stage
  – Register File’s Read ports (bus A and bus B) for the Reg/Dec stage
  – ALU for the Exec stage
  – Data Memory for the Mem stage
  – Register File’s Write port (bus W) for the Wr stage

Cycle   1        2        3        4        5        6        7
1st lw  Ifetch   Reg/Dec  Exec     Mem      Wr
2nd lw           Ifetch   Reg/Dec  Exec     Mem      Wr
3rd lw                    Ifetch   Reg/Dec  Exec     Mem      Wr

Pipelining the R Instruction
• Ifetch: Instruction Fetch
  – Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec:
  – ALU operates on the two register operands
  – Update PC
• Wr: Write the ALU output back to the register file

Cycle   1        2        3        4
R-type  Ifetch   Reg/Dec  Exec     Wr

Pipelining Both L and R type
• We have a pipeline conflict or structural hazard:
  – Two instructions try to write to the register file at the same time!
  – Only one write port

Cycle   1        2        3        4        5        6        7        8        9
R-type  Ifetch   Reg/Dec  Exec     Wr
R-type           Ifetch   Reg/Dec  Exec     Wr
Load                      Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                             Ifetch   Reg/Dec  Exec     Wr
R-type                                      Ifetch   Reg/Dec  Exec     Wr

Oops!
We have a problem!

Important Observations
• Each functional unit can only be used once per instruction
• Each functional unit must be used at the same stage for all instructions:
  – Load uses Register File’s Write Port during its 5th stage
  – R-type uses Register File’s Write Port during its 4th stage

Stage   1        2        3        4        5
Load    Ifetch   Reg/Dec  Exec     Mem      Wr
R-type  Ifetch   Reg/Dec  Exec     Wr

Solution
• Delay R-type’s register write by one cycle:
  – Now R-type instructions also use Reg File’s write port at Stage 5
  – Mem stage is a NOOP stage: nothing is being done.

Cycle   1        2        3        4        5        6        7        8        9
R-type  Ifetch   Reg/Dec  Exec     Mem      Wr
R-type           Ifetch   Reg/Dec  Exec     Mem      Wr
Load                      Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                             Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                                      Ifetch   Reg/Dec  Exec     Mem      Wr

Datapath (Without Pipeline)
IR <- Mem[PC]; PC <- PC+4;
A <- R[rs]; B <- R[rt]
Then one of:
  S <- A + B;    R[rd] <- S;
  S <- A + SX;   M <- Mem[S];   R[rd] <- M;
  S <- A or ZX;  R[rt] <- S;
  S <- A + SX;   Mem[S] <- B
  If Cond PC <- PC+SX;
[Figure: datapath with PC, Next PC, Inst. Mem, IR, Reg. File, A, B, Exec, S, Equal, Mem Access, Data Mem, M]

Datapath (With Pipeline)
IR <- Mem[PC]; PC <- PC+4;
A <- R[rs]; B <- R[rt]
Then one of:
  S <- A + B;    M <- S;        R[rd] <- M;
  S <- A + SX;   M <- Mem[S];   R[rd] <- M;
  S <- A or ZX;  M <- S;        R[rt] <- M;
  S <- A + SX;   Mem[S] <- B
  If Cond PC <- PC+SX;
[Figure: datapath with PC, Next PC, Inst. Mem, IR, Reg. File, A, B, Exec, S, Equal, Mem Access, Data Mem, M]

Structural Hazard and Solution
[Figure: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) vs. time (clock cycles); each instruction passes through Mem, Reg, ALU, Mem, Reg]

Control Hazard - #1 Stall
• Stall: wait until decision is clear
• Impact: 2 lost cycles (i.e. 3 clock cycles per branch instruction) => slow
[Figure: instruction order (Add, Beq, Load) vs. time (clock cycles), with the stalled slots marked "Lost potential"]

Control Hazard - #2 Predict
• Predict: guess one direction, then back up if wrong
• Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right 50% of time)
• More dynamic scheme: history of 1 branch
[Figure: instruction order vs. time]
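
The impact figures quoted for the stall and predict schemes can be turned into an average CPI with a short calculation. The sketch below is illustrative only: the lost-cycle counts (2 per branch when stalling; 0 if the guess is right and 1 if wrong, right 50% of the time) come from the slides, while the base CPI of 1 and the 20% branch frequency are assumptions chosen for the example, and the function name `average_cpi` is hypothetical.

```python
def average_cpi(base_cpi, branch_fraction, lost_cycles_per_branch):
    """Average CPI when each branch loses `lost_cycles_per_branch` cycles on average."""
    return base_cpi + branch_fraction * lost_cycles_per_branch


# Assumptions for illustration (not taken from the slides):
BASE_CPI = 1.0         # an otherwise ideal pipeline, one instruction per cycle
BRANCH_FRACTION = 0.2  # one instruction in five is a branch

# Figures taken from the slides:
STALL_LOST_CYCLES = 2.0   # scheme #1: every branch loses 2 cycles
PREDICT_RIGHT_RATE = 0.5  # scheme #2: the guess is right 50% of the time
PREDICT_LOST_IF_WRONG = 1.0

# Expected loss per branch under prediction: 0 cycles if right, 1 if wrong.
predict_lost_cycles = (1.0 - PREDICT_RIGHT_RATE) * PREDICT_LOST_IF_WRONG

cpi_stall = average_cpi(BASE_CPI, BRANCH_FRACTION, STALL_LOST_CYCLES)
cpi_predict = average_cpi(BASE_CPI, BRANCH_FRACTION, predict_lost_cycles)

print(f"stall:   CPI is about {cpi_stall:.2f}")
print(f"predict: CPI is about {cpi_predict:.2f}")
```

Under these assumed inputs the stall scheme comes out near 1.4 cycles per instruction and the predict scheme near 1.1; a different branch frequency or prediction accuracy changes both numbers but not the form of the calculation.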



TAMU CSCE 350 - Pipelining II
