Pipelining II
Systems I

Outline
- Pipelining II
- SEQ+ Hardware
- Adding Pipeline Registers
- Pipeline Stages
- PIPE- Hardware
- Signal Naming Conventions
- Feedback Paths
- Pipeline Demonstration
- Data Dependencies: 3 Nop's
- Data Dependencies: 2 Nop's
- Data Dependencies: 1 Nop
- Data Dependencies: No Nop
- Predicting the PC
- Our Prediction Strategy
- Recovering from PC Misprediction
- Branch Misprediction Example
- Branch Misprediction Trace
- Return Example
- Incorrect Return Example
- Pipeline Summary
- The problem is hazards
- How do we fix the Pipeline?
- Stalling for Data Dependencies
- Stall Condition
- Detecting Stall Condition
- Stalling X3
- What Happens When Stalling?
- Implementing Stalling
- Pipeline Register Modes
- Summary

Topics
- Pipelining hardware: registers and feedback paths
- Difficulties with pipelines: hazards
- Method of mitigating hazards

SEQ+ Hardware
- Still sequential implementation
- Reorder PC stage to put at beginning
PC Stage
- Task is to select PC for current instruction
- Based on results computed by previous instruction
Processor State
- PC is no longer stored in register
- But, can determine PC based on other stored information
[Figure: SEQ+ hardware. Stages: Fetch, Decode, Execute, Memory, Write back. Blocks: instruction memory, PC increment, register file, CC, ALU, data memory, PC.]

Adding Pipeline Registers
[Figure: SEQ+ datapath with pipeline registers F, D, E, M, W inserted between the stages. Labeled signals include f_PC, predPC, d_srcA, d_srcB, M_icode, M_Bch, M_valA, W_icode, W_valE, W_valM, W_dstE, W_dstM.]

Pipeline Stages
Fetch
- Select current PC
- Read instruction
- Compute incremented PC
Decode
- Read program registers
Execute
- Operate ALU
Memory
- Read or write data memory
Write Back
- Update register file

PIPE- Hardware
- Pipeline registers hold intermediate values from instruction execution
Forward (Upward) Paths
- Values passed from one stage to next
- Cannot jump past stages
  - e.g., valC passes through decode
[Figure: PIPE- hardware. Pipeline register fields: F: predPC. D: icode, ifun, rA, rB, valC, valP. E: icode, ifun, valC, valA, valB, dstE, dstM, srcA, srcB. M: icode, Bch, valE, valA, dstE, dstM. W: icode, valE, valM, dstE, dstM. Other labeled signals: f_PC, d_srcA, d_srcB, d_rvalA, e_Bch, M_Bch, M_valA, W_valE, W_valM.]

Signal Naming Conventions
S_Field
- Value of Field held in stage S pipeline register
s_Field
- Value of Field computed in stage S

Feedback Paths
Predicted PC
- Guess value of next PC
Branch information
- Jump taken/not-taken
- Fall-through or target address
Return point
- Read from memory
Register updates
- To register file write ports
[Figure: PIPE- hardware, repeated to show the feedback paths.]

Pipeline Demonstration
File: demo-basic.ys

Cycle                     1  2  3  4  5  6  7  8  9
irmovl $1,%eax  #I1       F  D  E  M  W
irmovl $2,%ecx  #I2          F  D  E  M  W
irmovl $3,%edx  #I3             F  D  E  M  W
irmovl $4,%ebx  #I4                F  D  E  M  W
halt            #I5                   F  D  E  M  W

Cycle 5: W holds I1, M holds I2, E holds I3, D holds I4, F holds I5.

Data Dependencies: 3 Nop's
# demo-h3.ys

Cycle                     1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: nop                      F  D  E  M  W
0x00d: nop                         F  D  E  M  W
0x00e: nop                            F  D  E  M  W
0x00f: addl %edx,%eax                    F  D  E  M  W
0x011: halt                                 F  D  E  M  W

Cycle 6
- W: R[%eax] ← 3
Cycle 7
- D: valA ← R[%edx] = 10, valB ← R[%eax] = 3

Data Dependencies: 2 Nop's
# demo-h2.ys

Cycle                     1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: nop                      F  D  E  M  W
0x00d: nop                         F  D  E  M  W
0x00e: addl %edx,%eax                 F  D  E  M  W
0x010: halt                              F  D  E  M  W

Cycle 6
- W: R[%eax] ← 3
- D: valA ← R[%edx] = 10, valB ← R[%eax] = 0 (Error)
Can't transport value produced by first instruction back in time

Data Dependencies: 1 Nop
# demo-h1.ys

Cycle                     1  2  3  4  5  6  7  8  9
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: nop                      F  D  E  M  W
0x00d: addl %edx,%eax              F  D  E  M  W
0x00f: halt                           F  D  E  M  W

Cycle 5
- W: R[%edx] ← 10
- M: M_valE = 3, M_dstE = %eax
- D: valA ← R[%edx] = 0, valB ← R[%eax] = 0 (Error)
Now a problem with both operands

Data Dependencies: No Nop
# demo-h0.ys

Cycle                     1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: addl %edx,%eax           F  D  E  M  W
0x00e: halt                        F  D  E  M  W

Cycle 4
- M: M_valE = 10, M_dstE = %edx
- E: e_valE ← 0 + 3 = 3, E_dstE = %eax
- D: valA ← R[%edx] = 0, valB ← R[%eax] = 0 (Error)
Wow - we really missed the boat here…

Predicting the PC
- Start fetch of new instruction after current one has completed fetch stage
  - Not enough time to reliably determine next instruction
- Guess which instruction will follow
  - Recover if prediction was incorrect
[Figure: fetch-stage logic. Blocks: Select PC, instruction memory, Split, Align, PC increment, Predict PC. Internal signals: Byte 0, Bytes 1-5, Need regids, Need valC, Instr valid. Inputs to Select PC: M_icode, M_Bch, M_valA, W_icode, W_valM. Values passed to D: icode, ifun, rA, rB, valC, valP. Value fed back to F: predPC.]

Our Prediction Strategy
Instructions that Don't Transfer Control
- Predict next PC to be valP
- Always reliable
Call and Unconditional Jumps
- Predict next PC to be valC (destination)
- Always
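
The two prediction rules listed above can be sketched in a few lines of Python. This is only an illustration, not the processor's actual control logic: the function name `predict_pc` and the string instruction names are hypothetical, and conditional jumps and `ret` are deliberately rejected because the rules for them are not part of the text shown here.

```python
# Hypothetical instruction names used only for this sketch.
CONTROL_TO_VALC = {"call", "jmp"}      # call and unconditional jump
NOT_COVERED = {"jXX", "ret"}           # rules for these are not shown in this excerpt


def predict_pc(icode, valC, valP):
    """Guess the next PC using only values available in the fetch stage.

    valC is the constant (destination) encoded in the instruction,
    valP is the address of the next sequential instruction.
    """
    if icode in NOT_COVERED:
        raise ValueError("prediction rule for %s is outside this sketch" % icode)
    if icode in CONTROL_TO_VALC:
        return valC    # jump or call target
    return valP        # instruction does not transfer control


if __name__ == "__main__":
    print(hex(predict_pc("irmovl", 0x0, 0x006)))
    print(hex(predict_pc("call", 0x100, 0x005)))
```

Both inputs are produced during fetch, which is why the guess can be made before the current instruction has moved past the fetch stage.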
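
The data-dependency demonstrations (demo-h0.ys to demo-h3.ys) follow a simple timing rule that can be checked with a short Python sketch. It assumes, as the cycle snapshots in those slides indicate, that a register written in the write-back stage only becomes visible to a decode that happens in a later cycle, and that the pipeline has neither stalling nor forwarding. The function name `stale_operands` is hypothetical.

```python
def stale_operands(num_nops):
    """Return the addl source registers that are read before they are written.

    Program shape (taken from the demo-h* slides):
        irmovl $10,%edx     fetched in cycle 1
        irmovl $3,%eax      fetched in cycle 2
        nop x num_nops
        addl %edx,%eax
    Stages are F, D, E, M, W, one cycle each, with no stalling or forwarding.
    """
    producer_fetch = {"%edx": 1, "%eax": 2}   # fetch cycle of each irmovl
    addl_decode = (3 + num_nops) + 1          # addl fetch cycle, plus one for decode

    stale = []
    for reg, fetch in producer_fetch.items():
        writeback = fetch + 4                 # F -> D -> E -> M -> W
        # Assumption: the write lands at the end of the W cycle, so a decode
        # in the same cycle (or earlier) still sees the old register value.
        if writeback >= addl_decode:
            stale.append(reg)
    return stale


if __name__ == "__main__":
    for n in (3, 2, 1, 0):
        print(n, "nops ->", stale_operands(n))
```

With three nops no operand is stale, with two only %eax is, and with one or none both operands are read too early, which matches the errors flagged in the slides.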