Pipelining IV
Systems I

Outline
  Implementing Pipeline Control
  Initial Version of Pipeline Control
  Control Combinations
  Control Combination A
  Stall in F
  JXX + ret works great!
  Control Combination B
  Handling Control Combination B
  Corrected Pipeline Control Logic
  Load/use hazard with ret
  Pipeline Summary
  Performance Analysis with Pipelining
  CPI for PIPE
  Computing CPI
  Computing CPI - II
  Computing CPI - III
  CPI for PIPE (Cont.)
  Summary

Topics
- Implementing pipeline control
- Pipelining and performance analysis


Implementing Pipeline Control
- Combinational logic generates pipeline control signals
- Action occurs at start of following cycle

[Figure: pipeline registers and the pipe control logic block.
 Register fields: W (icode, valE, valM, dstE, dstM); M (Bch, icode, valE, valA, dstE, dstM); E (icode, ifun, valC, valA, valB, dstE, dstM, srcA, srcB); D (icode, ifun, rA, rB, valC, valP); F (predPC).
 Control logic inputs: D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch.
 Control logic outputs: F_stall, D_stall, D_bubble, E_bubble.]


Initial Version of Pipeline Control

bool F_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode };

bool D_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Bubble for ret
    IRET in { D_icode, E_icode, M_icode };

bool E_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };


Control Combinations
- Special cases that can arise on same clock cycle

Combination A
- Not-taken branch
- ret instruction at branch target

Combination B
- Instruction that reads from memory to %esp
- Followed by ret instruction

[Figure: pipeline states for each condition.
 Load/use: load in E, use in D.
 Mispredict: JXX in E.
 ret 1: ret in D.
 ret 2: ret in E, bubble in D.
 ret 3: ret in M, bubbles in D and E.
 Combination A = Mispredict + ret 1. Combination B = Load/use + ret 1.]


Control Combination A
- Should handle as mispredicted branch
- Stalls F pipeline register
- But PC selection logic will be using M_valA anyhow

Condition             F       D       E       M       W
Processing ret        stall   bubble  normal  normal  normal
Mispredicted Branch   normal  bubble  bubble  normal  normal
Combination           stall   bubble  bubble  normal  normal


Stall in F
- Your book provides two inconsistent meanings for "stall in F"
  - Instruction remains in F and injects a bubble into D
  - Instruction squashed into D, same PC fetched
  - Figure 4.61
- Use the one that keeps 1 instr per pipeline stage

[Figure: pipeline diagram of an instruction stalled in F]


JXX + ret works great!

                                          1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax                     F  D  E  M  W
0x002: jne target         # Not taken        F  D  E  M  W
0x011: t: ret             # Target              F  D
       bubble                                         E  M  W
0x012: nop                # Target + 1             F
       bubble                                         D  E  M  W
0x007: irmovl $1,%eax     # Fall through              F  D  E  M  W
0x00d: nop                                               F  D  E  M  W


Control Combination B
- Would attempt to bubble and stall pipeline register D
- Signaled by processor as pipeline error

Condition             F       D               E       M       W
Processing ret        stall   bubble          normal  normal  normal
Load/Use Hazard       stall   stall           bubble  normal  normal
Combination           stall   bubble + stall  bubble  normal  normal


Handling Control Combination B
- Load/use hazard should get priority
- ret instruction should be held in decode stage for additional cycle

Condition             F       D       E       M       W
Processing ret        stall   bubble  normal  normal  normal
Load/Use Hazard       stall   stall   bubble  normal  normal
Combination           stall   stall   bubble  normal  normal


Corrected Pipeline Control Logic
- Load/use hazard should get priority
- ret instruction should be held in decode stage for additional cycle

Condition             F       D       E       M       W
Processing ret        stall   bubble  normal  normal  normal
Load/Use Hazard       stall   stall   bubble  normal  normal
Combination           stall   stall   bubble  normal  normal

bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode }
    # but not condition for a load/use hazard
    && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });


Load/use hazard with ret

mrmovl    F  D  E  M  W
bubble             E  M
ret          F  D  D  E
addl            F  F
bubble                D
addl                  F


Pipeline Summary

Data Hazards
- Most handled by forwarding
  - No performance penalty
- Load/use hazard requires one cycle stall

Control Hazards
- Cancel instructions when detect mispredicted branch
  - Two clock cycles wasted
- Stall fetch stage while ret passes through pipeline
  - Three clock cycles wasted

Control Combinations
- Must analyze carefully
- First version had subtle bug
  - Only arises with unusual instruction combination


Performance Analysis with Pipelining

Ideal pipelined machine: CPI = 1
- One instruction completed per cycle
- But much faster cycle time than unpipelined machine

However, hazards are working against the ideal
- Hazards resolved using forwarding are fine
- Stalling degrades performance because it interrupts the instruction completion rate

CPI is measure of "architectural efficiency" of design

\[ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} \]


CPI for PIPE

CPI ≈ 1.0
- Fetch instruction each clock cycle
- Effectively process new instruction almost every cycle
  - Although each individual instruction has latency of 5 cycles

CPI > 1.0
- Sometimes must stall or cancel branches

Computing CPI
- C clock cycles
- I instructions executed to completion
- B bubbles injected (C = I + B)

CPI = C/I = (I+B)/I = 1.0 + B/I
- Factor B/I represents average penalty due to bubbles


Computing CPI

CPI
- Function of useful instructions and bubbles
- C_b/C_i represents the pipeline penalty due to stalls

\[ \text{CPI} = \frac{C_i + C_b}{C_i} = 1.0 + \frac{C_b}{C_i} \]

Can reformulate to account for
- load penalties (lp)
- branch misprediction penalties (mp)
- return penalties (rp)

\[ \text{CPI} = 1.0 + lp + mp + rp \]


Computing CPI - II

So how do we determine the penalties?
- Depends on how often each situation occurs on
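
The decomposition CPI = 1.0 + lp + mp + rp can be tried out numerically. The sketch below is only an illustration: the bubble counts (1 for a load/use hazard, 2 for a mispredicted branch, 3 for ret) come from the Pipeline Summary slide, while the function name `pipe_cpi`, its parameter names, and every frequency passed to it are hypothetical values chosen for the example, not measurements from these slides.

```python
# Bubbles injected per occurrence, as stated on the Pipeline Summary slide.
LOAD_USE_BUBBLES = 1
MISPREDICT_BUBBLES = 2
RET_BUBBLES = 3


def pipe_cpi(load_freq, load_use_rate, jump_freq, mispredict_rate, ret_freq):
    """Return (cpi, lp, mp, rp) for the given instruction mix.

    Each penalty is: (fraction of instructions of that kind)
                   x (fraction of those that trigger the hazard)
                   x (bubbles injected per occurrence).
    All five arguments are assumptions supplied by the caller.
    """
    lp = load_freq * load_use_rate * LOAD_USE_BUBBLES
    mp = jump_freq * mispredict_rate * MISPREDICT_BUBBLES
    rp = ret_freq * RET_BUBBLES  # every ret stalls fetch, so its rate is 1.0
    return 1.0 + lp + mp + rp, lp, mp, rp


def cpi_from_counts(instructions, bubbles):
    """CPI = C/I with C = I + B, as on the Computing CPI slide."""
    return (instructions + bubbles) / instructions


# Hypothetical instruction mix, for illustration only.
cpi, lp, mp, rp = pipe_cpi(
    load_freq=0.25, load_use_rate=0.20,
    jump_freq=0.20, mispredict_rate=0.40,
    ret_freq=0.02,
)
print(f"lp={lp:.2f} mp={mp:.2f} rp={rp:.2f} CPI={cpi:.2f}")
```

With this made-up mix the misprediction term dominates and the CPI comes out to roughly 1.27, which shows why the branch penalty is usually the first thing to look at when the penalties are summed this way.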
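
Returning to the pipeline control logic from the earlier slides, the two versions of D_bubble can be compared with a small Python model. This is a sketch under stated assumptions, not the simulator's actual code: signal names mirror the HCL, but the `PipeState` class, the `pipe_control` function, the numeric instruction codes, and the register ids `RNONE` and `RESP` are all hypothetical.

```python
from dataclasses import dataclass

# Instruction codes: names follow the HCL, numeric values are arbitrary.
INOP, IMRMOVL, IPOPL, IJXX, IRET = range(5)
# Hypothetical register ids: "no register" and %esp.
RNONE, RESP = 8, 4


@dataclass
class PipeState:
    """Inputs to the pipe control logic for one clock cycle."""
    D_icode: int = INOP
    E_icode: int = INOP
    M_icode: int = INOP
    E_dstM: int = RNONE
    d_srcA: int = RNONE
    d_srcB: int = RNONE
    e_Bch: bool = True


def pipe_control(s, corrected=True):
    """Return the four control signals as a dict of booleans."""
    load_use = (s.E_icode in (IMRMOVL, IPOPL)
                and s.E_dstM in (s.d_srcA, s.d_srcB))
    ret_in_flight = IRET in (s.D_icode, s.E_icode, s.M_icode)
    mispredict = s.E_icode == IJXX and not s.e_Bch
    # Corrected logic: the ret bubble is suppressed during a load/use hazard.
    ret_bubble = ret_in_flight and not (corrected and load_use)
    return {
        "F_stall": load_use or ret_in_flight,
        "D_stall": load_use,
        "D_bubble": mispredict or ret_bubble,
        "E_bubble": mispredict or load_use,
    }


# Combination A: not-taken branch in E, ret (which reads %esp) in D.
combo_a = PipeState(D_icode=IRET, E_icode=IJXX, e_Bch=False,
                    d_srcA=RESP, d_srcB=RESP)
# Combination B: load into %esp in E, ret in D.
combo_b = PipeState(D_icode=IRET, E_icode=IMRMOVL, E_dstM=RESP,
                    d_srcA=RESP, d_srcB=RESP)

print("A:", pipe_control(combo_a))
print("B (initial):", pipe_control(combo_b, corrected=False))
print("B (corrected):", pipe_control(combo_b))
```

Under the initial logic, Combination B asserts both D_stall and D_bubble, which is the pipeline error described on the Control Combination B slide. With the corrected D_bubble only D_stall remains, so ret is held in decode for the extra cycle.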