UT CS 429H - Pipelining IV

This preview shows page 1-2-3-4-5-6 out of 19 pages.


Pipelining IV

Outline
  Implementing Pipeline Control
  Initial Version of Pipeline Control
  Control Combinations
  Control Combination A
  Stall in F
  JXX + ret works great!
  Control Combination B
  Handling Control Combination B
  Corrected Pipeline Control Logic
  Load/use hazard with ret
  Pipeline Summary
  Performance Analysis with Pipelining
  CPI for PIPE
  Computing CPI
  Computing CPI - II
  Computing CPI - III
  CPI for PIPE (Cont.)
  Summary

Topics
  Implementing pipeline control
  Pipelining and performance analysis

Systems I

Implementing Pipeline Control
  Combinational logic generates pipeline control signals
  Action occurs at start of following cycle
  [Figure: pipeline registers F, D, E, M, W and the pipe control logic. Signals into the control logic: D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch. Signals out: F_stall, D_stall, D_bubble, E_bubble.]

Initial Version of Pipeline Control

    bool F_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode };

    bool D_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Bubble for ret
        IRET in { D_icode, E_icode, M_icode };

    bool E_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

Control Combinations
  Special cases that can arise on same clock cycle
  Combination A
    Not-taken branch
    ret instruction at branch target
  Combination B
    Instruction that reads from memory to %esp
    Followed by ret instruction

  Stage contents for each special case (from the slide figure):

    Case         E        D        M
    Load/use     Load     Use
    Mispredict   JXX
    ret 1                 ret
    ret 2        ret      bubble
    ret 3        bubble   bubble   ret

  Combination A: Mispredict together with ret 1
  Combination B: Load/use together with ret 1

Control Combination A
  Should handle as mispredicted branch
  Stalls F pipeline register
  But PC selection logic will be using M_valM anyhow

    Condition             F        D        E        M        W
    Processing ret        stall    bubble   normal   normal   normal
    Mispredicted Branch   normal   bubble   bubble   normal   normal
    Combination           stall    bubble   bubble   normal   normal

Stall in F
  Your book provides two inconsistent meanings for "stall in F"
    Instruction remains in F and injects a bubble into D
    Instruction squashed into D, same PC fetched
    Figure 4.61
  Use the one that keeps 1 instr per pipeline stage
  [Figure: pipeline diagrams (stages F D E M W) contrasting the two interpretations.]

JXX + ret works great!
  [Figure: pipeline diagram over cycles 1 to 10 for the sequence below, with two bubbles injected.]

    0x000: xorl %eax,%eax
    0x002: jne target         # Not taken
    0x011: t: ret             # Target
    bubble
    0x012: nop                # Target + 1
    bubble
    0x007: irmovl $1,%eax     # Fall through
    0x00d: nop

Control Combination B
  Would attempt to bubble and stall pipeline register D
  Signaled by processor as pipeline error

    Condition             F        D                E        M        W
    Processing ret        stall    bubble           normal   normal   normal
    Load/Use Hazard       stall    stall            bubble   normal   normal
    Combination           stall    bubble + stall   bubble   normal   normal

Handling Control Combination B
  Load/use hazard should get priority
  ret instruction should be held in decode stage for additional cycle

    Condition             F        D        E        M        W
    Processing ret        stall    bubble   normal   normal   normal
    Load/Use Hazard       stall    stall    bubble   normal   normal
    Combination           stall    stall    bubble   normal   normal

Corrected Pipeline Control Logic

    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode }
        # but not condition for a load/use hazard
        && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });

Load/use hazard with ret
  Final frame of the slide's cycle-by-cycle diagram:

    Cycle     1   2   3   4   5
    mrmovl    F   D   E   M   W
    bubble                E   M
    ret           F   D   D   E
    addl              F   F
    bubble                    D
    addl                      F

Pipeline Summary
  Data Hazards
    Most handled by forwarding
      No performance penalty
    Load/use hazard requires one cycle stall
  Control Hazards
    Cancel instructions when detect mispredicted branch
      Two clock cycles wasted
    Stall fetch stage while ret passes through pipeline
      Three clock cycles wasted
  Control Combinations
    Must analyze carefully
    First version had subtle bug
      Only arises with unusual instruction combination

Performance Analysis with Pipelining
  Ideal pipelined machine: CPI = 1
    One instruction completed per cycle
    But much faster cycle time than unpipelined machine
  However, hazards are working against the ideal
    Hazards resolved using forwarding are fine
    Stalling degrades performance and instruction completion rate is interrupted
  CPI is measure of "architectural efficiency" of design

    $\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$

CPI for PIPE
  CPI ≈ 1.0
    Fetch instruction each clock cycle
    Effectively process new instruction almost every cycle
    Although each individual instruction has latency of 5 cycles
  CPI > 1.0
    Sometimes must stall or cancel branches
  Computing CPI
    C clock cycles
    I instructions executed to completion
    B bubbles injected (C = I + B)
    CPI = C/I = (I+B)/I = 1.0 + B/I
    Factor B/I represents average penalty due to bubbles

Computing CPI
  CPI
    Function of useful instruction and bubbles
    Cb/Ci represents the pipeline penalty due to stalls
  Can reformulate to account for
    load penalties (lp)
    branch misprediction penalties (mp)
    return penalties (rp)

    $\text{CPI} = \frac{C_i + C_b}{C_i} = 1.0 + \frac{C_b}{C_i}$

    $\text{CPI} = 1.0 + lp + mp + rp$

Computing CPI - II
  So how do we determine the penalties?
    Depends on how often each situation occurs on

