UT CS 429H - Pipelining IV

Pipelining IV

Topics
  - Implementing pipeline control
  - Pipelining and performance analysis

Systems I


Implementing Pipeline Control

  - Combinational logic generates pipeline control signals
  - Action occurs at start of following cycle

[Figure: pipeline registers F, D, E, M, W with the pipe control logic block. Inputs: D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch. Outputs: F_stall, D_stall, D_bubble, E_bubble.]


Initial Version of Pipeline Control

    bool F_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode };

    bool D_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode };

    bool E_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };


Control Combinations

Special cases that can arise on the same clock cycle.

Combination A
  - Not-taken branch
  - ret instruction at branch target

Combination B
  - Instruction that reads from memory to %esp
  - Followed by ret instruction

[Figure: pipeline-state diagrams for the load/use, mispredict, and ret 1 / ret 2 / ret 3 conditions, showing which pairs can overlap as Combination A and Combination B.]


Control Combination A

  - Should handle as mispredicted branch
  - Stalls F pipeline register
  - But PC selection logic will be using M_valA anyhow

    Condition             F       D       E       M       W
    Processing ret        stall   bubble  normal  normal  normal
    Mispredicted branch   normal  bubble  bubble  normal  normal
    Combination           stall   bubble  bubble  normal  normal

[Figure: full pipeline datapath (Fetch, Decode, Execute, Memory, Write back) with the Select PC logic and its inputs M_valA and W_valM.]


Control Combination B

  - Would attempt to bubble and stall pipeline register D
  - Signaled by processor as pipeline error

    Condition             F       D               E       M       W
    Processing ret        stall   bubble          normal  normal  normal
    Load/use hazard       stall   stall           bubble  normal  normal
    Combination           stall   bubble + stall  bubble  normal  normal


Handling Control Combination B

  - Load/use hazard should get priority
  - ret instruction should be held in decode stage for an additional cycle

    Condition             F       D       E       M       W
    Processing ret        stall   bubble  normal  normal  normal
    Load/use hazard       stall   stall   bubble  normal  normal
    Combination           stall   stall   bubble  normal  normal


Corrected Pipeline Control Logic

  - Load/use hazard should get priority
  - ret instruction should be held in decode stage for an additional cycle

    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode }
        # but not condition for a load/use hazard
        && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });


Pipeline Summary

Data Hazards
  - Most handled by forwarding
      - No performance penalty
  - Load/use hazard requires a one-cycle stall

Control Hazards
  - Cancel instructions when a mispredicted branch is detected
      - Two clock cycles wasted
  - Stall fetch stage while ret passes through pipeline
      - Three clock cycles wasted

Control Combinations
  - Must analyze carefully
  - First version had a subtle bug
      - Only arises with an unusual instruction combination


Performance Analysis with Pipelining

Ideal pipelined machine: CPI = 1
  - One instruction completed per cycle
  - But much faster cycle time than unpipelined machine

However, hazards work against the ideal
  - Hazards resolved using forwarding are fine
  - Stalling degrades performance because it interrupts the instruction completion rate

CPI is a measure of the "architectural efficiency" of a design

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$


Computing CPI

CPI
  - Function of useful instructions (C_i) and bubbles (C_b)
  - C_b / C_i represents the pipeline penalty due to stalls

$$\text{CPI} = \frac{C_i + C_b}{C_i} = 1.0 + \frac{C_b}{C_i}$$

Can reformulate to account for
  - load penalties (lp)
  - branch misprediction penalties (mp)
  - return penalties (rp)

$$\text{CPI} = 1.0 + lp + mp + rp$$


Computing CPI - II

So how do we determine the penalties?
  - Depends on how often each situation occurs on average
  - How often does a load occur, and how often does that load cause a stall?
  - How often does a branch occur, and how often is it mispredicted?
  - How often does a return occur?

We can measure these with
  - a simulator
  - hardware performance counters

We can estimate them through historical averages
  - Then use the estimates to make early design tradeoffs for the architecture


Computing CPI - III

[Table of per-hazard penalty contributions not recovered from the preview; only the total, 0.31, survives.]

CPI = 1 + 0.31 = 1.31, i.e. 31% worse than ideal

This gets worse when we:
  - Account for non-ideal memory access latency
  - Use deeper pipelines (where stalls per hazard increase)
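As a rough illustration of how lp, mp, and rp combine into CPI, each penalty can be modeled as (instruction frequency) x (fraction of those instructions that trigger the hazard) x (bubbles per occurrence). The bubble counts of 1, 2, and 3 come from the Pipeline Summary slide. The frequencies below are made-up placeholders, not the values from the slide's lost table, so the result does not reproduce the 0.31 total. The names Hazard, cpi, and EXAMPLE_HAZARDS are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    """One source of pipeline bubbles (a sketch, not the course's own model)."""
    name: str
    instr_freq: float  # fraction of all instructions of this type
    cond_freq: float   # fraction of those that actually trigger the hazard
    bubbles: int       # cycles wasted per occurrence

    def penalty(self) -> float:
        # Average bubbles contributed per executed instruction.
        return self.instr_freq * self.cond_freq * self.bubbles


def cpi(hazards) -> float:
    # CPI = 1.0 + lp + mp + rp, generalized to any list of hazards.
    return 1.0 + sum(h.penalty() for h in hazards)


# ASSUMED frequencies, for illustration only. The bubble counts (1, 2, 3)
# are the ones stated in the Pipeline Summary slide.
EXAMPLE_HAZARDS = [
    Hazard("load/use (lp)", instr_freq=0.25, cond_freq=0.20, bubbles=1),
    Hazard("mispredict (mp)", instr_freq=0.20, cond_freq=0.40, bubbles=2),
    Hazard("return (rp)", instr_freq=0.02, cond_freq=1.00, bubbles=3),
]

if __name__ == "__main__":
    for h in EXAMPLE_HAZARDS:
        print(f"{h.name:16s} penalty = {h.penalty():.2f}")
    print(f"CPI = {cpi(EXAMPLE_HAZARDS):.2f}")
```

Swapping in frequencies measured with a simulator or hardware performance counters gives the CPI estimate for a real workload. Raising a bubble count shows how a deeper pipeline makes the same hazard mix more expensive.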
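Returning to the pipeline control logic: the initial and corrected HCL equations can be transcribed into a small Python model to check how they behave on Control Combinations A and B. This is a minimal sketch, not the course's simulator. The PipeState class, the string encodings of instruction codes and registers, and the `corrected` flag are assumptions made for illustration. The signal names and Boolean conditions follow the slides.

```python
from dataclasses import dataclass

# Hypothetical encodings; the slides only name the constants.
IMRMOVL, IPOPL, IJXX, IRET, IOTHER = "mrmovl", "popl", "jXX", "ret", "other"
RESP, RNONE = "%esp", "none"


@dataclass
class PipeState:
    """Inputs to the pipe control logic for one clock cycle."""
    D_icode: str = IOTHER
    E_icode: str = IOTHER
    M_icode: str = IOTHER
    E_dstM: str = RNONE
    d_srcA: str = RNONE
    d_srcB: str = RNONE
    e_Bch: bool = True


def load_use(s: PipeState) -> bool:
    return s.E_icode in (IMRMOVL, IPOPL) and s.E_dstM in (s.d_srcA, s.d_srcB)


def ret_in_flight(s: PipeState) -> bool:
    return IRET in (s.D_icode, s.E_icode, s.M_icode)


def mispredict(s: PipeState) -> bool:
    return s.E_icode == IJXX and not s.e_Bch


def control(s: PipeState, corrected: bool = True) -> dict:
    """Return the four control signals; corrected=False gives the initial version."""
    if corrected:
        # Load/use hazard gets priority: do not bubble D while stalling it.
        d_bubble = mispredict(s) or (ret_in_flight(s) and not load_use(s))
    else:
        d_bubble = mispredict(s) or ret_in_flight(s)
    return {
        "F_stall": load_use(s) or ret_in_flight(s),
        "D_stall": load_use(s),
        "D_bubble": d_bubble,
        "E_bubble": mispredict(s) or load_use(s),
    }


# Combination A: not-taken branch in E, ret (fetched from the target) in D.
combo_a = PipeState(D_icode=IRET, E_icode=IJXX, e_Bch=False,
                    d_srcA=RESP, d_srcB=RESP)
# Combination B: load into %esp in E, ret (which reads %esp) in D.
combo_b = PipeState(D_icode=IRET, E_icode=IMRMOVL, E_dstM=RESP,
                    d_srcA=RESP, d_srcB=RESP)

if __name__ == "__main__":
    print("A:", control(combo_a))
    print("B (initial):  ", control(combo_b, corrected=False))
    print("B (corrected):", control(combo_b, corrected=True))
```

With the initial logic, Combination B asserts D_stall and D_bubble together, which is the pipeline error described in the slides. The corrected D_bubble leaves only D_stall set, so the ret is held in decode for one more cycle.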

