Systems I: Pipelining IV

Topics
- Implementing pipeline control
- Pipelining and performance analysis

Implementing Pipeline Control

[Figure: pipeline registers F, D, E, M, and W with their fields (icode, ifun, rA, rB, valC, valP, predPC, valA, valB, valE, valM, srcA, srcB, dstE, dstM). The pipe control logic reads D_icode, d_srcA, d_srcB, E_icode, E_dstM, e_Bch, and M_icode, and drives F_stall, D_stall, D_bubble, and E_bubble.]

- Combinational logic generates pipeline control signals
- Action occurs at start of following cycle

Initial Version of Pipeline Control

    bool F_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode };

    bool D_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Bubble for ret
        IRET in { D_icode, E_icode, M_icode };

    bool E_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

Control Combinations

    Case         M      E        D
    Load/use            Load     Use
    Mispredict          JXX
    ret 1                        ret
    ret 2               ret      bubble
    ret 3        ret    bubble   bubble

Special cases that can arise on the same clock cycle:
- Combination A (Mispredict with ret 1)
  - Not-taken branch
  - ret instruction at branch target
- Combination B (Load/use with ret 1)
  - Instruction that reads from memory to %esp
  - Followed by ret instruction

Control Combination A

    Condition             F        D        E        M        W
    Processing ret        stall    bubble   normal   normal   normal
    Mispredicted Branch   normal   bubble   bubble   normal   normal
    Combination           stall    bubble   bubble   normal   normal

- Should handle as mispredicted branch
- Stalls F pipeline register
- But PC selection logic will be using M_valM anyhow

Stall in F

Your book provides two inconsistent meanings for a stall in F:
- Instruction remains in F and injects a bubble into D
- Instruction squashed into D, same PC fetched (Figure 4.61)

[Figure: two pipeline diagrams over stages F D E M W, one per interpretation.]

Use the one that keeps 1 instruction per pipeline stage.

JXX + ret works great

    0x000:    xorl %eax,%eax
    0x002:    jne target        # Not taken
    0x011: t: ret               # Target (bubble)
    0x012:    nop               # Target + 1 (bubble)
    0x007:    irmovl $1,%eax    # Fall through
    0x00d:    nop

[Figure: pipeline diagram of the instructions above through stages F D E M W over cycles 1 to 10.]

Control Combination B

    Condition          F        D                E        M        W
    Processing ret     stall    bubble           normal   normal   normal
    Load/Use Hazard    stall    stall            bubble   normal   normal
    Combination        stall    bubble + stall   bubble   normal   normal

- Would attempt to both bubble and stall pipeline register D
- Signaled by processor as pipeline error

Handling Control Combination B

    Condition          F        D        E        M        W
    Processing ret     stall    bubble   normal   normal   normal
    Load/Use Hazard    stall    stall    bubble   normal   normal
    Combination        stall    stall    bubble   normal   normal

- Load/use hazard should get priority
- ret instruction should be held in decode stage for an additional cycle

Corrected Pipeline Control Logic

    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode }
        # but not condition for a load/use hazard
        && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });

With this change, Combination B produces the stall / stall / bubble / normal / normal row shown in the table above.

Load/use hazard with ret

    mrmovl   F  D  E  M  W
    bubble            E  M
    ret         F  D  D  E
    addl           F  F
    bubble               D
    addl                 F

Pipeline Summary

Data Hazards
- Most handled by forwarding
  - No performance penalty
- Load/use hazard requires one cycle stall

Control Hazards
- Cancel instructions when a mispredicted branch is detected
  - Two clock cycles wasted
- Stall fetch stage while ret passes through pipeline
  - Three clock cycles wasted

Control Combinations
- Must analyze carefully
- First version had a subtle bug
  - Only arises with an unusual instruction combination

Performance Analysis with Pipelining

    CPU time = Seconds / Program
             = (Instructions / Program) * (Cycles / Instruction) * (Seconds / Cycle)

Ideal pipelined machine: CPI = 1
- One instruction completed per cycle
- But much faster cycle time than unpipelined machine

However, hazards work against the ideal:
- Hazards resolved using forwarding are fine
- Stalling degrades performance because it interrupts the instruction completion rate

CPI is a measure of the architectural efficiency of the design.

CPI for PIPE

CPI ≈ 1.0
- Fetch an instruction each clock cycle
- Effectively process a new instruction almost every cycle
  - Although each individual instruction has a latency of 5 cycles

CPI > 1.0
- Sometimes must stall or cancel branches

Computing CPI
- C clock cycles
- I instructions executed to completion
- B bubbles injected (C = I + B)

    CPI = C / I = (I + B) / I = 1.0 + B / I

- Factor B/I represents the average penalty due to bubbles

Computing CPI

CPI is a function of useful instructions and bubbles:

    CPI = (Ci + Cb) / Ci = 1.0 + Cb / Ci

- Cb/Ci represents the pipeline penalty due to stalls

Can reformulate to account for
- load penalties (lp)
- branch misprediction penalties (mp)
- return penalties (rp)

    CPI = 1.0 + lp + mp + rp

Computing CPI II

So how do we determine the penalties?
- Depends on how often each situation occurs on average
  - How often does a load occur, and how often does that load cause a stall?
  - How often does a branch occur, and how often is it mispredicted?
  - How often does a return occur?
- We can measure these
  - Simulator
  - Hardware performance counters
- We can estimate through historical averages
  - Then use the estimates to make early design tradeoffs for the architecture

Computing CPI III

    Cause        Name   Instruction   Condition   Stalls   Product
                        Frequency     Frequency
    Load/Use     lp     0.30          0.3         1        0.09
    Mispredict   mp     0.20          0.4         2        0.16
    Return       rp     0.02          1.0         3        0.06
    Total penalty                                          0.31

    CPI = 1.0 + 0.31 = 1.31 (31% worse than ideal)

This gets worse when we:
- Account for non-ideal memory access latency
- Use deeper pipelines, where stalls per hazard increase

CPI for PIPE (Cont.)

    B / I = LP + MP + RP

LP: Penalty due to load/use hazard stalling
- Fraction of instructions that are loads: 0.25
- Fraction of load instructions requiring stall: 0.20
- Number of bubbles injected each time: 1
- LP = 0.25 * 0.20 * 1 = 0.05

MP: Penalty due to mispredicted branches
- Fraction of instructions that are cond. jumps: 0.20
- Fraction of cond. jumps mispredicted: 0.40
- Number of bubbles injected each time: 2
- MP = 0.20 * 0.40 * 2 …
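The control conditions from the slides can be checked with a small software model. The following is a minimal sketch in Python, not the HCL used by the course tools: the function name `pipe_control`, the `corrected` flag, and the string encodings of instruction codes and registers are all assumptions made for illustration. It evaluates the four control signals for Control Combination B (a load into %esp in E, followed by ret in D) under both the initial and the corrected D bubble condition.

```python
# Instruction codes (names follow the slides; the string values are arbitrary).
IMRMOVL, IPOPL, IJXX, IRET, INOP = "mrmovl", "popl", "jXX", "ret", "nop"


def pipe_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch,
                 corrected=True):
    """Hypothetical model of the pipe control logic; returns the four signals."""
    # Load/use hazard: a load in E writes a register that D wants to read.
    load_use = E_icode in (IMRMOVL, IPOPL) and E_dstM in (d_srcA, d_srcB)
    # Mispredicted branch: a conditional jump in E that was not taken.
    mispredict = E_icode == IJXX and not e_Bch
    # A ret is somewhere in D, E, or M.
    ret_active = IRET in (D_icode, E_icode, M_icode)

    if corrected:
        # Corrected logic: the load/use hazard takes priority over ret.
        ret_bubble = ret_active and not load_use
    else:
        # Initial logic: bubble D whenever a ret is in flight.
        ret_bubble = ret_active

    return {
        "F_stall": load_use or ret_active,
        "D_stall": load_use,
        "D_bubble": mispredict or ret_bubble,
        "E_bubble": mispredict or load_use,
    }


# Combination B: mrmovl into %esp in E, ret (which reads %esp) in D.
combo_b = dict(D_icode=IRET, E_icode=IMRMOVL, M_icode=INOP,
               E_dstM="esp", d_srcA="esp", d_srcB="esp", e_Bch=False)

for corrected in (False, True):
    signals = pipe_control(**combo_b, corrected=corrected)
    conflict = signals["D_stall"] and signals["D_bubble"]
    print(f"corrected={corrected}: {signals} conflict_in_D={conflict}")
```

With the initial logic the model asserts both a stall and a bubble for D, which is the pipeline error described for Combination B. With the corrected logic only the stall remains, so ret is held in decode for one more cycle.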
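The penalty arithmetic in the Computing CPI III table can be reproduced in a few lines. This is a minimal sketch in Python; the function name `cpi` and the dictionary layout are assumptions for illustration, and the numbers are the ones given in the table (instruction frequency, condition frequency, stalls per occurrence).

```python
def cpi(penalties):
    """Return 1.0 plus the sum of frequency * condition * stalls per cause."""
    return 1.0 + sum(freq * cond * stalls
                     for freq, cond, stalls in penalties.values())


# (instruction frequency, condition frequency, stall cycles) per cause.
table = {
    "lp": (0.30, 0.3, 1),  # load/use hazard
    "mp": (0.20, 0.4, 2),  # mispredicted branch
    "rp": (0.02, 1.0, 3),  # return
}

result = cpi(table)
print(f"CPI = {result:.2f}")
```

Changing a single entry, for example the misprediction rate, shows how sensitive the overall CPI is to each hazard, which is the kind of early design tradeoff the slides describe.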


UT CS 429H - Pipelining IV