Systems I
Pipelining IV

Topics
  Implementing pipeline control
  Pipelining and performance analysis

Implementing Pipeline Control

  [Figure: pipeline registers F, D, E, M, and W supply signals (D_icode, d_srcA, d_srcB, E_icode, E_dstM, e_Bch, M_icode) to the pipe control logic, which drives F_stall, D_stall, D_bubble, and E_bubble.]

  Combinational logic generates pipeline control signals
  Action occurs at start of following cycle

Initial Version of Pipeline Control

    bool F_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode };

    bool D_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode };

    bool E_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

Control Combinations

  [Figure: stage occupancy for the load/use, mispredict, and three ret cases, with Combination A (mispredict plus ret) and Combination B (load/use plus ret) marked.]

  Special cases that can arise on the same clock cycle

  Combination A
    Not-taken branch
    ret instruction at branch target

  Combination B
    Instruction that reads from memory to %esp
    Followed by ret instruction

Control Combination A

  JXX in stage E, ret in stage D

  Condition             F       D       E       M       W
  Processing ret        stall   bubble  normal  normal  normal
  Mispredicted branch   normal  bubble  bubble  normal  normal
  Combination           stall   bubble  bubble  normal  normal

  Should handle as mispredicted branch
  Stalls F pipeline register
  But PC selection logic will be using M_valA anyhow

Control Combination B

  Load in stage E, ret in stage D

  Condition             F       D               E       M       W
  Processing ret        stall   bubble          normal  normal  normal
  Load/use hazard       stall   stall           bubble  normal  normal
  Combination           stall   bubble + stall  bubble  normal  normal

  Would attempt to both bubble and stall pipeline register D
  Signaled by processor as pipeline error

Handling Control Combination B

  Condition             F       D       E       M       W
  Processing ret        stall   bubble  normal  normal  normal
  Load/use hazard       stall   stall   bubble  normal  normal
  Combination           stall   stall   bubble  normal  normal

  Load/use hazard should get priority
  ret instruction should be held in decode stage for an additional cycle

Corrected Pipeline Control Logic

    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode }
        # but not condition for a load/use hazard
        && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });

Pipeline Summary

  Data Hazards
    Most handled by forwarding
      No performance penalty
    Load/use hazard requires one-cycle stall

  Control Hazards
    Cancel instructions when a mispredicted branch is detected
      Two clock cycles wasted
    Stall fetch stage while ret passes through pipeline
      Three clock cycles wasted

  Control Combinations
    Must analyze carefully
    First version had a subtle bug
      Only arises with an unusual instruction combination

Performance Analysis with Pipelining

  \[ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} \]

  Ideal pipelined machine: CPI = 1
    One instruction completed per cycle
    But much faster cycle time than unpipelined machine

  However, hazards are working against the ideal
    Hazards resolved using forwarding are fine
    Stalling degrades performance, and the instruction completion rate is interrupted

  CPI is a measure of the architectural efficiency of the design

Computing CPI

  CPI is a function of useful instructions (C_i) and bubbles (C_b)

  \[ \text{CPI} = \frac{C_i + C_b}{C_i} = 1.0 + \frac{C_b}{C_i} \]

  C_b / C_i represents the pipeline penalty due to stalls

  Can reformulate to account for
    load penalties (lp)
    branch misprediction penalties (mp)
    return penalties (rp)

  \[ \text{CPI} = 1.0 + lp + mp + rp \]

Computing CPI - II

  So how do we determine the penalties?
    Depends on how often each situation occurs on average
    How often does a load occur, and how often does that load cause a stall?
    How often does a branch occur, and how often is it mispredicted?
    How often does a return occur?

  We can measure these
    Simulator
    Hardware performance counters

  We can estimate them through historical averages
    Then use them to make early design tradeoffs for the architecture

Computing CPI - III

  Cause        Name   Instruction   Condition   Stalls   Product
                      frequency     frequency
  Load/use     lp     0.30          0.3         1        0.09
  Mispredict   mp     0.20          0.4         2        0.16
  Return       rp     0.02          1.0         3        0.06
  Total penalty                                          0.31

  CPI = 1 + 0.31 = 1.31 (31% worse than ideal)

  This gets worse when we
    Account for non-ideal memory access latency
    Use deeper pipelines, where stalls per hazard increase

Summary

  Today
    Pipeline control logic
    Effect on CPI and performance

  Next Time
    Further mitigation of branch mispredictions
    State machine design
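
The penalty arithmetic in the Computing CPI table can be reproduced with a few lines of Python. This is only a minimal sketch: the names `Hazard`, `cpi`, and `HAZARDS` are hypothetical and introduced here for illustration, and the frequencies are the example values from the table, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    """One row of the penalty table (hypothetical helper, not from the slides)."""
    name: str
    instr_freq: float  # fraction of all instructions of this type
    cond_freq: float   # fraction of those instructions that trigger the hazard
    bubbles: int       # cycles wasted each time the hazard occurs

    def penalty(self) -> float:
        # Average bubbles contributed per instruction executed.
        return self.instr_freq * self.cond_freq * self.bubbles


def cpi(hazards) -> float:
    # CPI = 1.0 + lp + mp + rp: the ideal CPI of 1 plus the summed penalties.
    return 1.0 + sum(h.penalty() for h in hazards)


# Example frequencies taken from the slide's table (assumed averages).
HAZARDS = [
    Hazard("load/use (lp)", 0.30, 0.3, 1),
    Hazard("mispredict (mp)", 0.20, 0.4, 2),
    Hazard("return (rp)", 0.02, 1.0, 3),
]

if __name__ == "__main__":
    for h in HAZARDS:
        print(f"{h.name:16s} {h.penalty():.2f}")
    print(f"CPI = {cpi(HAZARDS):.2f}")
```

With the table's values the sketch gives a CPI of 1.31. Changing `bubbles` (for example, to model a deeper pipeline) shows how quickly the penalty term grows.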


UT CS 429H - Pipelining IV
