CS 152 Computer Architecture and Engineering
Lecture 5 - Pipelining II

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152

February 2, 2010    CS152, Spring 2010


Last time in Lecture 4

- Pipelining increases clock frequency while growing CPI more slowly, hence giving greater performance.

  Time/Program = (Instructions/Program) x (Cycles/Instruction) x (Time/Cycle)

  Cycles/Instruction: increases because of pipeline bubbles.
  Time/Cycle: reduces because there are fewer logic gates on critical paths between flip-flops.

- Pipelining of instructions is complicated by HAZARDS:
  - Structural hazards (two instructions want the same hardware resource)
  - Data hazards (an earlier instruction produces a value needed by a later instruction)
  - Control hazards (an instruction changes control flow, e.g., branches or exceptions)

- Techniques to handle hazards:
  - Interlock (hold the newer instruction until older instructions drain out of the pipeline and write back results)
  - Bypass (transfer the value from the older instruction to the newer instruction as soon as it is available somewhere in the machine)
  - Speculate (guess the effect of the earlier instruction)


Control Hazards

What do we need to calculate the next PC?

- For Jumps: opcode, offset and PC
- For Jump Register: opcode and register value
- For Conditional Branches: opcode, PC, register (for condition), and offset
- For all other instructions: opcode and PC (have to know it's not one of the above)


PC Calculation Bubbles (assuming no branch delay slots for now)

time                  t0   t1   t2   t3   t4   t5   t6   t7   ...
(I1) r1 <- (r0) + 10  IF1  ID1  EX1  MA1  WB1
(I2) r3 <- (r2) + 17       IF2  IF2  ID2  EX2  MA2  WB2
(I3)                                 IF3  IF3  ID3  EX3  MA3  WB3
(I4)                                           IF4  IF4  ID4  EX4  MA4  WB4

Resource Usage

time  t0   t1   t2   t3   t4   t5   t6   t7   ...
IF    I1   nop  I2   nop  I3   nop  I4
ID         I1   nop  I2   nop  I3   nop  I4
EX              I1   nop  I2   nop  I3   nop  I4
MA                   I1   nop  I2   nop  I3   nop  I4
WB                        I1   nop  I2   nop  I3   nop  I4

nop => pipeline bubble


Speculate: next address is PC+4

[Figure: fetch datapath with the PCSrc mux (pc+4 / jabs / rind / br), stall signal, PC, instruction memory (addr, inst), and the IR pipeline registers; the jump sits in decode and the instruction fetched after it is marked "kill"]

Instruction memory:
I1  096  ADD
I2  100  J 304
I3  104  ADD    <- kill
I4  304  ADD

A jump instruction kills (not stalls) the following instruction. How?


Pipelining Jumps

To kill a fetched instruction: insert a mux before IR.

[Figure: same datapath, with a mux selecting between the instruction memory output and nop in front of the decode-stage IR, controlled by IRSrcD]

Any interaction between stall and jump?

IRSrcD = Case opcodeD
           J, JAL  => nop
           ...     => IM


Jump Pipeline Diagrams

time                t0   t1   t2   t3   t4   t5   t6   t7   ...
(I1) 096: ADD       IF1  ID1  EX1  MA1  WB1
(I2) 100: J 304          IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD                 IF3  nop  nop  nop
(I4) 304: ADD                      IF4  ID4  EX4  MA4  WB4

Resource Usage

time  t0   t1   t2   t3   t4   t5   t6   t7   ...
IF    I1   I2   I3   I4   I5
ID         I1   I2   nop  I4   I5
EX              I1   I2   nop  I4   I5
MA                   I1   I2   nop  I4   I5
WB                        I1   I2   nop  I4   I5

nop => pipeline bubble


Pipelining Conditional Branches

[Figure: datapath with BEQZ in decode, the zero? test on the ALU input in the execute stage, and the IRSrcD mux]

Instruction memory:
I1  096  ADD
I2  100  BEQZ r1 +200
I3  104  ADD
I4  304  ADD

The branch condition is not known until the execute stage. What action should be taken in the decode stage?


Pipelining Conditional Branches (continued)

[Figure: datapath with BEQZ in execute and PC = 108; an IRSrcE mux is added in front of the execute-stage IR alongside the existing IRSrcD mux]

If the branch is taken:
- kill the two following instructions
- the instruction at the decode stage is not valid, so the stall signal is not valid


New Stall Signal

Notation: "." is AND, "+" is OR, "!" is NOT, "=" is an equality comparison.

stall = ( ((rsD = wsE).weE + (rsD = wsM).weM + (rsD = wsW).weW).re1D
        + ((rtD = wsE).weE + (rtD = wsM).weM + (rtD = wsW).weW).re2D )
        . !((opcodeE = BEQZ).z + (opcodeE = BNEZ).!z)

Don't stall if the branch is taken. Why? The instruction at the decode stage is invalid.


Control Equations for PC and IR Muxes

PCSrc  = Case opcodeE
           BEQZ.z, BNEZ.!z  => br
           ...              => Case opcodeD
                                 J, JAL    => jabs
                                 JR, JALR  => rind
                                 ...       => pc+4

IRSrcD = Case opcodeE
           BEQZ.z, BNEZ.!z  => nop
           ...              => Case opcodeD
                                 J, JAL, JR, JALR  => nop
                                 ...               => IM

IRSrcE = Case opcodeE
           BEQZ.z, BNEZ.!z  => nop
           ...              => stall.nop + !stall.IRD

Give priority to the older instruction, i.e., the execute-stage instruction over the decode-stage instruction.


Branch Pipeline Diagrams (resolved in execute stage)

time                  t0   t1   t2   t3   t4   t5   t6   t7   ...
(I1) 096: ADD         IF1  ID1  EX1  MA1  WB1
(I2) 100: BEQZ +200        IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD                   IF3  ID3  nop  nop  nop
(I4) 108:                            IF4  nop  nop  nop
(I5) 304: ADD                             IF5  ID5  EX5  MA5  WB5

Resource Usage

time  t0   t1   t2   t3   t4   t5   t6   t7   ...
IF    I1   I2   I3   I4   I5
ID         I1   I2   I3   nop  I5
EX              I1   I2   nop  nop  I5
MA                   I1   I2   nop  nop  I5
WB                        I1   I2   nop  nop  I5

nop => pipeline bubble


Reducing Branch Penalty (resolve in decode stage)

- One pipeline bubble can be removed if an extra comparator is used in the Decode stage.

[Figure: decode stage with the register file (GPRs: rs1, rs2, rd1, rd2, ws, wd, we) and a zero detect on the register file output feeding PCSrc]

The pipeline diagram is now the same as for jumps.


Branch Delay Slots (expose control hazard to software)

- Change the ISA semantics so that the instruction that follows a jump or branch is always executed.
  - This gives the compiler the flexibility to put in a useful instruction where normally a pipeline bubble would have resulted.

I1  096  ADD
I2  100  BEQZ r1 +200
I3  104  ADD    <- delay slot instruction, executed regardless of branch outcome
I4  304  ADD

- Other techniques include more advanced branch prediction, which can dramatically reduce the branch penalty (to come later).


Branch Pipeline Diagrams (branch delay slot)

I1 I2 I3 I4, Resource Usage …
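As a rough illustration of the control equations for the PC and IR muxes and the revised stall signal from the earlier slides, here is a minimal Python sketch. The function names, the string encodings of the mux selections, and the `data_hazard` argument (standing in for the register-comparison part of the stall equation) are assumptions made for this example and do not come from the lecture. The sketch only evaluates the mux selects; it does not model write enables on the PC or IR registers.

```python
# Hypothetical sketch of the slide's control equations; names are illustrative.

JUMPS_ABS = ("J", "JAL")      # jump to absolute target  -> jabs
JUMPS_REG = ("JR", "JALR")    # jump to register target  -> rind


def branch_taken(opcode_e: str, z: bool) -> bool:
    """True when the execute-stage instruction is a taken branch.

    z is the zero-detect result for the branch's source register.
    """
    return (opcode_e == "BEQZ" and z) or (opcode_e == "BNEZ" and not z)


def pc_src(opcode_e: str, z: bool, opcode_d: str) -> str:
    """Select the next-PC source; the older (execute) instruction has priority."""
    if branch_taken(opcode_e, z):
        return "br"
    if opcode_d in JUMPS_ABS:
        return "jabs"
    if opcode_d in JUMPS_REG:
        return "rind"
    return "pc+4"


def ir_src_d(opcode_e: str, z: bool, opcode_d: str) -> str:
    """Select what enters the decode-stage IR: a nop (kill) or the fetched instruction."""
    if branch_taken(opcode_e, z):
        return "nop"
    if opcode_d in JUMPS_ABS + JUMPS_REG:
        return "nop"
    return "IM"


def ir_src_e(opcode_e: str, z: bool, stall: bool) -> str:
    """Select what enters the execute-stage IR: a nop or the decode-stage instruction."""
    if branch_taken(opcode_e, z):
        return "nop"
    return "nop" if stall else "IRD"


def stall_signal(data_hazard: bool, opcode_e: str, z: bool) -> bool:
    """Stall on a data hazard unless a taken branch makes the decode instruction invalid."""
    return data_hazard and not branch_taken(opcode_e, z)


if __name__ == "__main__":
    # Taken BEQZ in execute overrides a jump sitting in decode.
    print(pc_src("BEQZ", True, "J"))
    # No branch in execute: the jump in decode redirects the PC and kills the fetch.
    print(pc_src("ADD", False, "J"), ir_src_d("ADD", False, "J"))
```

The nested `if` order mirrors the nested Case statements on the slide: checking the execute-stage opcode first is what gives the older instruction priority over the one in decode.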