Systems I
Pipelining III

Topics
- Hazard mitigation through pipeline forwarding
- Hardware support for forwarding
- Forwarding to mitigate control (branch) hazards

How do we fix the Pipeline?
- Pad the program with NOPs
    - Yuck!
- Stall the pipeline
    - Data hazards
        - Wait for producing instruction to complete
        - Then proceed with consuming instruction
    - Control hazards
        - Wait until new PC has been determined
        - Then begin fetching
    - How is this better than putting NOPs into the program?
- Forward data within the pipeline
    - Grab the result from somewhere in the pipe
        - After it has been computed
        - But before it has been written back
    - This gives an opportunity to avoid performance degradation due to hazards!

Data Forwarding
- Naïve Pipeline
    - Register isn't written until completion of write back stage
    - Source operands read from register file in decode stage
        - Needs to be in register file at start of stage
- Observation
    - Value generated in execute or memory stage
- Trick
    - Pass value directly from generating instruction to decode stage
    - Needs to be available at end of decode stage

Data Forwarding Example
demo-h2.ys

  Cycle                     1  2  3  4  5  6  7  8  9  10
  0x000: irmovl $10,%edx    F  D  E  M  W
  0x006: irmovl $3,%eax        F  D  E  M  W
  0x00c: nop                      F  D  E  M  W
  0x00d: nop                         F  D  E  M  W
  0x00e: addl %edx,%eax                 F  D  E  M  W
  0x010: halt                              F  D  E  M  W

- irmovl in write back stage
- Destination value in W pipeline register
- Forward as valB for decode stage

Cycle 6
  W: R[%eax] ← 3; W_dstE = %eax, W_valE = 3
  D: srcA = %edx, srcB = %eax; valA ← R[%edx] = 10, valB ← W_valE = 3

Bypass Paths
[Figure: pipeline datapath with bypass paths carrying e_valE, m_valM, M_valE, W_valM and W_valE back to the forwarding logic in the decode stage]
- Decode Stage
    - Forwarding logic selects valA and valB
    - Normally from register file
    - Forwarding: get valA or valB from later pipeline stage
- Forwarding Sources
    - Execute: valE
    - Memory: valE, valM
    - Write back: valE, valM

Data Forwarding Example #2
demo-h0.ys

  Cycle                     1  2  3  4  5  6  7  8
  0x000: irmovl $10,%edx    F  D  E  M  W
  0x006: irmovl $3,%eax        F  D  E  M  W
  0x00c: addl %edx,%eax           F  D  E  M  W
  0x00e: halt                        F  D  E  M  W

- Register %edx
    - Generated by ALU during previous cycle
    - Forward from memory as valA
- Register %eax
    - Value just generated by ALU
    - Forward from execute as valB

Cycle 4
  M: M_dstE = %edx, M_valE = 10
  E: E_dstE = %eax, e_valE ← 0 + 3 = 3
  D: srcA = %edx, srcB = %eax; valA ← M_valE = 10, valB ← e_valE = 3

Implementing Forwarding
[Figure: pipeline hardware with feedback paths from the E, M and W pipeline registers into the "Sel+Fwd A" and "Fwd B" blocks of the decode stage]
- Add additional feedback paths from E, M, and W pipeline registers into decode stage
- Create logic blocks to select from multiple sources for valA and valB in decode stage

Implementing Forwarding

  ## What should be the A value?
  int new_E_valA = [
      # Use incremented PC
      D_icode in { ICALL, IJXX } : D_valP;
      # Forward valE from execute
      d_srcA == E_dstE : e_valE;
      # Forward valM from memory
      d_srcA == M_dstM : m_valM;
      # Forward valE from memory
      d_srcA == M_dstE : M_valE;
      # Forward valM from write back
      d_srcA == W_dstM : W_valM;
      # Forward valE from write back
      d_srcA == W_dstE : W_valE;
      # Use value read from register file
      1 : d_rvalA;
  ];

Limitation of Forwarding
demo-luh.ys

  Cycle                                     1  2  3  4  5  6  7  8  9  10 11
  0x000: irmovl $128,%edx                   F  D  E  M  W
  0x006: irmovl $3,%ecx                        F  D  E  M  W
  0x00c: rmmovl %ecx,0(%edx)                      F  D  E  M  W
  0x012: irmovl $10,%ebx                             F  D  E  M  W
  0x018: mrmovl 0(%edx),%eax  # Load %eax               F  D  E  M  W
  0x01e: addl %ebx,%eax  # Use %eax                        F  D  E  M  W
  0x020: halt                                                 F  D  E  M  W

- Load-use dependency
    - Value needed by end of decode stage in cycle 7
    - Value read from memory in memory stage of cycle 8

Cycle 7
  M: M_dstE = %ebx, M_valE = 10
  D: valA ← M_valE = 10, valB ← R[%eax] = 0 (Error)
Cycle 8
  M: M_dstM = %eax, m_valM ← M[128] = 3

Avoiding Load/Use Hazard
demo-luh.ys

  Cycle                                     1  2  3  4  5  6  7  8  9  10 11 12
  0x000: irmovl $128,%edx                   F  D  E  M  W
  0x006: irmovl $3,%ecx                        F  D  E  M  W
  0x00c: rmmovl %ecx,0(%edx)                      F  D  E  M  W
  0x012: irmovl $10,%ebx                             F  D  E  M  W
  0x018: mrmovl 0(%edx),%eax  # Load %eax               F  D  E  M  W
         bubble                                                  E  M  W
  0x01e: addl %ebx,%eax  # Use %eax                        F  D  D  E  M  W
  0x020: halt                                                 F  F  D  E  M  W

- Stall using instruction for one cycle
- Can then pick up loaded value by forwarding from memory stage

Cycle 8
  W: W_dstE = %ebx, W_valE = 10
  M: M_dstM = %eax, m_valM ← M[128] = 3
  D: valA ← W_valE = 10, valB ← m_valM = 3

Detecting Load/Use Hazard
[Figure: pipeline hardware, same datapath as in Implementing Forwarding]

  Condition          Trigger
  Load/Use Hazard    …
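The selection order for valA and the load/use stall can be tried out in a few lines of Python. This is only a sketch under stated assumptions: the `Bypass` class, the string icodes ("call", "jXX", "mrmovl", "popl") and the register-name strings are stand-ins invented for illustration. The `load_use_hazard` trigger is not given in the text above; the version here is a common formulation (a load in execute whose destination is a source of the instruction in decode) and should be read as an assumption. Applying the same selection to valB, minus the incremented-PC case, is likewise assumed.

```python
from dataclasses import dataclass

RNONE = "RNONE"  # stand-in for "no register" (hypothetical encoding)


@dataclass
class Bypass:
    """Signals visible on the bypass paths during one cycle (names follow the slides)."""
    E_dstE: str = RNONE
    e_valE: int = 0
    M_dstM: str = RNONE
    m_valM: int = 0
    M_dstE: str = RNONE
    M_valE: int = 0
    W_dstM: str = RNONE
    W_valM: int = 0
    W_dstE: str = RNONE
    W_valE: int = 0


def forward(src, rval, b):
    """Return the operand for register src, preferring the youngest producer."""
    if src == RNONE:
        return rval  # guard added for this sketch; not part of the slide's logic
    sources = [
        (b.E_dstE, b.e_valE),  # valE from execute
        (b.M_dstM, b.m_valM),  # valM from memory
        (b.M_dstE, b.M_valE),  # valE from memory
        (b.W_dstM, b.W_valM),  # valM from write back
        (b.W_dstE, b.W_valE),  # valE from write back
    ]
    for dst, val in sources:
        if src == dst:
            return val
    return rval  # no match: use the value read from the register file


def new_E_valA(D_icode, D_valP, d_srcA, d_rvalA, b):
    """valA selection: incremented PC for call/jump, otherwise forwarding."""
    if D_icode in ("call", "jXX"):  # stand-ins for ICALL, IJXX
        return D_valP
    return forward(d_srcA, d_rvalA, b)


def load_use_hazard(E_icode, E_dstM, d_srcA, d_srcB):
    """Assumed trigger: a load in execute feeds the instruction in decode."""
    return (E_icode in ("mrmovl", "popl")
            and E_dstM != RNONE
            and E_dstM in (d_srcA, d_srcB))


# Cycle 4 of demo-h0.ys: addl %edx,%eax is in decode.
cycle4 = Bypass(E_dstE="%eax", e_valE=3, M_dstE="%edx", M_valE=10)
print(new_E_valA("addl", 0x00E, "%edx", 0, cycle4))  # 10, forwarded from M_valE
print(forward("%eax", 0, cycle4))                    # 3, forwarded from e_valE

# Cycle 7 of demo-luh.ys: mrmovl in execute, addl %ebx,%eax in decode.
print(load_use_hazard("mrmovl", "%eax", "%ebx", "%eax"))  # True, so stall
```

Checking the sources in pipeline order (execute, then memory, then write back) means that when several in-flight instructions write the same register, the most recent one wins, which matches sequential program semantics.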