CS152 Computer Architecture and Engineering
Lecture 13: Static Pipeline Scheduling, Compiler Optimizations
March 15, 2004
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/


Recall: Data Hazard Solution: Forwarding

  - "Forward" result from one stage to another
  - "or" OK if define read/write properly

  [Figure: pipeline diagram, time in clock cycles across, instruction order down; stages IF, ID/RF, EX, MEM, WB. Instruction sequence:]
      add r1,r2,r3
      sub r4,r1,r3
      and r6,r1,r7
      or  r8,r1,r9
      xor r10,r1,r11

3/15/04    UCB Spring 2004    CS152 / Kubiatowicz


Recall: Resolve RAW by "forwarding" (or bypassing)

  - Detect nearest valid write op operand register and forward into op latches, bypassing remainder of the pipe
  - Increase muxes to add paths from pipeline registers
  - Data Forwarding = Data Bypassing

  [Figure: pipelined datapath with IAU, npc, I-mem, Regs, forward mux, ALU, D-mem; op/rw/rs/rt control fields carried in the pipeline registers]


FYI: MIPS R3000 clocking discipline

  - 2-phase non-overlapping clocks (phi1, phi2)
  - Pipeline stage is two (level-sensitive) latches

  [Figure: latch pair clocked phi1, phi2, phi1, compared with an edge-triggered register]


MIPS R3000 Instruction Pipeline

  Stages: Inst Fetch | Decode / Reg. Read | ALU / E.A. | Memory | Write Reg
  Resource usage: TLB (x2), I-cache, RF, ALU (x2), D-Cache, WB

  - Write in phase 1, read in phase 2 => eliminates bypass from WB


Thus, only 2 levels of forwarding

  [Figure: same pipeline diagram as above for add r1,r2,r3 / sub r4,r1,r3 / and r6,r1,r7 / or r8,r1,r9 / xor r10,r1,r11]

  - With MIPS R3000 pipeline, no need to forward from WB stage


Recall: Examples of stalls/bubbles

  - Exceptions: flush everything above
      - Prevent instructions following exception from committing state
      - Freeze fetch until exception resolved
  - Stalls: introduce brief stalls into pipeline
      - Decode stage recognizes that current instruction cannot proceed
      - Freeze fetch stage
      - Introduce "bubble" into EX stage (instead of forwarding stalled inst)
      - Can stall until condition is resolved
  - Examples:
      - mfhi, mflo: need to wait for multiply/divide unit to finish
      - "Break" instruction for Lab 5: stall until release line received
      - Load delay slot handled this way as well


Recall: Freeze above & Bubble Below

  - Flush accomplished by setting "invalid" bit in pipeline

  [Figure: pipelined datapath with "freeze" applied to the PC and fetch stage and "bubble" injected below the stalled stage]


Recall: Achieving Precise Exceptions

  [Figure: four overlapped instructions (IFetch, Dcd, Exec, Mem, WB) plotted against time and program flow, with exceptions raised in different stages: Bad Inst, Inst TLB fault, Overflow, Data TLB]

  - Use pipeline to sort this out!
      - Pass exception status along with instruction
      - Keep track of PCs for every instruction in pipeline
      - Don't act on exception until it reaches WB stage
  - Handle interrupts through "faulting noop" in IF stage
  - When instruction reaches end of MEM stage:
      - Save PC => EPC, interrupt vector addr => PC
      - Turn all instructions in earlier stages into noops!


Recall: What about memory operations?

  - If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations!
  - What about data dependence on loads?
        R1 <- R4 + R5
        R2 <- Mem[R2 + I]
        R3 <- R2 + R1
    => "Delayed Loads"
  - Can recognize this in decode stage and introduce bubble while stalling fetch stage (hint for Lab 4!)
  - Tricky situation:
        R1 <- Mem[R2 + I]
        Mem[R3 + 34] <- R1
    Handle with bypass in memory stage!

  [Figure: pipeline registers (op Rd Ra Rb) over A, B, D, R, Mem, T, with result path to reg file]


MIPS R3000 Multicycle Operations

  - Use control word of local stage to step through multicycle operation
  - Stall all stages above multicycle operation in the pipeline
  - Drain (bubble) stages below it
  - Alternatively, launch multiply/divide to autonomous unit; only stall pipe if attempt to get result before ready
      - This means stall mflo/mfhi in decode stage if multiply/divide still executing
      - Extra credit in Lab 5 does this

  [Figure: pipeline registers (op Rd Ra Rb, mul Rd Ra Rb) over A, B, R, T, with result path to reg file. Ex: Multiply, Divide, Cache Miss]


Case Study: MIPS R4000 (200 MHz)

  8 Stage Pipeline:
  - IF: first half of fetching of instruction; PC selection happens here, as well as initiation of instruction cache access
  - IS: second half of access to instruction cache
  - RF: instruction decode and register fetch, hazard checking, and also instruction cache hit detection
  - EX: execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation
  - DF: data fetch, first half of access to data cache
  - DS: second half of access to data cache
  - TC: tag check, determine whether the data cache access hit
  - WB: write back for loads and register-register operations

  8 Stages: What is impact on Load delay? Branch delay? Why?


Case Study: MIPS R4000

  TWO Cycle Load Latency

      IF IS RF EX DF DS TC WB
         IF IS RF EX DF DS TC
            IF IS RF EX DF DS
               IF IS RF EX DF
                  IF IS RF EX
                     IF IS RF
                        IF IS
                           IF

  THREE Cycle Branch Latency (conditions evaluated during EX phase)

      IF IS RF EX DF DS TC WB
         IF IS RF EX DF DS TC
            IF IS RF EX DF DS
               IF IS RF EX DF
                  IF IS RF EX
                     IF IS RF
                        IF IS
                           IF

  - Delay slot plus two stalls
  - Branch likely cancels delay slot if not taken


MIPS R4000 Floating Point

  - FP Adder, FP Multiplier, FP Divider
  - Last step of FP Multiplier/Divider uses FP Adder HW
  - 8 kinds of stages in FP units:

      Stage   Functional unit   Description
      A       FP adder          Mantissa ADD stage
      D       FP divider        Divide pipeline stage
      E       FP multiplier     Exception test stage
      M       FP multiplier     First stage of multiplier
      N       FP multiplier     Second stage of multiplier
      R       FP adder          Rounding stage
      S       FP adder          Operand shift stage
      U                         Unpack FP numbers


MIPS FP Pipe Stages

  Stages occupied in cycles 1, 2, 3, ... ("X+Y" means both stages are used in the same cycle):

      FP Instr         Stages by cycle
      Add, Subtract    U, S+A, A+R, R+S
      Multiply         U, E+M, M, M, M, N, N+A, R
      Divide           U, A, R, D^28, D+A, D …
      Square root      U, E
      Negate           U, S
      Absolute value   U, S
      FP compare       U, A
  …
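
Aside: counting load-use stalls

  The "Delayed Loads" example and the R4000's two-cycle load latency can be tied together with a small model. The sketch below is only an illustration, not anything from the course labs. It assumes a single-issue, in-order pipeline in which ALU results are fully forwarded (usable by the next instruction) and a load result becomes usable load_delay cycles later than that. The names Instr, count_load_stalls, and load_delay are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class Instr(NamedTuple):
    op: str                 # "load", or anything else (treated as a forwarded ALU op)
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


def count_load_stalls(program: Sequence[Instr], load_delay: int) -> int:
    """Count stall cycles caused by load-use dependences.

    Assumed model (not taken from the slides): one instruction issues per
    cycle, in order; an ALU result can be forwarded to the very next
    instruction; a load result needs load_delay extra cycles.
    """
    ready = {}    # register -> earliest cycle in which a consumer may issue
    cycle = 0     # cycle in which the next instruction issues if nothing stalls
    stalls = 0
    for ins in program:
        # Wait until every source operand can be forwarded.
        earliest = max([cycle] + [ready.get(r, 0) for r in ins.srcs])
        stalls += earliest - cycle
        extra = load_delay if ins.op == "load" else 0
        ready[ins.dest] = earliest + 1 + extra
        cycle = earliest + 1
    return stalls


# Sequence from the "data dependence on loads" slide.
original = [
    Instr("alu", "R1", ("R4", "R5")),    # R1 <- R4 + R5
    Instr("load", "R2", ("R2",)),        # R2 <- Mem[R2 + I]
    Instr("alu", "R3", ("R2", "R1")),    # R3 <- R2 + R1
]

# Same work with the independent add moved after the load.
scheduled = [original[1], original[0], original[2]]

for name, prog in (("original", original), ("scheduled", scheduled)):
    print(name, [count_load_stalls(prog, d) for d in (1, 2)])
# original [1, 2]
# scheduled [0, 1]
```

  With a one-cycle load delay, moving the independent add behind the load hides the delay completely. With the R4000's two-cycle load latency, one stall remains because this sequence has only one independent instruction to place after the load. Finding such instructions is the job of static scheduling.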