CS152 Computer Architecture and Engineering
Lecture 17: Branch Prediction, Explicit Renaming, ILP
April 5, 2004
John Kubiatowicz (http://cs.berkeley.edu/~kubitron)
lecture slides: http://inst.eecs.berkeley.edu/~cs152/

Review: Tomasulo Organization

[Figure: Tomasulo organization. Labels: FP Op Queue, FP Registers, From Mem, Load Buffers (Load1 to Load6), Store Buffers, To Mem, Reservation Stations (Add1 to Add3, Mult1 and Mult2), FP adders, FP multipliers, Common Data Bus (CDB).]

4/05/04, UCB Spring 2004, CS152 / Kubiatowicz

Review: Three Stages of Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue
   If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).
2. Execution: operate on operands (EX)
   When both operands ready then execute; if not ready, watch Common Data Bus for result.
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting units; mark reservation station available.

- Normal data bus: data + destination ("go to" bus)
- Common data bus: data + source ("come from" bus)
  - 64 bits of data + 4 bits of Functional Unit source address
  - Write if matches expected Functional Unit (produces result)
  - Does the broadcast

Review: Tomasulo Architecture

- Reservation stations: renaming to larger set of registers + buffering source operands
  - Prevents registers as bottleneck
  - Avoids WAR, WAW hazards of Scoreboard
- Not limited to basic blocks (integer unit gets ahead, beyond branches)
- Dynamic Scheduling (Scoreboarding/Tomasulo): in-order issue, out-of-order execution, out-of-order commit
- Tomasulo can unroll loops dynamically in hardware!
  - Need: renaming (different physical names for different iterations)
  - Need: fast branch computation

Review: Tomasulo With Reorder Buffer (ROB)

[Figure: Tomasulo organization with a reorder buffer. Labels: FP Op Queue, Reorder Buffer (entries ROB1, oldest, through ROB7, newest, each with a Done? flag), Registers, Reservation Stations, FP adders, FP multipliers, To Memory, from Memory.]
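As a rough illustration of the three stages reviewed above (basic Tomasulo, before a reorder buffer is added), the following is a minimal sketch, not the lecture's own code. The class and function names (`ReservationStation`, `issue`, `execute_and_write`), the `reg_tag` renaming table, and the register values are all assumptions made for the example. It shows a station either capturing an operand value at issue or recording the tag of the station that will produce it, and then picking the value off the "come from" bus when that tag is broadcast.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical opcode table for the sketch.
OPS = {"ADDD": operator.add, "DIVD": operator.truediv}


@dataclass
class ReservationStation:
    name: str                    # also used as the CDB source tag
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None     # tags of stations still producing operands
    qk: Optional[str] = None

    def ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None

    def snoop(self, tag: str, value: float) -> None:
        """Watch the CDB: capture the value if it comes from an awaited source."""
        if self.busy and self.qj == tag:
            self.vj, self.qj = value, None
        if self.busy and self.qk == tag:
            self.vk, self.qk = value, None


def issue(rs: ReservationStation, op: str, dest: str, src1: str, src2: str,
          regs: Dict[str, float], reg_tag: Dict[str, str]) -> bool:
    """Stage 1: claim a free station, send operands or producer tags, rename dest."""
    if rs.busy:
        return False             # structural hazard: no free station
    rs.busy, rs.op = True, op
    rs.vj = rs.vk = rs.qj = rs.qk = None
    if src1 in reg_tag:
        rs.qj = reg_tag[src1]
    else:
        rs.vj = regs[src1]
    if src2 in reg_tag:
        rs.qk = reg_tag[src2]
    else:
        rs.vk = regs[src2]
    reg_tag[dest] = rs.name      # dest now names this station's future result
    return True


def execute_and_write(rs: ReservationStation, stations: List[ReservationStation],
                      regs: Dict[str, float], reg_tag: Dict[str, str]) -> float:
    """Stages 2 and 3: compute, broadcast (tag, value) on the CDB, free the station."""
    assert rs.ready()
    result = OPS[rs.op](rs.vj, rs.vk)
    for other in stations:       # every waiting unit sees the broadcast
        other.snoop(rs.name, result)
    for reg in [r for r, t in reg_tag.items() if t == rs.name]:
        regs[reg] = result       # register file writes only if still expecting this tag
        del reg_tag[reg]
    rs.busy = False
    return result


# Made-up register contents; ADDD F10,F4,F0 followed by DIVD F2,F10,F6.
regs = {"F0": 1.0, "F2": 0.0, "F4": 2.0, "F6": 4.0, "F10": 0.0}
reg_tag: Dict[str, str] = {}
add1, mult1 = ReservationStation("Add1"), ReservationStation("Mult1")
stations = [add1, mult1]

issue(add1, "ADDD", "F10", "F4", "F0", regs, reg_tag)
issue(mult1, "DIVD", "F2", "F10", "F6", regs, reg_tag)
divd_had_to_wait = not mult1.ready()   # F10 is still being produced by Add1
execute_and_write(add1, stations, regs, reg_tag)
execute_and_write(mult1, stations, regs, reg_tag)
```

The DIVD never reads F10 from the register file: it holds the tag `Add1` and receives the value from the broadcast, which is how renaming through reservation stations sidesteps WAR and WAW hazards.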
Review: Four Steps of Speculative Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue
   If reservation station and reorder buffer slot free, issue instr & send operands & reorder buffer no. for destination (this stage sometimes called "dispatch").
2. Execution: operate on operands (EX)
   When both operands ready then execute; if not ready, watch CDB for result; when both in reservation station, execute; checks RAW (sometimes called "issue").
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting FUs & reorder buffer.
4. Commit: update register with reorder buffer (ROB) result
   When instr at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer.

- Stores only commit to memory when they reach the head of the ROB
- Values only overwrite registers when they reach the head
- Mispredicted branch or interrupt flushes reorder buffer

NOTES:
- In-order issue, out-of-order execution, in-order commit
- Can always throw out contents of reorder buffer (must cancel running ops)
- Precise exception point is instruction at head of buffer

Tomasulo With Reorder Buffer: Memory Disambiguation

[Figure: the same Tomasulo-with-ROB organization (FP Op Queue, Reorder Buffer ROB1 to ROB7, Registers, Reservation Stations, FP adders, FP multipliers, To Memory, from Memory), with the callout "What about memory hazards?"]
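The commit step of the four-step algorithm above can be sketched as follows. This is only an illustration under stated assumptions: the `ROBEntry` fields, the `commit` function, and the values in the demo are invented for the example and are not taken from the slides. The point it tries to show is that results retire strictly from the head of the buffer, so a finished younger instruction waits behind an unfinished older one, and a mispredicted branch discards everything behind it.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class ROBEntry:
    kind: str                     # "reg", "store", or "branch" (assumed encoding)
    dest: Optional[str] = None    # register name, or memory address label for a store
    value: Optional[float] = None
    done: bool = False            # result has been written on the CDB
    mispredicted: bool = False    # only meaningful for branches


def commit(rob: Deque[ROBEntry], regs: Dict[str, float],
           mem: Dict[str, float]) -> int:
    """Retire completed entries from the head, oldest first. Returns how many retired."""
    retired = 0
    while rob and rob[0].done:
        entry = rob.popleft()
        retired += 1
        if entry.kind == "branch":
            if entry.mispredicted:
                rob.clear()       # flush all younger, speculative entries
                break
        elif entry.kind == "store":
            mem[entry.dest] = entry.value    # memory changes only at the head
        else:
            regs[entry.dest] = entry.value   # registers change only at the head
    return retired


# Demo 1: a finished DIVD cannot commit past an unfinished ADDD.
regs: Dict[str, float] = {}
mem: Dict[str, float] = {}
addd = ROBEntry("reg", "F10")
rob: Deque[ROBEntry] = deque([
    ROBEntry("reg", "F0", 5.0, done=True),   # oldest
    addd,
    ROBEntry("reg", "F2", 9.0, done=True),   # finished out of order
])
first_pass = commit(rob, regs, mem)
f2_held_back = "F2" not in regs
addd.value, addd.done = 7.0, True
second_pass = commit(rob, regs, mem)

# Demo 2: a mispredicted branch flushes the store behind it.
rob2: Deque[ROBEntry] = deque([
    ROBEntry("branch", done=True, mispredicted=True),
    ROBEntry("store", "0(R3)", 1.0, done=True),
])
commit(rob2, regs, mem)
```

Because architectural state only changes in `commit`, the head of the buffer is always a precise exception point in this sketch, matching the note on the slide.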
Memory Disambiguation: Handling RAW Hazards in Memory

- Question: Given a load that follows a store in program order, are the two related?
  Alternatively: is there a RAW hazard between the store and the load?
  E.g.:  st  0(R2),R5
         ld  R6,0(R3)
- Can we go ahead and start the load early?
  - Store address could be delayed for a long time by some calculation that leads to R2 (divide?).
  - We might want to issue/begin execution of both operations in the same cycle.
- Two techniques:
  - No speculation: we are not allowed to start the load until we know for sure that address 0(R2) ≠ 0(R3).
  - Speculation: we might guess at whether or not they are dependent (called "dependence speculation") and use the reorder buffer to fix up if we are wrong.

Hardware Support for Memory Disambiguation

- Need buffer to keep track of all outstanding stores to memory, in program order.
  - Keep track of address (when it becomes available) and value (when it becomes available)
  - FIFO ordering: will retire stores from this buffer in program order
- When issuing a load, record current head of store queue (know which stores are ahead of you).
- When have address for load, check store queue:
  - If any store prior to load is waiting for its address:
    - If not speculating, stall load.
    - If speculating, send request to memory (predict no dependence).
  - If load address matches earlier store address (associative lookup), then we have a memory-induced RAW hazard:
    - store value available ⇒ return value
    - store value not available ⇒ return ROB number of source
  - Otherwise, send out request to memory
- Actual stores commit in order, so no worry about WAR/WAW hazards through memory.

Memory Disambiguation

[Figure: Tomasulo-with-ROB organization (FP Op Queue, Reorder Buffer ROB1 to ROB7, Registers, Reservation Stations, FP adders, FP multipliers, To Memory, from Memory). The reorder buffer holds interleaved loads and stores: LD F4,10(R3); ST 10(R3),F5; LD F0,32(R2); ST 0(R3),F4.]
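The store-queue check described under "Hardware Support for Memory Disambiguation" can be sketched roughly as below. This is a hedged illustration, not the lecture's hardware: `StoreEntry`, `check_load`, the returned action strings, and the numeric addresses are hypothetical. One further assumption not stated on the slide: when several earlier stores match the load address, the youngest one is taken as the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class StoreEntry:
    rob_id: int                     # ROB number of the store
    addr: Optional[int] = None      # None until the address is computed
    value: Optional[float] = None   # None until the store data is available


Action = Tuple[str, Union[None, int, float]]


def check_load(load_addr: int, older_stores: List[StoreEntry],
               speculate: bool) -> Action:
    """Decide what a load does once its address is known.

    older_stores holds the stores ahead of the load in program order, oldest first.
    """
    # Some earlier store is still waiting for its address.
    if not speculate and any(s.addr is None for s in older_stores):
        return ("stall", None)
    # When speculating, unknown addresses are predicted not to conflict.

    # Associative lookup; youngest matching store wins (assumption, see lead-in).
    for store in reversed(older_stores):
        if store.addr == load_addr:             # memory-induced RAW hazard
            if store.value is not None:
                return ("forward", store.value)  # return the store's value
            return ("wait_for", store.rob_id)    # return ROB number of source
    return ("memory", None)                      # no conflict: go to memory


# Hypothetical addresses loosely modeled on the slide's example,
# with R3 = 100 and R2 = 200 chosen arbitrarily.
stores = [
    StoreEntry(rob_id=1, addr=100, value=1.0),   # ST 0(R3),F4: address and data known
    StoreEntry(rob_id=3, addr=110, value=None),  # ST 10(R3),F5: F5 not ready yet
]
ld_f4 = check_load(110, stores, speculate=False)   # LD F4,10(R3)
ld_f0 = check_load(232, stores, speculate=False)   # LD F0,32(R2)
ld_hit = check_load(100, stores, speculate=False)  # a load from 0(R3)

unresolved = [StoreEntry(rob_id=1)]                # store address not yet computed
safe = check_load(232, unresolved, speculate=False)
guess = check_load(232, unresolved, speculate=True)
```

In the speculative case the reorder buffer would still need to detect a wrong guess once the store address resolves and squash the load; that recovery path is left out of this sketch.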
Review: Independent "Fetch" unit

[Figure: Instruction Fetch with Branch Prediction sends a Stream of Instructions To Execute to the Out-Of-Order Execution Unit, which returns Correctness Feedback On Branch Results.]

- Instruction fetch decoupled from execution
- Often issue logic (+ rename) included with Fetch

Branches must be resolved quickly

- In our loop-unrolling example, we relied on the