Unformatted text preview:

Review Tomasulo Organization CS152 Computer Architecture and Engineering Lecture 17 Branch Prediction Explicit Renaming ILP FP Registers FP Op Queue Load Buffers From Mem Load1 Load2 Load3 Load4 Load5 Load6 Store Buffers April 5 2004 Add1 Add2 Add3 John Kubiatowicz http cs berkeley edu kubitron Mult1 Mult2 FP FP adders adders Reservation Stations To Mem FP FP multipliers multipliers lecture slides http inst eecs berkeley edu cs152 Common Data Bus CDB 4 05 04 Reservations stations 1 Issue get instruction from FP Op Queue If reservation station free no structural hazard control issues instr sends operands renames registers 2 Execution operate on operands EX integer units gets ahead beyond branches 3 Write result finish execution WB Dynamic Scheduling Write on Common Data Bus to all awaiting units mark reservation station available go to bus Common data bus data source come from bus 64 bits of data 4 bits of Functional Unit source address Write if matches expected Functional Unit produces result Does the broadcast 4 05 04 UCB Spring 2004 renaming to larger set of registers buffering source operands Prevents registers as bottleneck Avoids WAR WAW hazards of Scoreboard Not limited to basic blocks When both operands ready then execute if not ready watch Common Data Bus for result data destination CS152 Kubiatowicz Lec17 2 Review Tomasulo Architecture Review Three Stages of Tomasulo Algorithm Normal data bus UCB Spring 2004 CS152 Kubiatowicz Lec17 3 Scoreboarding Tomasulo In order issue out of order execution out of order commit Tomasulo can unroll loops dynamically in hardware Need renaming different physical names for different iterations Fast branch computation 4 05 04 UCB Spring 2004 CS152 Kubiatowicz Lec17 4 Review Four Steps of Speculative Tomasulo Algorithm Review Tomasulo With Reorder buffer ROB FP Op Queue Reorder Buffer ROB val2 val2 F0 F0 val2 val2 F4 M 10 F4 M 10 F2 F2 F10 F10 F0 F0 Done ST 0 R3 F0 YY ROB7 ST 0 R3 F0 ADDD ADDD F0 F4 F6 F0 F4 F6 Ex Ex ROB6 LD F4 0 R3 Y LD F4 0 R3 Y ROB5 BNE F2 N N ROB5 BNE F2 DIVD F2 F10 F6 N DIVD F2 F10 F6 N ROB3 ADDD ADDD F10 F4 F0 F10 F4 F0 NN ROB2 LD NN ROB1 LD F0 10 R2 F0 10 R2 1 Issue get instruction from FP Op Queue Newest If reservation station and reorder buffer slot free issue instr send operands reorder buffer no for destination this stage sometimes called dispatch 2 Execution operate on operands EX When both operands ready then execute if not ready watch CDB for result when both in reservation station execute checks RAW sometimes called issue Oldest 3 Write result finish execution WB Write on Common Data Bus to all awaiting FUs reorder buffer 4 Commit update register with reorder buffer ROB result Registers Dest 22 ADDD ADDD R F4 ROB1 R F4 ROB1 FP FP adders adders Dest 33 DIVD DIVD ROB2 R F6 ROB2 R F6 Reservation Stations To Memory When instr at head of reorder buffer result present update register with result or store to memory and remove instr from reorder buffer from Memory Values only overwrite registers when they reach head Stores only commit to memory when reach head of ROB Mispredicted branch or interrupt flushes reorder buffer Dest 11 10 R2 10 R2 NOTES In order issue Out of order execution In order commit FP FP multipliers multipliers Can always throw out contents of reorder buffer must cancel running ops Precise exception point is instruction at head of buffer 4 05 04 UCB Spring 2004 CS152 Kubiatowicz Lec17 5 Tomasulo With Reorder buffer Memory Disambiguation FP Op Queue Reorder Buffer What about memory hazards M 10 M 10 F0 F0 F4 M 10 F4 M 10 F2 F2 F10 F10 F0 F0 Done ST 0 R3 F4 YY ROB7 ST 0 R3 F4 ADDD ADDD F0 F4 F6 F0 F4 F6 Ex Ex ROB6 LD F4 0 R3 Y LD F4 0 R3 Y ROB5 BNE F2 N N ROB4 BNE F2 DIVD F2 F10 F6 N DIVD F2 F10 F6 N ROB3 ADDD ADDD F10 F4 F0 F10 F4 F0 NN ROB2 LD NN ROB1 LD F0 10 R2 F0 10 R2 Registers Dest 22 ADDD ADDD R F4 ROB1 R F4 ROB1 FP FP adders adders 4 05 04 Newest 4 05 04 Reservation Stations Question Given a load that follows a store in program order are the two related Alternatively is there a RAW hazard between the store and the load Eg Oldest from Memory 0 R2 R5 R6 0 R3 Store address could be delayed for a long time by some calculation that leads to R2 divide We might want to issue begin execution of both operations in same cycle Two techiques No Speculation we are not allowed to start load until we know for sure that address 0 R2 0 R3 Speculation We might guess at whether or not they are dependent called dependence speculation and use reorder buffer to fixup if we are wrong FP FP multipliers multipliers CS152 Kubiatowicz Lec17 7 st ld Can we go ahead and start the load early Dest 11 10 R2 10 R2 UCB Spring 2004 CS152 Kubiatowicz Lec17 6 Memory Disambiguation Handling RAW Hazards in memory To Memory Dest 33 DIVD DIVD ROB2 R F6 ROB2 R F6 UCB Spring 2004 4 05 04 UCB Spring 2004 CS152 Kubiatowicz Lec17 8 Hardware Support for Memory Disambiguation Need buffer to keep track of all outstanding stores to memory in program order Keep track of address when becomes available and value when becomes available FIFO ordering will retire stores from this buffer in program order When issuing a load record current head of store queue know which stores are ahead of you Memory Disambiguation Done FP Op Queue ROB7 ROB5 Reorder Buffer When have address for load check store queue If any store prior to load is waiting for its address If not speculating stall load If speculating send request to memory predict no dependence If load address matches earlier store address associative lookup then we have a memory induced RAW hazard store value available return value store value not available return ROB number of source Otherwise send out request to memory CS152 Kubiatowicz Lec17 9 UCB Spring 2004 Dest Reservation Stations 4 05 04 ROB4 ROB3 ROB2 Oldest ROB1 To Memory from Memory Dest FP FP adders adders NN NN NN YY FP FP multipliers multipliers Dest 22 32 R2 32 R2 44 ROB3 ROB3 CS152 Kubiatowicz Lec17 10 UCB Spring 2004 Branches must be resolved quickly Review Independent Fetch unit In our loop unrolling example we relied on the fact that branches were under control of fast integer unit in order to get overlap Stream of Instructions To Execute Out Of Order Execution Unit Instruction Fetch with Branch Prediction LD LD F4 F4 10 R3 10 R3 F2 ST 10 R3 F2 ST 10 R3 F5 F5 F0 LD F0 LD F0 32 R2 F0 32 R2 val 1 1 ST val ST 0 R3 0 R3 F4 F4 Registers Actual stores commit in order so no worry about WAR WAW hazards through memory …


View Full Document

Berkeley COMPSCI 152 - Branch Prediction, Explicit Renaming, ILP

Documents in this Course
Quiz 5

Quiz 5

9 pages

Memory

Memory

29 pages

Quiz 5

Quiz 5

15 pages

Memory

Memory

29 pages

Memory

Memory

35 pages

Memory

Memory

15 pages

Quiz

Quiz

6 pages

Midterm 1

Midterm 1

20 pages

Quiz

Quiz

12 pages

Memory

Memory

33 pages

Quiz

Quiz

6 pages

Homework

Homework

19 pages

Quiz

Quiz

5 pages

Memory

Memory

15 pages

Load more
Loading Unlocking...
Login

Join to view Branch Prediction, Explicit Renaming, ILP and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Branch Prediction, Explicit Renaming, ILP and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?