CS152 Computer Architecture and Engineering
Lecture 18: Branch Prediction, Explicit Renaming, ILP
April 9, 2003
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://inst.eecs.berkeley.edu/~cs152/
4/09/03 UCB Spring 2003 CS152 / Kubiatowicz

Review: Tomasulo Organization

[Figure: FP Op Queue and FP Registers; Load Buffers Load1 to Load6 (from memory); Store Buffers (to memory); Reservation Stations Add1 to Add3 and Mult1 to Mult2; FP adders and FP multipliers; Common Data Bus (CDB).]

Review: Three Stages of Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue
   If reservation station free (no structural hazard), control issues instr and sends operands (renames registers).
2. Execution: operate on operands (EX)
   When both operands ready, then execute; if not ready, watch Common Data Bus for result.
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting units; mark reservation station available.

- Normal data bus: data + destination ("go to" bus)
- Common data bus: data + source ("come from" bus)
  - 64 bits of data + 4 bits of Functional Unit source address
  - Write if matches expected Functional Unit (produces result)
  - Does the broadcast

Review: Tomasulo Architecture

- Reservation stations: renaming to larger set of registers + buffering source operands
  - Prevents registers as bottleneck
  - Avoids WAR, WAW hazards of Scoreboard
- Not limited to basic blocks (integer unit gets ahead, beyond branches)
- Dynamic scheduling: Scoreboarding / Tomasulo
  - In-order issue, out-of-order execution, out-of-order commit
- Tomasulo can unroll loops dynamically in hardware
  - Need: renaming (different physical names for different iterations)
  - Fast branch computation

Review: Tomasulo With Reorder Buffer (ROB)

[Figure: FP Op Queue; Reorder Buffer with entries ROB1 (oldest) through ROB7 (newest) and a Done column; Registers; Reservation Stations feeding FP adders and FP multipliers; paths to and from memory. Instructions shown in the buffer, oldest first: LD F0,10(R2); ADDD F10,F4,F0; DIVD F2,F10,F6; BNE F2; LD F4,0(R3); ADDD F0,F4,F6; ST 0(R3),F0.]

Review: Four Steps of Speculative Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue
   If reservation station and reorder buffer slot free, issue instr and send operands and reorder buffer no. for destination (this stage sometimes called "dispatch").
2. Execution: operate on operands (EX)
   When both operands ready, then execute; if not ready, watch CDB for result; when both in reservation station, execute; checks RAW (sometimes called "issue").
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting FUs and reorder buffer.
4. Commit: update register with reorder buffer (ROB) result
   When instr at head of reorder buffer and result present, update register with result (or store to memory) and remove instr from reorder buffer.
   - Stores only commit to memory when they reach head of ROB
   - Values only overwrite registers when they reach head
   - Mispredicted branch or interrupt flushes reorder buffer

NOTES:
- In-order issue, out-of-order execution, in-order commit
- Can always throw out contents of reorder buffer (must cancel running ops)
- Precise exception point is instruction at head of buffer

Review: Memory Disambiguation

- When issuing a load, record current head of store queue (know which stores are ahead of you).
- When have address for load, check store queue:
  - If any store prior to load is waiting for its address, stall load.
  - If load address matches earlier store address (associative lookup), then we have a memory-induced RAW hazard:
    - store value available: return value
    - store value not available: return ROB number of source
  - Otherwise, send out request to memory.
- Alternative: dependence speculation
  - Just issue load. If discover that you were wrong, flush ROB.
  - Perhaps stall load: dependence prediction
- Actual stores commit in order, so no worry about WAR/WAW hazards through memory.

Review: Independent Fetch unit

[Figure: Instruction Fetch with Branch Prediction sends a stream of instructions to execute to the Out-Of-Order Execution Unit, which returns correctness feedback on branch results.]

- Instruction fetch decoupled from execution
- Often issue logic (+ rename) included with Fetch

Branches must be resolved quickly

- In our loop-unrolling example, we relied on the fact that branches were under control of the fast integer unit in order to get overlap.

  Loop: LD    F0,0(R1)
        MULTD F4,F0,F2
        SD    0(R1),F4
        SUBI  R1,R1,8
        BNEZ  R1,Loop

- What happens if branch depends on result of MULTD?
  - We completely lose all of our advantages.
  - Need to be able to predict branch outcome.
  - If we were to predict that branch was taken, this would be right most of the time.
- Problem much worse for superscalar machines.

Handling some branches: Conditional instructions

- Avoid branch prediction by turning branches into conditionally executed instructions:
  if (x) then A = B op C else NOP
  - If false, then neither store result nor cause exception
  - Expanded ISA of Alpha, MIPS, PowerPC, SPARC have conditional move; PA-RISC can annul any following instr
  - EPIC: 64 1-bit condition fields selected, so conditional execution
- Drawbacks to conditional instructions
  - Still takes a clock even if annulled
  - Stall if condition evaluated late
  - Complex conditions reduce effectiveness; condition becomes known late in pipeline
  - Cannot loop

Prediction: Branches, Dependencies, Data

- Prediction has become essential to getting good performance from scalar instruction streams.
- We will discuss predicting branches. However, architects are now predicting everything: data dependencies, actual data, and results of groups of instructions.
  - At what point does computation become a probabilistic operation plus verification?
  - We are pretty close with control hazards already.
- Why does prediction work?
  - Underlying algorithm has regularities.
  - Data that is being operated on has regularities.
  - Instruction sequence has redundancies that are artifacts of the way that humans/compilers think about problems.
- Prediction => compressible information streams?

Dynamic Branch Prediction

- Prediction could be "Static" (at compile time) or "Dynamic" (at runtime)
  - For our example, if we were to statically
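The earlier claim that predicting the loop branch as taken "would be right most of the time" can be checked with a small sketch. It assumes a loop that runs a fixed number of iterations, so that BNEZ is taken on every iteration except the final exit. The function names and the iteration counts are illustrative and do not come from the lecture.

```python
from typing import List


def loop_branch_outcomes(iterations: int) -> List[bool]:
    """Outcomes of the loop-closing branch (True = taken).

    Assumption: the loop body runs `iterations` >= 1 times, so the branch
    is taken on every pass except the last one, where it falls through.
    """
    return [True] * (iterations - 1) + [False]


def static_taken_accuracy(outcomes: List[bool]) -> float:
    """Fraction of branches that a fixed 'always taken' prediction gets right."""
    correct = sum(1 for taken in outcomes if taken)
    return correct / len(outcomes)


if __name__ == "__main__":
    for n in (4, 100):
        acc = static_taken_accuracy(loop_branch_outcomes(n))
        print(f"{n} iterations: always-taken accuracy = {acc:.2f}")
```

Under this assumption the only misprediction is the loop exit, so accuracy is (n - 1) / n and improves as the loop runs longer.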
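The load check from the memory disambiguation review can also be sketched in a few lines. This is a minimal model under stated assumptions, not the hardware design from the lecture: `StoreEntry`, `check_load`, and the returned action strings are hypothetical names, the store queue is modeled as a list of the stores ahead of the load (oldest first), and when several earlier stores match the load address the youngest one is assumed to supply the value.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoreEntry:
    addr: Optional[int] = None     # None while the store address is unknown
    value: Optional[int] = None    # None while the store data is not ready
    src_rob: Optional[int] = None  # ROB number of the instruction producing the data


def check_load(load_addr: int,
               earlier_stores: List[StoreEntry]) -> Tuple[str, Optional[int]]:
    """Decide what a load does once its address is known.

    `earlier_stores` holds only the stores ahead of the load, oldest first.
    """
    # Any earlier store still waiting for its address: stall the load.
    if any(s.addr is None for s in earlier_stores):
        return ("stall", None)

    # Associative lookup. Assumption: the youngest matching store wins.
    for s in reversed(earlier_stores):
        if s.addr == load_addr:
            if s.value is not None:
                return ("forward", s.value)    # store value available
            return ("wait_for_rob", s.src_rob)  # value not yet produced

    # No earlier store to this address: send the request to memory.
    return ("memory", load_addr)


if __name__ == "__main__":
    queue = [
        StoreEntry(addr=0x10, value=5),
        StoreEntry(addr=0x20, src_rob=3),
        StoreEntry(addr=0x10, value=7),
    ]
    print(check_load(0x10, queue))
    print(check_load(0x20, queue))
    print(check_load(0x30, queue))
```

The stall case is checked first because an unknown store address might alias the load, which is the conservative policy on the slide; dependence speculation would instead issue the load and flush the ROB on a wrong guess.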



Berkeley COMPSCI 152 - Lecture 18 Branch Prediction, Explicit Renaming, ILP
