CS152 Computer Architecture and Engineering Lecture 19 Finish speculation Branch Prediction November 7th 2001 John Kubiatowicz http cs berkeley edu kubitron lecture slides http www inst eecs berkeley edu cs152 11 06 01 UCB Fall 2001 CS152 Kubiatowicz Review Tomasulo Organization FP Registers From Mem FP Op Queue Load Buffers Load1 Load2 Load3 Load4 Load5 Load6 Store Buffers Add1 Add2 Add3 Mult1 Mult2 Reservation Stations FP FP FPadders adders FPmultipliers multipliers Common Data Bus CDB 11 07 01 UCB Fall 2001 To Mem CS152 Kubiatowicz Review Tomasulo Architecture Reservations stations renaming to larger set of registers buffering source operands Prevents registers as bottleneck Avoids WAR WAW hazards of Scoreboard Allows loop unrolling in HW Not limited to basic blocks integer units gets ahead beyond branches Dynamic Scheduling Scoreboarding Tomasulo In order issue out of order execution out of order commit Branch prediction speculation Regularities in program execution permit prediction of branch directions and data values Necessary for wide superscalar issue 11 07 01 UCB Fall 2001 CS152 Kubiatowicz Recall Prediction Branches Dependencies Data Prediction has become essential to getting good performance from scalar instruction streams We will discuss predicting branches However architects are now predicting everything data dependencies actual data and results of groups of instructions At what point does computation become a probabilistic operation verification We are pretty close with control hazards already Why does prediction work Underlying algorithm has regularities Data that is being operated on has regularities Instruction sequence has redundancies that are artifacts of way that humans compilers think about problems Prediction Compressible information streams 11 07 01 UCB Fall 2001 CS152 Kubiatowicz Review Independent Fetch unit Stream of Instructions To Execute Out Of Order Execution Unit Instruction Fetch with Branch Prediction Correctness Feedback On Branch Results Instruction fetch decoupled from execution Often issue logic rename included with Fetch 11 07 01 UCB Fall 2001 CS152 Kubiatowicz Review Branches must be resolved quickly In our loop unrolling example we relied on the fact that branches were under control of fast integer unit in order to get overlap Loop LD MULTD SD F4 SUBI R1 BNEZ R1 F0 F4 0 R1 Loop 0 F0 R1 8 R1 F2 What happens if branch depends on result of multd We completely lose all of our advantages Need to be able to predict branch outcome If we were to predict that branch was taken this would be right most of the time Problem much worse for superscalar machines 11 07 01 UCB Fall 2001 CS152 Kubiatowicz Dynamic Branch Prediction Prediction could be Static at compile time or Dynamic at runtime For our example if we were to statically predict taken we would only be wrong once each pass through loop Is dynamic branch prediction better than static branch prediction Seems to be Still some debate to this effect Today lots of hardware being devoted to dynamic branch predictors Does branch prediction make sense for 5 stage inorder pipeline What about 8 stage pipeline Perhaps eliminate branch delay slots Then predict branches 11 07 01 UCB Fall 2001 CS152 Kubiatowicz Simple dynamic prediction Branch Target Buffer BTB Address of branch index to get prediction AND branch address if taken Must check for branch match now since can t use wrong branch address Grab predicted PC from table since may take several cycles to compute PC of instruction FETCH Branch PC Predicted PC Predict taken or untaken Update predicted PC when branch is actually resolved Return instruction addresses predicted with stack 11 07 01 UCB Fall 2001 CS152 Kubiatowicz Branch History Table Predictor 0 Predictor 1 Branch PC Predictor 7 BHT is a table of Predictors Usually 2 bit saturating counters Indexed by PC address of Branch without tags In Fetch state of branch BTB identifies branch Predictor from BHT used to make prediction When branch completes Update corresponding Predictor 11 07 01 UCB Fall 2001 CS152 Kubiatowicz Dynamic Branch Prediction Branch History Table Lower bits of PC address prediction table Branch Target Buffer BTB identify branches and hold taken addresses Trick identify branch before fetching instruction Branch History Table makes prediction Simplest possibility could be something much more complicated No address check Can be good can be bad Simple 1 bit BHT keep last direction of branch Problem in a loop 1 bit BHT will cause two mispredictions avg is 9 iteratios before exit End of loop case when it exits instead of looping as before First time through loop on next time through code when it predicts exit instead of looping Performance accuracy cost of misprediction Misprediction Flush Reorder Buffer 11 07 01 UCB Fall 2001 CS152 Kubiatowicz Dynamic Branch Prediction Solution 2 bit scheme where change prediction only if get misprediction twice Figure 4 13 p 264 T Predict Taken NT NT T NT Predict Not Taken Predict Taken T Predict Not Taken T NT Red stop not taken Green go taken Adds hysteresis to decision making process 11 07 01 UCB Fall 2001 CS152 Kubiatowicz BHT Accuracy Mispredict because either Wrong guess for that branch Got branch history of wrong branch when index the table 4096 entry table programs vary from 1 misprediction nasa7 tomcatv to 18 eqntott with spice at 9 and gcc at 12 4096 about as good as infinite table in Alpha 211164 11 07 01 UCB Fall 2001 CS152 Kubiatowicz Correlating Branches Hypothesis behavior of recently executed branches affects prediction of current branch Two possibilities Current branch depends on Last m most recently executed branches anywhere in program Produces a GA for global address in the Yeh and Patt classification e g GAg Last m most recent outcomes of same branch Produces a PA for per address in same classification e g PAg Idea record m most recently executed branches as taken or not taken and use that pattern to select the proper branch history table entry A single history table shared by all branches appends a g at end indexed by history value Address is used along with history to select table entry appends a p at end of classification If only portion of address used often appends an s to indicate setindexed tables I e GAs 11 07 01 UCB Fall 2001 CS152 Kubiatowicz Correlating Branches For instance consider global history set indexed BHT That gives us a GAs history table 2 2 GAs predictor Branch address First 2 means that we keep two bits
View Full Document
Unlocking...