CS 152 Computer Architecture andEngineering Lecture 14 - Branch PredictionKrste AsanovicElectrical Engineering and Computer SciencesUniversity of California at Berkeleyhttp://www.eecs.berkeley.edu/~krstehttp://inst.eecs.berkeley.e du/~cs1523/20/2008 CS152-Spring!082Last time in Lecture 13• Register renaming removes WAR, WAW hazards bygiving a new internal destination register for everynew result• Pipeline is structured with in-orderfetch/decode/rename, followed by out-of-orderexecution/complete, followed by in-order commit• At commit time, can detect exceptions and roll backbuffer to provide precise interrupts3/20/2008 CS152-Spring!083Recap: Overall Pipeline Structure• Instructions fetched and decoded into instruction reorder buffer in-order• Execution is out-of-order ( ! out-of-order completion)• Commit (write-back to architectural state, i.e., regfile & memory) is in-orderTemporary storage needed to hold results before commit(shadow registers and store buffers)Fetch DecodeExecuteCommitReorder BufferIn-order In-orderOut-of-orderKillKillKillException?Inject handler PC3/20/2008 CS152-Spring!084I-cacheFetchBufferIssueBufferFunc.UnitsArch.StateExecuteDecodeResultBufferCommitPCFetchBranchexecutedNext fetchstartedModern processors may have> 10 pipeline stages betweennext PC calculation and branchresolution !Control Flow PenaltyHow much work is lost ifpipeline doesn’t followcorrect instruction flow?~ Loop length x pipeline width3/20/2008 CS152-Spring!085Instruction Taken known? Target known?JJRBEQZ/BNEZMIPS Branches and JumpsEach instruction fetch depends on one or two piecesof information from the preceding instruction:1) Is the preceding instruction a taken branch?2) If so, what is the target address?After Inst. DecodeAfter Inst. Decode After Inst. DecodeAfter Inst. Decode After Reg. FetchAfter Reg. Fetch**Assuming zero detect on register read3/20/2008 CS152-Spring!086Branch Penalties in Modern PipelinesA PC Generation/MuxP Instruction Fetch Stage 1F Instruction Fetch Stage 2B Branch Address Calc/Begin DecodeI Complete DecodeJ Steer Instructions to Functional unitsR Register File ReadE Integer ExecuteRemainder of execute pipeline (+ another 6 stages)UltraSPARC-III instruction fetch pipeline stages(in-order issue, 4-way superscalar, 750MHz, 2000)BranchTargetAddressKnownBranchDirection &JumpRegisterTargetKnown3/20/2008 CS152-Spring!087Reducing Control Flow PenaltySoftware solutions• Eliminate branches - loop unrollingIncreases the run length• Reduce resolution time - instruction schedulingCompute the branch condition as earlyas possible (of limited value)Hardware solutions• Find something else to do - delay slotsReplaces pipeline bubbles with useful work(requires software cooperation)• Speculate - branch predictionSpeculative execution of instructions beyondthe branch3/20/2008 CS152-Spring!088Branch PredictionMotivation:Branch penalties limit performance of deeply pipelinedprocessorsModern branch predictors have high accuracy(>95%) and can reduce branch penalties significantlyRequired hardware support:Prediction structures:• Branch history tables, branch target buffers, etc.Mispredict recovery mechanisms:• Keep result computation separate from commit• Kill instructions following branch in pipeline• Restore state to state following branch3/20/2008 CS152-Spring!089Static Branch PredictionOverall probability a branch is taken is ~60-70% but:ISA can attach preferred direction semantics to branches,e.g., Motorola MC88110bne0 (preferred taken) beq0 (not taken)ISA can allow arbitrary choice of statically predicteddirection, e.g., HP PA-RISC, Intel IA-64 typically reported as ~80% accurateJZJZbackward90%forward50%3/20/2008 CS152-Spring!0810Dynamic Branch Predictionlearning based on past behaviorTemporal correlationThe way a branch resolves may be a goodpredictor of the way it will resolve at the nextexecutionSpatial correlationSeveral branches may resolve in a highlycorrelated manner (a preferred path ofexecution)3/20/2008 CS152-Spring!0811• Assume 2 BP bits per instruction• Change the prediction after two consecutive mistakes!¬takewrongtaken¬ takentakentakentaken¬takerighttakerighttakewrong¬ taken¬ taken¬ takenBP state:(predict take/¬take) x (last prediction right/wrong)Branch Prediction Bits3/20/2008 CS152-Spring!0812Branch History Table4K-entry BHT, 2 bits/entry, ~80-90% correct predictions0 0Fetch PCBranch?Target PC+I-CacheOpcode offsetInstructionkBHT Index2k-entryBHT,2 bits/entryTaken/¬Taken?3/20/2008 CS152-Spring!0813Exploiting Spatial CorrelationYeh and Patt, 1992History register, H, records the direction of the lastN branches executed by the processorif (x[i] < 7) theny += 1;if (x[i] < 5) thenc -= 4;If first condition false, second condition also false3/20/2008 CS152-Spring!0814Two-Level Branch PredictorPentium Pro uses the result from the last two branchesto select one of the four sets of BHT bits (~95% correct)0 0kFetch PCShift inTaken/¬Takenresults of eachbranch2-bit global branchhistory shift registerTaken/¬Taken?3/20/2008 CS152-Spring!0815Limitations of BHTsOnly predicts branch direction. Therefore, cannot redirectfetch stream until after branch target is determined.UltraSPARC-III fetch pipelineCorrectlypredictedtaken branchpenaltyJump RegisterpenaltyA PC Generation/MuxP Instruction Fetch Stage 1F Instruction Fetch Stage 2B Branch Address Calc/Begin DecodeI Complete DecodeJ Steer Instructions to Functional unitsR Register File ReadE Integer ExecuteRemainder of execute pipeline(+ another 6 stages)3/20/2008 CS152-Spring!0816Branch Target BufferBP bits are stored with the predicted target address.IF stage: If (BP=taken) then nPC=target else nPC=PC+4later: check prediction, if wrong then kill the instruction and update BTB & BPb else update BPbIMEMPCBranch Target Buffer (2k entries)kBPbpredictedtarget BP target3/20/2008 CS152-Spring!0817Address CollisionsWhat will be fetched after the instruction at 1028?BTB prediction = Correct target =! Assume a 128-entry BTBBPbtargettake2361028 Add .....132 Jump 100InstructionMemory2361032kill PC=236 and fetch PC=1032Is this a common occurrence?Can we avoid these bubbles?3/20/2008 CS152-Spring!0818BTB is only for Control InstructionsBTB contains useful information for branch andjump instructions only! Do not update it for other instructionsFor all other instructions the next PC is PC+4 !How to achieve this effect without decoding theinstruction?3/20/2008
View Full Document