CS 152 Computer Architecture and Engineering
Lecture 14 - Branch Prediction

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152

3/20/2008, CS152 Spring '08

Last time in Lecture 13
- Register renaming removes WAR and WAW hazards by giving a new internal destination register for every new result.
- The pipeline is structured with in-order fetch/decode/rename, followed by out-of-order execution/complete, followed by in-order commit.
- At commit time, the processor can detect exceptions and roll back the buffer to provide precise interrupts.

Recap: Overall Pipeline Structure
[Figure: pipeline diagram with Fetch, Decode, Reorder Buffer, Execute, and Commit. Labels: In-order (fetch/decode), Out-of-order (execute), In-order (commit), Kill (three places), Exception?, Inject handler PC]
- Instructions are fetched and decoded into the instruction reorder buffer in order.
- Execution is out of order (and so is completion).
- Commit (write-back to architectural state, i.e., regfile and memory) is in order.
- Temporary storage is needed to hold results before commit (shadow registers and store buffers).

Control Flow Penalty
[Figure: PC, I-cache, Fetch Buffer, Decode, Issue Buffer, Func. Units, Result Buffer, Commit, Arch. State, grouped into Fetch and Execute. Labels: "Next fetch started" at the PC, "Branch executed" at the functional units]
- Modern processors may have 10 pipeline stages between next-PC calculation and branch resolution.
- How much work is lost if the pipeline doesn't follow the correct instruction flow? Approximately loop length x pipeline width.

MIPS Branches and Jumps
Each instruction fetch depends on one or two pieces of information from the preceding instruction:
1) Is the preceding instruction a taken branch?
2) If so, what is the target address?

Instruction    Taken known?          Target known?
J              After Inst. Decode    After Inst. Decode
JR             After Inst. Decode    After Reg. Fetch
BEQZ/BNEZ      After Reg. Fetch*     After Inst. Decode
*Assuming zero detect on register read

Branch Penalties in Modern Pipelines
UltraSPARC-III instruction fetch pipeline stages (in-order issue, 4-way superscalar, 750MHz, 2000):
A  PC Generation/Mux
P  Instruction Fetch Stage 1
F  Instruction Fetch Stage 2
B  Branch Address Calc/Begin Decode
I  Complete Decode
J  Steer Instructions to Functional units
R  Register File Read
E  Integer Execute
   Remainder of execute pipeline (another 6 stages)
[Figure annotations: Branch Target Address Known; Branch Direction & Jump Register Target Known]

Reducing Control Flow Penalty
Software solutions
- Eliminate branches: loop unrolling. Increases the run length.
- Reduce resolution time: instruction scheduling. Compute the branch condition as early as possible (of limited value).
Hardware solutions
- Find something else to do: delay slots. Replaces pipeline bubbles with useful work (requires software cooperation).
- Speculate: branch prediction. Speculative execution of instructions beyond the branch.

Branch Prediction
Motivation:
- Branch penalties limit the performance of deeply pipelined processors.
- Modern branch predictors have high accuracy (95%) and can reduce branch penalties significantly.
Required hardware support:
- Prediction structures: branch history tables, branch target buffers, etc.
- Mispredict recovery mechanisms:
  - Keep result computation separate from commit.
  - Kill instructions following the branch in the pipeline.
  - Restore state to the state following the branch.

Static Branch Prediction
- Overall probability that a branch is taken is 60-70%, but: backward branches 90%, forward branches 50%.
  [Figure: a backward JZ and a forward JZ]
- The ISA can attach preferred-direction semantics to branches, e.g., Motorola MC88110: bne0 (preferred taken), beq0 (not taken).
- The ISA can allow arbitrary choice of statically predicted direction, e.g., HP PA-RISC, Intel IA-64; typically reported as 80% accurate.

Dynamic Branch Prediction
Learning based on past behavior.
- Temporal correlation: the way a branch resolves may be a good predictor of the way it will resolve at the next execution.
- Spatial correlation: several branches may resolve in a highly correlated manner (a preferred path of execution).
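
To make the temporal-correlation idea concrete, here is a minimal sketch of a "same as last time" predictor for a single branch. It is only an illustration: the function name, the loop trip count, and the initial prediction are assumptions, not something given in the lecture.

```python
def last_outcome_accuracy(outcomes, initial_prediction=False):
    """Predict each branch outcome as 'same as the previous outcome'.

    outcomes: list of booleans (True = taken) for one static branch.
    Returns the fraction of predictions that were correct.
    Hypothetical helper for illustration only.
    """
    if not outcomes:
        raise ValueError("need at least one branch outcome")
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # remember only the most recent outcome
    return correct / len(outcomes)


# Assumed workload: a loop-closing branch that is taken 9 times and then
# falls through, with the whole loop executed 3 times.
loop_branch = ([True] * 9 + [False]) * 3

if __name__ == "__main__":
    print(last_outcome_accuracy(loop_branch))
```

On this assumed pattern the predictor is wrong twice per loop execution: once at the loop exit, and once more when the loop is re-entered.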

Branch Prediction Bits
- Assume 2 BP bits per instruction.
- Change the prediction after two consecutive mistakes.
[State diagram: four states (take/right, take/wrong, not-take/wrong, not-take/right) with transitions on taken and not-taken outcomes]
BP state: (predict take / not take) x (last prediction right / wrong)

Branch History Table
[Figure: k bits of the fetch PC, above the two low-order 00 bits, form the BHT index into a 2^k-entry BHT with 2 bits/entry. The instruction from the I-Cache supplies opcode and offset. Outputs: Branch?, Target PC, Taken/not-Taken?]
A 4K-entry BHT with 2 bits/entry gives 80-90% correct predictions.

Exploiting Spatial Correlation (Yeh and Patt, 1992)

    if (x[i] < 7) then
        y += 1;
    if (x[i] < 5) then
        c -= 4;

- If the first condition is false, the second condition is also false.
- A history register, H, records the direction of the last N branches executed by the processor.

Two-Level Branch Predictor
The Pentium Pro uses the result from the last two branches to select one of the four sets of BHT bits (95% correct).
[Figure: k bits of the fetch PC (above the low-order 00 bits) index the BHT; a 2-bit global branch history shift register selects among the four sets of BHT bits. Labels: Shift in Taken/not-Taken results of each branch; Taken/not-Taken?]

Limitations of BHTs
A BHT only predicts branch direction. Therefore, it cannot redirect the fetch stream until after the branch target is determined.
[Figure: UltraSPARC-III fetch pipeline (stages A, P, F, B, I, J, R, E, then the remainder of the execute pipeline, another 6 stages), annotated with: Correctly predicted taken branch penalty; Jump Register penalty]

Branch Target Buffer
[Figure: k bits of the PC index both IMEM and a Branch Target Buffer (2^k entries); each entry holds a predicted target and BPb. Outputs: target, BP]
- BP bits are stored with the predicted target address.
- IF stage: if (BP = taken) then nPC = target, else nPC = PC+4.
- Later: check the prediction; if wrong, kill the instruction and update BTB and BPb; else update BPb.

Address Collisions
Assume a 128-entry BTB.
Instruction memory:
     132  Jump +100
    1028  Add .....
BTB entry: target = 236, BPb = take
What will be fetched after the instruction at 1028?
- BTB prediction = 236
- Correct target = 1032
- So: kill PC=236 and fetch PC=1032.
Is this a common occurrence? Can we avoid these bubbles?

BTB is only for Control Instructions
- The BTB contains useful information for branch and jump instructions only. Do not update it for other instructions.
- For all other instructions the next PC is PC+4.
- How can this effect be achieved without decoding the instruction?

Branch Target Buffer (BTB)
[Figure labels: I-Cache, 2^k-entry]
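
The 2-bit prediction state, the IF-stage next-PC rule, and the address-collision example from the slides above can be sketched together in a few lines. This is a rough model under stated assumptions, not the lecture's design: the 2-bit state is modeled as a saturating counter (one common encoding that flips the prediction only after two consecutive mistakes), the BTB has no tags, a new entry starts in a weak "take" state, and the table is indexed with the low-order bits of the byte address (with 128 entries, that is the indexing under which 132 and 1028 share a slot). All class, function, and constant names are hypothetical.

```python
# 2-bit BP state modeled as a saturating counter (assumed encoding).
STRONG_NOT, WEAK_NOT, WEAK_TAKE, STRONG_TAKE = range(4)


def predicts_taken(state):
    """The upper two states predict 'take'."""
    return state >= WEAK_TAKE


def update_state(state, taken):
    """Step toward STRONG_TAKE on a taken branch, toward STRONG_NOT otherwise."""
    if taken:
        return min(state + 1, STRONG_TAKE)
    return max(state - 1, STRONG_NOT)


class UntaggedBTB:
    """Direct-mapped BTB without tags; each slot is [target, bp_state] or None."""

    def __init__(self, entries=128):
        self.entries = entries
        self.table = [None] * entries

    def index(self, pc):
        # Assumption: index with the low-order bits of the byte address.
        return pc % self.entries

    def next_pc(self, pc):
        """IF stage: if BP = taken then nPC = target, else nPC = PC + 4."""
        slot = self.table[self.index(pc)]
        if slot is not None and predicts_taken(slot[1]):
            return slot[0]
        return pc + 4

    def update(self, pc, taken, target):
        """Call only for branches and jumps, once the outcome is known."""
        i = self.index(pc)
        slot = self.table[i]
        if slot is None:
            if taken:
                # Assumed allocation policy: start in the weak 'take' state.
                self.table[i] = [target, WEAK_TAKE]
            return
        slot[1] = update_state(slot[1], taken)
        if taken:
            slot[0] = target


if __name__ == "__main__":
    btb = UntaggedBTB(128)
    btb.update(132, taken=True, target=236)  # the jump at 132 resolves
    print(btb.next_pc(132))    # prediction for the jump itself
    print(btb.next_pc(1028))   # the Add at 1028 maps to the same slot
```

Because the table is untagged, the Add at 1028 picks up the jump's entry and fetch is steered to 236 instead of 1032, which is the bubble the Address Collisions slide describes.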