CMSC 411 Computer Systems Architecture
Lecture 9: Instruction Level Parallelism 3 (Static and Dynamic Branch Prediction)
CMSC 411 - 8 (from Patterson), CS252 S05

Outline
- ILP
- Compiler techniques to increase ILP
- Loop Unrolling
- Static Branch Prediction
- Dynamic Branch Prediction
- Overcoming Data Hazards with Dynamic Scheduling
- Tomasulo Algorithm
- Conclusion

Static Branch Prediction
- Previously scheduled code around delayed branch
- To reorder code around branches, need to predict branch statically during compile
- Simplest scheme is to predict a branch as taken
  - Average misprediction = untaken branch frequency = 34% (SPEC92)
- More accurate scheme predicts branches using profile information collected from earlier runs, and modifies prediction based on last run
[Figure: H&P Figure 2.3. Misprediction rate by benchmark. Integer: compress, eqntott, espresso, gcc, li. Floating point: doduc, ear, hydro2d, mdljdp, su2cor.]

Dynamic Branch Prediction
- Why does prediction work?
  - Underlying algorithm has regularities
  - Data that is being operated on has regularities
  - Instruction sequence has redundancies that are artifacts of the way that humans and compilers think about problems
- Is dynamic branch prediction better than static branch prediction?
  - Seems to be
  - There are a small number of important branches in programs that have dynamic behavior

Dynamic Branch Prediction
- Performance = f(accuracy, cost of misprediction)
- Branch History Table (BHT): table of 1-bit values indexed by lower bits of PC address
  - Says whether or not branch was taken last time
  - No address check, so an entry may refer to the wrong branch
- Problem: in a loop, a 1-bit BHT will cause two mispredictions (average is 9 loop iterations before exit):
  - End of loop, when it exits instead of looping as before
  - First time through loop on next time through code, when it predicts exit instead of looping
[Diagram: predictor with states Predict Taken (1) and Predict Not Taken (0), connected by T and NT transitions.]

Dynamic Branch Prediction
- Solution: 2-bit prediction scheme where the predictor changes prediction only if it mispredicts twice in a row
- Red: stop, not taken. Green: go, taken
- Adds hysteresis to decision making process
[Figure: H&P Figure 2.4. State diagram with two Predict Taken states and two Predict Not Taken states, connected by T (taken) and NT (not taken) transitions.]

BHT Accuracy
- Mispredict because either:
  - Wrong guess for that branch
  - Got branch history of wrong branch when indexing into the table
- 4096 entry table
[Figure: H&P Figure 2.5. Misprediction rate on SPEC89, integer and floating point, including eqntott, espresso, gcc, li, nasa7, matrix300, fpppp, spice, doduc.]

Correlated Branch Prediction
- Idea: record m most recently executed branches as taken or not taken, and use that pattern to select the proper n-bit branch history table
- In general, an (m,n) predictor means record last m branches to select between 2^m history tables, each with n-bit counters
  - Thus, the old 2-bit BHT is a (0,2) predictor
- Global Branch History: m-bit shift register keeping T/NT status of last m branches
- Each entry in table has 2^m n-bit predictors
- Also known as 2-level adaptive predictor
- Example, where the last branch depends on the 2 previous branches:
    if (aa == 2)
        aa = 0;
    if (bb == 2)
        bb = 0;
    if (aa != bb)

Correlating Branches
- Possible choices:
  - Local history + branch address
  - Global branch history + branch address
  - Global branch history only, no branch address (ignores branch instruction)
[Diagram: index into predictor formed from the branch address, local branch history, or global branch history.]

Correlated Branch Prediction
- (2,2) predictor: behavior of recent branches selects between four predictions of next branch, updating just that prediction
[Diagram: branch address and 2-bit global branch history select among 2-bit per-branch predictors to produce the prediction.]

Calculations
- 4096-entry (0,2) predictor, i.e., 2-bit BHT
  - 4k x 2 = 8k bits
  - 4k = 2^12, so 12 address bits
- How to use the same bits with a (2,2) predictor?
  - 8k bits with 2-bit BHT means 4k BHTs
  - The (2,2) implies an entry has four BHTs, so 1k entries, i.e., a (2,2) predictor with 1024 entries
- Or, 4 address bits and 2 history bits give us a 6-bit index into 2^6 = 64 predictors, each having two bits, for 128 total bits

Accuracy of Different Schemes
[Figure: H&P Figure 2.7. Frequency of mispredictions on SPEC89 (nasa7, matrix300, tomcatv, doduc, spice, fpppp, gcc, espresso, eqntott, li) for three predictors: 4096 entries 2-bit BHT; unlimited entries 2-bit BHT; 1024 entries (2,2) BHT.]

Tournament Predictors
- Multilevel branch predictor
- Use n-bit saturating counter to choose between predictors
- Usually choice is between global and local predictors

N-bit Saturating Counter
- N-bit counter: value between 0 and 2^n - 1
- Counter operations
  - Increment by 1 (up to 2^n - 1)
  - Decrement by 1 (down to 0)
- Used to choose between predictors X, Y
  - Increment if X is correct, Y is incorrect
  - Decrement if Y is correct, X is incorrect
  - Choose predictor X if counter >= 2^(n-1), Y otherwise
- Can be used as predictor: X = taken, Y = not taken
[Diagram: counter states 0 to 3; transitions labeled by which predictor was correct (e.g., Predictor 1 correct, Predictor 2 incorrect). T = taken, NT = not taken.]

Tournament Predictor (DEC Alpha 21264)
- Tournament predictor using 4K 2-bit counters indexed by local branch address (8K bits). Chooses between:
- Global predictor (8K bits)
  - 4K entries indexed by history of last 12 branches (2^12 = 4K)
  - Each entry is a standard 2-bit predictor
- Local predictor
  - Local history table: 1K 10-bit entries recording last 10 branches, indexed by branch address (10K bits)
  - The pattern of the last 10 occurrences of that particular branch is used to index a table of 1K entries with 3-bit saturating counters (3K bits)
- Total size of predictor = 8K + 8K + 10K + 3K = 29K bits

(0,2) Predictor
[Diagram: states Predict Taken (1) and Predict Not Taken (0), connected by T and NT transitions.]
[Table: for Branch 1 and Branch 2, the columns Predictor, Prediction, and Action over iterations 1 to 5 and loop exit.]
Prediction based on state of …
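The loop problem described for the 1-bit BHT, and the 2-bit fix, can be checked with a small simulation. The sketch below is only an illustration: it assumes the 2-bit scheme is a plain saturating counter (one common realization; the transition diagram in H&P Figure 2.4 may differ in detail), and the names `simulate`, `steady_state_misses`, and the 9-taken-then-exit loop pattern are hypothetical choices made for this example.

```python
def simulate(outcomes, bits, state=0):
    """Run one n-bit saturating counter over a stream of branch outcomes.

    Returns (number of mispredictions, final counter state).
    """
    top = (1 << bits) - 1          # saturation ceiling, 2^n - 1
    threshold = 1 << (bits - 1)    # predict taken when counter >= 2^(n-1)
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= threshold
        if predicted_taken != taken:
            misses += 1
        # Move toward "taken" or "not taken", saturating at both ends.
        state = min(state + 1, top) if taken else max(state - 1, 0)
    return misses, state


# Assumed loop branch: taken 9 times, then not taken once on exit.
loop = [True] * 9 + [False]


def steady_state_misses(bits, warmup_runs=3):
    """Mispredictions in one execution of the loop after the counter has warmed up."""
    state = 0
    for _ in range(warmup_runs):
        _, state = simulate(loop, bits, state)
    misses, _ = simulate(loop, bits, state)
    return misses


one_bit = steady_state_misses(1)   # misses on loop exit and on re-entry
two_bit = steady_state_misses(2)   # misses on loop exit only
print("1-bit mispredictions per loop execution:", one_bit)
print("2-bit mispredictions per loop execution:", two_bit)
```

Under these assumptions the 1-bit counter mispredicts twice per execution of the loop and the 2-bit counter once, which is the hysteresis effect the slides describe.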
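The storage arithmetic from the Calculations slide and the Alpha 21264 tournament predictor can be reproduced in a few lines. This is a sketch under stated assumptions: the helper name `predictor_bits` is hypothetical, "K" is taken to mean 1024, and an (m,n) predictor is modeled simply as 2^m n-bit counters per entry.

```python
def predictor_bits(m, n, entries):
    """State bits of an (m, n) predictor: each entry holds 2^m counters of n bits."""
    return (2 ** m) * n * entries


K = 1024  # assumed meaning of "K" on the slides

# 4096-entry (0,2) predictor, i.e., a plain 2-bit BHT: 4K x 2 = 8K bits.
bht_bits = predictor_bits(0, 2, 4 * K)

# Same budget as a (2,2) predictor: four 2-bit counters per entry, 1K entries.
correlating_bits = predictor_bits(2, 2, 1 * K)

# 4 address bits plus 2 history bits: 2^6 = 64 two-bit predictors, 128 bits.
small_bits = predictor_bits(2, 2, 2 ** 4)

# Tournament predictor components, using the sizes listed on the slide.
chooser_bits = 4 * K * 2          # 4K 2-bit counters
global_bits = 4 * K * 2           # 4K entries, 2-bit predictor each
local_history_bits = 1 * K * 10   # 1K entries of 10-bit history
local_counter_bits = 1 * K * 3    # 1K 3-bit saturating counters
total_bits = chooser_bits + global_bits + local_history_bits + local_counter_bits

print(bht_bits, correlating_bits, small_bits, total_bits // K)
```

The first two results are equal, which is the point of the slide: a 1024-entry (2,2) predictor uses the same number of bits as a 4096-entry 2-bit BHT.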