UMD CMSC 411 - Lecture 9 Instruction Level Parallelism 3


CS252 S05
CMSC 411 Computer Systems Architecture
Lecture 9
Instruction Level Parallelism 3
(Static & Dynamic Branch Prediction)
CMSC 411 - 8 (from Patterson)

Outline
• ILP
• Compiler techniques to increase ILP
• Loop Unrolling
• Static Branch Prediction
• Dynamic Branch Prediction
• Overcoming Data Hazards with Dynamic Scheduling
• Tomasulo Algorithm
• Conclusion

Static Branch Prediction
• Previously scheduled code around delayed branch
• To reorder code around branches
– Need to predict branch statically during compile
• Simplest scheme is to predict a branch as taken
– Average misprediction = untaken branch frequency = 34% SPEC92
• A more accurate scheme predicts branches using profile information collected from earlier runs and modifies the prediction based on the last run:
[Figure: H&P Figure 2.3. Misprediction rate, integer and floating-point benchmarks: compress, eqntott, espresso, gcc, li, doduc, ear, hydro2d, mdljdp, su2cor. Bar values as extracted, in the same order: 12%, 22%, 18%, 11%, 12%, 4%, 6%, 9%, 10%, 15%.]

Dynamic Branch Prediction
• Why does prediction work?
– Underlying algorithm has regularities
– Data that is being operated on has regularities
– Instruction sequence has redundancies that are artifacts of the way that humans/compilers think about problems
• Is dynamic branch prediction better than static branch prediction?
– Seems to be
– There are a small number of important branches in programs that have dynamic behavior

Dynamic Branch Prediction
• Performance = ƒ(accuracy, cost of misprediction)
• Branch History Table (BHT): table of 1-bit values indexed by the lower bits of the PC address
– Says whether or not the branch was taken last time
– No address check (may refer to the wrong branch)
• Problem: in a loop, a 1-bit BHT will cause two mispredictions (avg is 9 loop iterations before exit):
– End of loop, when it exits instead of looping as before
– First time through the loop on the next time through the code, when it predicts exit instead of looping
[Figure: 1-bit predictor state diagram. State 1 = Predict Taken, state 0 = Predict Not Taken; T and NT outcomes move between the two states.]

Dynamic Branch Prediction
• Solution: 2-bit prediction scheme where the predictor changes its prediction only if it mispredicts twice in a row
• Red: stop, not taken
• Green: go, taken
• Adds hysteresis to the decision making process
[Figure: H&P Figure 2.4. 2-bit predictor state diagram with states 3, 2, 1, 0; two states predict taken and two predict not taken, with T/NT transitions.]

BHT Accuracy
• Mispredict because either:
– Wrong guess for that branch
– Got branch history of wrong branch when indexing into the table
• 4096 entry table:
[Figure: H&P Figure 2.5. Misprediction rate, SPEC89, integer and floating-point benchmarks as labeled in the source: eqntott, espresso, gcc, li, spice, doduc, spice, fpppp, matrix300, nasa7. Bar values as extracted, in the same order: 18%, 5%, 12%, 10%, 9%, 5%, 9%, 9%, 0%, 1%.]

Correlated Branch Prediction
• Idea: record the m most recently executed branches as taken or not taken, and use that pattern to select the proper n-bit branch history table
• In general, an (m,n) predictor records the last m branches to select between 2^m history tables, each with n-bit counters
– Thus, the old 2-bit BHT is a (0,2) predictor
– Global Branch History: m-bit shift register keeping the T/NT status of the last m branches
– Each entry in the table has 2^m n-bit predictors
• Also known as a 2-level adaptive predictor
• Example:
    if (aa == 2)
        aa = 0;
    if (bb == 2)
        bb = 0;
    if (aa != bb) {
  The third branch depends on the 2 previous branches!

Correlating Branches
• (2,2) predictor
– Behavior of recent branches selects between four predictions of the next branch, updating just that prediction
• Or, 4 address bits + 2 history bits give a 6-bit index into 2^6 = 64 predictors, each having two bits, for 128 total bits
[Figure: (2,2) predictor. A 4-bit branch address selects an entry of 2-bit per-branch predictors, and the 2-bit global branch history selects the prediction within that entry.]

Correlated Branch Prediction
• Possible choices
– Local history + branch address
– Global branch history + branch address
– Global branch history only (no branch address)
» Ignores branch instruction
[Figure: the branch address, global branch history, and local branch history are combined into an index into the predictor.]

Calculations
• 4096-entry (0,2) predictor (i.e., 2-bit BHT)
– 4k x 2 = 8k bits
– 4k = 2^12, so 12 address bits
• How to use the same number of bits with a (2,2) predictor?
– 8k bits of 2-bit predictors means 4k predictors in total
– (2,2) implies each entry holds four of them, so 4k / 4 = 1k entries, i.e. a (2,2) predictor with 1024 entries

Accuracy of Different Schemes
[Figure: H&P Figure 2.7. Frequency of mispredictions on SPEC89 (nasa7, matrix300, tomcatv, doduc, spice, fpppp, gcc, espresso, eqntott, li) for three schemes: 4,096 entries with 2 bits per entry; unlimited entries with 2 bits per entry; 1,024 entries (2,2).]

Tournament Predictors
• Multilevel branch predictor
• Use an n-bit saturating counter to choose between predictors
• Usually the choice is between global and local predictors
[Figure: chooser state diagram with states 0 to 3; transitions are labeled by which predictor was correct, e.g. "Predictor 1 correct, Predictor 2 incorrect".]

N-bit Saturating Counter
• Used to choose between predictors X & Y
• N-bit counter value between 0 and 2^n - 1
• Counter operations
– Increment by 1 (up to 2^n - 1)
» If X is correct & Y is incorrect
– Decrement by 1 (down to 0)
» If Y is correct & X is incorrect
• Choose predictor X if counter ≥ 2^(n-1) (the upper half of its range), Y otherwise
• Can be used as a predictor (X = taken, Y = not taken)
[Figure: 2-bit saturating counter with states 0, 1, 2, 3 and transitions labeled T = taken, NT = not taken.]

Tournament Predictor: DEC Alpha 21264
• Tournament predictor using 4K 2-bit counters indexed by local branch address. Chooses between:
• Global predictor
– 4K entries indexed by the history of the last 12 branches (2^12 = 4K)
– Each entry is a standard 2-bit predictor
• Local predictor
– Local history table: 1K 10-bit entries recording the last 10 branches, indexed by branch address
– The pattern of the last 10 occurrences of that particular branch is used to index a table of 1K entries with 3-bit saturating counters
• Total size of predictor = 8K + 8K + 10K + 3K = 29K bits
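
The loop problem described for the 1-bit BHT, and the 2-bit fix, can be checked with a small simulation. This is a minimal sketch, not the slides' own code: `simulate` is a hypothetical helper, the predictor is modeled as a plain n-bit saturating counter (whose transitions may differ from the state machine drawn in H&P Figure 2.4), the counter is assumed to start in its strongest "taken" state, and the loop branch is assumed to be taken 9 times and then not taken once.

```python
def simulate(outcomes, bits):
    """Count mispredictions of one n-bit saturating counter on a branch trace."""
    max_state = (1 << bits) - 1
    threshold = 1 << (bits - 1)      # states >= threshold predict "taken"
    state = max_state                # assumption: start strongly taken
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= threshold
        if predicted_taken != taken:
            mispredictions += 1
        # Saturating update: move toward the observed outcome.
        if taken:
            state = min(state + 1, max_state)
        else:
            state = max(state - 1, 0)
    return mispredictions


# Assumed trace: the loop branch is taken 9 times, then falls through once,
# and the whole loop is executed 100 times.
loop_run = [True] * 9 + [False]
trace = loop_run * 100

one_bit = simulate(trace, bits=1)
two_bit = simulate(trace, bits=2)
print(one_bit, two_bit)
```

Under these assumptions the 1-bit counter mispredicts twice per loop execution after the first one (at the exit and again on re-entry), while the 2-bit counter mispredicts only at the exit.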
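
To make the (m,n) idea from the Correlated Branch Prediction slides concrete, here is a hedged sketch of a predictor with an m-bit global history register and 2^m n-bit saturating counters per table entry. The class name `CorrelatingPredictor`, the `run` helper, the weakly-taken initial counter value, and the alternating test trace are all assumptions made for illustration.

```python
class CorrelatingPredictor:
    """Sketch of an (m, n) predictor: m global history bits, n-bit counters."""

    def __init__(self, m, n, index_bits):
        self.max_state = (1 << n) - 1
        self.threshold = 1 << (n - 1)          # states >= threshold predict taken
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << m) - 1       # 0 when m == 0, i.e. a plain BHT
        self.history = 0                       # global shift register of T/NT bits
        # One row per branch-address index, 2**m counters per row.
        # Assumption: every counter starts weakly taken.
        self.table = [[self.threshold] * (1 << m) for _ in range(1 << index_bits)]

    def predict(self, pc):
        return self.table[pc & self.index_mask][self.history] >= self.threshold

    def update(self, pc, taken):
        row = self.table[pc & self.index_mask]
        if taken:
            row[self.history] = min(row[self.history] + 1, self.max_state)
        else:
            row[self.history] = max(row[self.history] - 1, 0)
        # Shift the outcome into the global history, keeping only m bits.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def run(predictor, trace):
    """Return the number of mispredictions over a list of (pc, taken) pairs."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Assumed trace: one branch (pc = 0) that strictly alternates taken / not taken.
alternating = [(0, i % 2 == 0) for i in range(200)]
bht_misses = run(CorrelatingPredictor(m=0, n=2, index_bits=4), alternating)
corr_misses = run(CorrelatingPredictor(m=2, n=2, index_bits=4), alternating)
print(bht_misses, corr_misses)
```

On this trace the (0,2) predictor keeps predicting taken and misses every not-taken outcome, whereas the (2,2) predictor learns that each history pattern is followed by a fixed outcome and mispredicts only once while warming up.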
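
The storage arithmetic on the Calculations, Correlating Branches, and Alpha 21264 slides can be reproduced in a few lines. This is only a sketch: the function names are hypothetical, and the assignment of the 8K, 8K, 10K, and 3K terms to individual structures is inferred from the sizes the slides list (4K 2-bit chooser counters, 4K 2-bit global entries, 1K 10-bit local histories, 1K 3-bit local counters).

```python
K = 1024


def predictor_bits(m, n, entries):
    """Total bits of an (m, n) predictor: each entry holds 2**m n-bit counters."""
    return (2 ** m) * n * entries


def entries_for_budget(m, n, budget_bits):
    """Number of entries an (m, n) predictor can have within a bit budget."""
    return budget_bits // ((2 ** m) * n)


# 4096-entry (0,2) predictor, i.e. the 2-bit BHT: 4k x 2 = 8k bits.
bht_bits = predictor_bits(0, 2, 4 * K)

# Same budget spent on a (2,2) predictor: four 2-bit counters per entry.
corr_entries = entries_for_budget(2, 2, bht_bits)

# 4 address bits select 16 entries; with 2 history bits that is 64 counters.
small_bits = predictor_bits(2, 2, 2 ** 4)

# Alpha 21264 tournament predictor, component by component (inferred mapping).
alpha_bits = {
    "chooser counters": 4 * K * 2,
    "global predictor": 4 * K * 2,
    "local history table": 1 * K * 10,
    "local counters": 1 * K * 3,
}
alpha_total = sum(alpha_bits.values())

print(bht_bits, corr_entries, small_bits, alpha_total // K)
```

The results agree with the slides: 8k bits for the 2-bit BHT, 1024 entries for the (2,2) predictor at the same budget, 128 bits for the 64-predictor example, and 29K bits for the Alpha 21264.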

