Berkeley COMPSCI 252 - Lecture Notes - D2913803

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI 252> Lecture Notes

DOC PREVIEW

Berkeley COMPSCI 252 - Lecture Notes

School name University of California, Berkeley

Course Compsci 252- Graduate Computer Architecture

Pages 7

This preview shows page 1-2 out of 7 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Page 1CS252/CullerLec 18. 14/2/02CS252Graduate Computer ArchitectureLecture 18: Branch Prediction + analysis resources => ILPApril 2, 2002Prof. David E. CullerComputer Science 252Spring 2002CS252/CullerLec 18. 24/2/02Today’s Big Idea• Reactive: past actions cause system to adapt use– do what you did before better– ex: caches– TCP windows– URL completion, ...• Proactive: uses past actions to predict future actions– optimize speculatively, anticipate what you are about to do– branch prediction– long cache blocks– ???CS252/CullerLec 18. 34/2/02Review: Case for Branch Prediction when Issue N instructions per clock cycle1. Branches will arrive up to n times faster in an n-issue processor 2. Amdahl’s Law => relative impact of the control stalls will be larger with the lower potential CPI in an n-issue processor conversely, need branch prediction to ‘see’ potential parallelismCS252/CullerLec 18. 44/2/02Review: 7 Branch Prediction Schemes1. 1-bit Branch-Prediction Buffer2. 2-bit Branch-Prediction Buffer3. Correlating Branch Prediction Buffer4. Tournament Branch Predictor5. Branch Target Buffer6. Integrated Instruction Fetch Units7. Return Address PredictorsCS252/CullerLec 18. 54/2/02Review: Dynamic Branch Prediction• Performance = ƒ(accuracy, cost of misprediction)• Branch History Table: Lower bits of PC address index table of 1-bit values– Says whether or not branch taken last time– No address check (saves HW, but may not be right branch)• Problem: in a loop, 1-bit BHT will cause 2 mispredictions (avg is 9 iterations before exit):– End of loop case, when it exits instead of looping as before– First time through loop on next time through code, when it predicts exit instead of looping– Only 80% accuracy even if loop 90% of the timeCS252/CullerLec 18. 64/2/02• Better Solution: 2-bit scheme where change prediction only if get misprediction twice:• Red: stop, not taken• Green: go, taken• Adds hysteresis to decision making processReview: Dynamic Branch Prediction(Jim Smith, 1981)TTNTPredict TakenPredict Not TakenPredict TakenPredict Not TakenTNTTNTNTPage 2CS252/CullerLec 18. 74/2/02Consider 3 Scenarios• Branch for loop test• Check for error or exception• Alternating taken / not-taken– example?• Your worst-case prediction scenarioCS252/CullerLec 18. 84/2/02Correlating BranchesIdea: taken/not taken of recently executed branches is related to behavior of next branch (as well as the history of that branch behavior)– Then behavior of recent branches selects between, say, 4 predictions of next branch, updating just that prediction • (2,2) predictor: 2-bit global, 2-bit localBranch address (4 bits)2-bits per branch local predictorsPredictionPrediction2-bit recent global branch history(01 = not taken then taken)CS252/CullerLec 18. 94/2/020%1%5%6% 6%11%4%6%5%1%0%2%4%6%8%10%12%14%16%18%20%4,096 entries: 2-bits per entry Unlimited entries: 2-bits/entry 1,024 entries (2,2)Accuracy of Different Schemes(Figure 3.15, p. 206)4096 Entries 2-bit BHTUnlimited Entries 2-bit BHT1024 Entries (2,2) BHT0%18%Frequency of MispredictionsWhat’s missing in this picture?CS252/CullerLec 18. 104/2/02Re-evaluating Correlation• Several of the SPEC benchmarks have less than a dozen branches responsible for 90% of taken branches:program branch % static # = 90%compress 14% 236 13eqntott 25% 494 5gcc 15% 9531 2020mpeg 10% 5598 532real gcc 13% 17361 3214• Real programs + OS more like gcc• Small benefits beyond benchmarks for correlation? problems with branch aliases?CS252/CullerLec 18. 114/2/02BHT Accuracy• Mispredict because either:– Wrong guess for that branch– Got branch history of wrong branch when index the table• 4096 entry table programs vary from 1% misprediction (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12%• For SPEC92,4096 about as good as infinite tableCS252/CullerLec 18. 124/2/02Tournament Predictors• Motivation for correlating branch predictors is 2-bit predictor failed on important branches; by adding global information, performance improved• Tournament predictors: use 2 predictors, 1 based on global information and 1 based on local information, and combine with a selector• Hopes to select right predictor for right branch (or right context of branch)Page 3CS252/CullerLec 18. 134/2/02Dynamically finding structure in Spaghetti?CS252/CullerLec 18. 144/2/02Tournament Predictor in Alpha 21264• 4K 2-bit counters to choose from among a global predictor and a local predictor• Global predictor also has 4K entries and is indexed by the history of the last 12 branches; each entry in the global predictor is a standard 2-bit predictor– 12-bit pattern: ith bit 0 => ith prior branch not taken; ith bit 1 => ith prior branch taken; • Local predictor consists of a 2-level predictor: – Top level a local history table consisting of 1024 10-bit entries; each 10-bit entry corresponds to the most recent 10 branch outcomes for the entry. 10-bit history allows patterns 10 branches to be discovered and predicted. – Next level Selected entry from the local history table is used to index a table of 1K entries consisting a 3-bit saturating counters, which provide the local prediction• Total size: 4K*2 + 4K*2 + 1K*10 + 1K*3 = 29K bits!(~180,000 transistors)CS252/CullerLec 18. 154/2/02% of predictions from local predictor in Tournament Prediction Scheme98%100%94%90%55%76%72%63%37%69%0% 20% 40% 60% 80% 100%nasa7matrix300tomcatvdoducspicefppppgccespressoeqntottliCS252/CullerLec 18. 164/2/0294%96%98%98%97%100%70%82%77%82%84%99%88%86%88%86%95%99%0% 20% 40% 60% 80% 100%gccespressolifppppdoductomcatvBranch prediction accuracyProfile-based2-bit counterTournamentAccuracy of Branch Prediction• Profile: branch profile from last execution(static in that in encoded in instruction, but profile)fig 3.40CS252/CullerLec 18. 174/2/02Accuracy v. Size (SPEC89)0%1%2%3%4%5%6%7%8%9%10%0 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128Total predictor size (Kbits)LocalCorrelatingTournamentCS252/CullerLec 18. 184/2/02Need Address at Same Time as Prediction• Branch Target Buffer (BTB): Address of branch index to get prediction AND branch address (if taken)– Note: must check for branch match now, since can’t use wrong branch address (Figure 3.19, 3.20)Branch PC Predicted PC=?PC of instructionFETCHExtra prediction statebitsYes: instruction is branch and use predicted PC as next PCNo: branch not predicted, proceed normally(Next PC =

View Full Document

Berkeley COMPSCI 252 - Lecture Notes

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 7 pages.

Berkeley COMPSCI 252 - Lecture Notes

Sign up for free to view:

Please select your school