DOC PREVIEW
Berkeley COMPSCI 252 - Lecture Notes

This preview shows page 1-2-3-19-20-39-40-41 out of 41 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 41 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 41 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 41 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 41 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 41 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 41 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 41 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 41 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 41 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS252 Graduate Computer Architecture Lecture 17: ILP and Dynamic Execution #2: Branch Prediction, Multiple IssueReview TomasuloTomasulo Algorithm and Branch PredictionCase for Branch Prediction when Issue N instructions per clock cycle7 Branch Prediction SchemesDynamic Branch PredictionDynamic Branch Prediction (Jim Smith, 1981)Correlating BranchesAccuracy of Different Schemes (Figure 3.15, p. 257)Re-evaluating CorrelationPredicated ExecutionBHT AccuracyAdministratriviaTournament PredictorsTournament Predictor in Alpha 21264% of predictions from local predictor in Tournament Prediction SchemeAccuracy of Branch PredictionAccuracy v. Size (SPEC89)Need Address at Same Time as PredictionSpecial Case Return AddressesPitfall: Sometimes bigger and dumber is betterDynamic Branch Prediction SummaryGetting CPI < 1: Issuing Multiple Instructions/CycleGetting CPI < 1: Issuing Multiple Instructions/CycleMultiple Issue IssuesMultiple Issue ChallengesDynamic Scheduling in Superscalar The easy wayRegister renaming, virtual registers versus Reorder BuffersHow much to speculate?Limits to ILPSlide 31Upper Limit to ILP: Ideal Machine (Figure 3.34, page 294)More Realistic HW: Branch Impact Figure 3.38, Page 300More Realistic HW: Renaming Register Impact Figure 3.41, Page 304More Realistic HW: Memory Address Alias Impact Figure 3.43, Page 306Realistic HW for ‘00: Window Impact (Figure 3.45, Page 309)How to Exceed ILP Limits of this study?Workstation Microprocessors 3/2001SPEC 2000 Performance 3/2001 Source: Microprocessor Report, www.MPRonline.comIf time permits: "A Language for Describing Predictors and its Application to Automatic Synthesis”, by Emer and GloyConclusionCS252/PattersonLec 17.13/23/01CS252Graduate Computer ArchitectureLecture 17: ILP and Dynamic Execution #2: Branch Prediction, Multiple IssueMarch 23, 2001Prof. David A. PattersonComputer Science 252Spring 2001CS252/PattersonLec 17.23/23/01Review Tomasulo•Reservations stations: implicit register renaming to larger set of registers + buffering source operands–Prevents registers as bottleneck–Avoids WAR, WAW hazards of Scoreboard–Allows loop unrolling in HW•Not limited to basic blocks (integer units gets ahead, beyond branches)•Today, helps cache misses as well–Don’t stall for L1 Data cache miss (insufficient ILP for L2 miss?)•Lasting Contributions–Dynamic scheduling–Register renaming–Load/store disambiguation•360/91 descendants are Pentium III; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264CS252/PattersonLec 17.33/23/01Tomasulo Algorithm and Branch Prediction•360/91 predicted branches, but did not speculate: pipeline stopped until the branch was resolved–No speculation; only instructions that can complete•Speculation with Reorder Buffer allows execution past branch, and then discard if branch fails–just need to hold instructions in buffer until branch can commitCS252/PattersonLec 17.43/23/01Case for Branch Prediction when Issue N instructions per clock cycle1. Branches will arrive up to n times faster in an n-issue processor 2. Amdahl’s Law => relative impact of the control stalls will be larger with the lower potential CPI in an n-issue processorCS252/PattersonLec 17.53/23/017 Branch Prediction Schemes•1-bit Branch-Prediction Buffer•2-bit Branch-Prediction Buffer•Correlating Branch Prediction Buffer•Tournament Branch Predictor•Branch Target Buffer•Integrated Instruction Fetch Units•Return Address PredictorsCS252/PattersonLec 17.63/23/01Dynamic Branch Prediction•Performance = ƒ(accuracy, cost of misprediction)•Branch History Table: Lower bits of PC address index table of 1-bit values–Says whether or not branch taken last time–No address check (saves HW, but may not be right branch)•Problem: in a loop, 1-bit BHT will cause 2 mispredictions (avg is 9 iterations before exit):–End of loop case, when it exits instead of looping as before–First time through loop on next time through code, when it predicts exit instead of looping–Only 80% accuracy even if loop 90% of the timeCS252/PattersonLec 17.73/23/01•Solution: 2-bit scheme where change prediction only if get misprediction twice: (Figure 3.7, p. 249)•Red: stop, not taken•Green: go, taken•Adds hysteresis to decision making processDynamic Branch Prediction(Jim Smith, 1981)TTNTPredict TakenPredict Not TakenPredict TakenPredict Not TakenTNTTNTNTCS252/PattersonLec 17.83/23/01Correlating BranchesIdea: taken/not taken of recently executed branches is related to behavior of next branch (as well as the history of that branch behavior)–Then behavior of recent branches selects between, say, 4 predictions of next branch, updating just that prediction •(2,2) predictor: 2-bit global, 2-bit localBranch address (4 bits)2-bits per branch local predictorsPredictionPrediction2-bit global branch history(01 = not taken then taken)CS252/PattersonLec 17.93/23/010%1%5%6%6%11%4%6%5%1%0%2%4%6%8%10%12%14%16%18%20%4,096 entries: 2-bits per entry Unlimited entries: 2-bits/entry 1,024 entries (2,2)Accuracy of Different Schemes(Figure 3.15, p. 257)4096 Entries 2-bit BHTUnlimited Entries 2-bit BHT1024 Entries (2,2) BHT0%18%Frequency of MispredictionsCS252/PattersonLec 17.103/23/01Re-evaluating Correlation•Several of the SPEC benchmarks have less than a dozen branches responsible for 90% of taken branches:program branch % static# = 90%compress 14%236 13eqntott 25%494 5gcc 15%9531 2020mpeg 10%5598 532real gcc 13%17361 3214•Real programs + OS more like gcc•Small benefits beyond benchmarks for correlation? problems with branch aliases?CS252/PattersonLec 17.113/23/01•Avoid branch prediction by turning branches into conditionally executed instructions: if (x) then A = B op C else NOP–If false, then neither store result nor cause exception–Expanded ISA of Alpha, MIPS, PowerPC, SPARC have conditional move; PA-RISC can annul any following instr.–IA-64: 64 1-bit condition fields selected so conditional execution of any instruction–This transformation is called “if-conversion”•Drawbacks to conditional instructions–Still takes a clock even if “annulled”–Stall if condition evaluated late–Complex conditions reduce effectiveness; condition becomes known late in pipelinexA = B op CPredicated ExecutionCS252/PattersonLec 17.123/23/01BHT Accuracy•Mispredict because either:–Wrong guess for that branch–Got branch history of wrong branch when index the table•4096 entry table programs vary


View Full Document

Berkeley COMPSCI 252 - Lecture Notes

Documents in this Course
Quiz

Quiz

9 pages

Caches I

Caches I

46 pages

Lecture 6

Lecture 6

36 pages

Lecture 9

Lecture 9

52 pages

Figures

Figures

26 pages

Midterm

Midterm

15 pages

Midterm

Midterm

14 pages

Midterm I

Midterm I

15 pages

ECHO

ECHO

25 pages

Quiz  1

Quiz 1

12 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?