CS252 Graduate Computer Architecture Lecture 17: ILP and Dynamic Execution #2: Branch Prediction, Multiple IssueReview TomasuloTomasulo Algorithm and Branch PredictionCase for Branch Prediction when Issue N instructions per clock cycle7 Branch Prediction SchemesDynamic Branch PredictionDynamic Branch Prediction (Jim Smith, 1981)Correlating BranchesAccuracy of Different Schemes (Figure 3.15, p. 257)Re-evaluating CorrelationPredicated ExecutionBHT AccuracyAdministratriviaTournament PredictorsTournament Predictor in Alpha 21264% of predictions from local predictor in Tournament Prediction SchemeAccuracy of Branch PredictionAccuracy v. Size (SPEC89)Need Address at Same Time as PredictionSpecial Case Return AddressesPitfall: Sometimes bigger and dumber is betterDynamic Branch Prediction SummaryGetting CPI < 1: Issuing Multiple Instructions/CycleGetting CPI < 1: Issuing Multiple Instructions/CycleMultiple Issue IssuesMultiple Issue ChallengesDynamic Scheduling in Superscalar The easy wayRegister renaming, virtual registers versus Reorder BuffersHow much to speculate?Limits to ILPSlide 31Upper Limit to ILP: Ideal Machine (Figure 3.34, page 294)More Realistic HW: Branch Impact Figure 3.38, Page 300More Realistic HW: Renaming Register Impact Figure 3.41, Page 304More Realistic HW: Memory Address Alias Impact Figure 3.43, Page 306Realistic HW for ‘00: Window Impact (Figure 3.45, Page 309)How to Exceed ILP Limits of this study?Workstation Microprocessors 3/2001SPEC 2000 Performance 3/2001 Source: Microprocessor Report, www.MPRonline.comIf time permits: "A Language for Describing Predictors and its Application to Automatic Synthesis”, by Emer and GloyConclusionCS252/PattersonLec 17.13/23/01CS252Graduate Computer ArchitectureLecture 17: ILP and Dynamic Execution #2: Branch Prediction, Multiple IssueMarch 23, 2001Prof. David A. PattersonComputer Science 252Spring 2001CS252/PattersonLec 17.23/23/01Review Tomasulo•Reservations stations: implicit register renaming to larger set of registers + buffering source operands–Prevents registers as bottleneck–Avoids WAR, WAW hazards of Scoreboard–Allows loop unrolling in HW•Not limited to basic blocks (integer units gets ahead, beyond branches)•Today, helps cache misses as well–Don’t stall for L1 Data cache miss (insufficient ILP for L2 miss?)•Lasting Contributions–Dynamic scheduling–Register renaming–Load/store disambiguation•360/91 descendants are Pentium III; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264CS252/PattersonLec 17.33/23/01Tomasulo Algorithm and Branch Prediction•360/91 predicted branches, but did not speculate: pipeline stopped until the branch was resolved–No speculation; only instructions that can complete•Speculation with Reorder Buffer allows execution past branch, and then discard if branch fails–just need to hold instructions in buffer until branch can commitCS252/PattersonLec 17.43/23/01Case for Branch Prediction when Issue N instructions per clock cycle1. Branches will arrive up to n times faster in an n-issue processor 2. Amdahl’s Law => relative impact of the control stalls will be larger with the lower potential CPI in an n-issue processorCS252/PattersonLec 17.53/23/017 Branch Prediction Schemes•1-bit Branch-Prediction Buffer•2-bit Branch-Prediction Buffer•Correlating Branch Prediction Buffer•Tournament Branch Predictor•Branch Target Buffer•Integrated Instruction Fetch Units•Return Address PredictorsCS252/PattersonLec 17.63/23/01Dynamic Branch Prediction•Performance = ƒ(accuracy, cost of misprediction)•Branch History Table: Lower bits of PC address index table of 1-bit values–Says whether or not branch taken last time–No address check (saves HW, but may not be right branch)•Problem: in a loop, 1-bit BHT will cause 2 mispredictions (avg is 9 iterations before exit):–End of loop case, when it exits instead of looping as before–First time through loop on next time through code, when it predicts exit instead of looping–Only 80% accuracy even if loop 90% of the timeCS252/PattersonLec 17.73/23/01•Solution: 2-bit scheme where change prediction only if get misprediction twice: (Figure 3.7, p. 249)•Red: stop, not taken•Green: go, taken•Adds hysteresis to decision making processDynamic Branch Prediction(Jim Smith, 1981)TTNTPredict TakenPredict Not TakenPredict TakenPredict Not TakenTNTTNTNTCS252/PattersonLec 17.83/23/01Correlating BranchesIdea: taken/not taken of recently executed branches is related to behavior of next branch (as well as the history of that branch behavior)–Then behavior of recent branches selects between, say, 4 predictions of next branch, updating just that prediction •(2,2) predictor: 2-bit global, 2-bit localBranch address (4 bits)2-bits per branch local predictorsPredictionPrediction2-bit global branch history(01 = not taken then taken)CS252/PattersonLec 17.93/23/010%1%5%6%6%11%4%6%5%1%0%2%4%6%8%10%12%14%16%18%20%4,096 entries: 2-bits per entry Unlimited entries: 2-bits/entry 1,024 entries (2,2)Accuracy of Different Schemes(Figure 3.15, p. 257)4096 Entries 2-bit BHTUnlimited Entries 2-bit BHT1024 Entries (2,2) BHT0%18%Frequency of MispredictionsCS252/PattersonLec 17.103/23/01Re-evaluating Correlation•Several of the SPEC benchmarks have less than a dozen branches responsible for 90% of taken branches:program branch % static# = 90%compress 14%236 13eqntott 25%494 5gcc 15%9531 2020mpeg 10%5598 532real gcc 13%17361 3214•Real programs + OS more like gcc•Small benefits beyond benchmarks for correlation? problems with branch aliases?CS252/PattersonLec 17.113/23/01•Avoid branch prediction by turning branches into conditionally executed instructions: if (x) then A = B op C else NOP–If false, then neither store result nor cause exception–Expanded ISA of Alpha, MIPS, PowerPC, SPARC have conditional move; PA-RISC can annul any following instr.–IA-64: 64 1-bit condition fields selected so conditional execution of any instruction–This transformation is called “if-conversion”•Drawbacks to conditional instructions–Still takes a clock even if “annulled”–Stall if condition evaluated late–Complex conditions reduce effectiveness; condition becomes known late in pipelinexA = B op CPredicated ExecutionCS252/PattersonLec 17.123/23/01BHT Accuracy•Mispredict because either:–Wrong guess for that branch–Got branch history of wrong branch when index the table•4096 entry table programs vary
View Full Document