Berkeley COMPSCI 152 - Lecture 14 - Advanced Superscalars

This preview shows page 1-2-3-22-23-24-45-46-47 out of 47 pages.


Slide titles:
CS 152 Computer Architecture and Engineering Lecture 14 - Advanced Superscalars
Last time in Lecture 13
Dynamic Branch Prediction: learning based on past behavior
Branch Prediction Bits
Branch History Table
Exploiting Spatial Correlation (Yeh and Patt, 1992)
Two-Level Branch Predictor
Limitations of BHTs
Branch Target Buffer
Address Collisions
BTB is only for Control Instructions
Branch Target Buffer (BTB)
Consulting BTB Before Decoding
Combining BTB and BHT
Uses of Jump Register (JR)
Subroutine Return Stack
Mispredict Recovery
In-Order Commit for Precise Exceptions
Branch Misprediction in Pipeline
Recovering ROB/Renaming Table
Speculating Both Directions
CS152 Administrivia
“Data in ROB” Design (HP PA8000, Pentium Pro, Core2Duo)
Unified Physical Register File (MIPS R10K, Alpha 21264, Pentium 4)
PowerPoint Presentation
Lifetime of Physical Registers
Physical Register Management
Slides 28-34
Reorder Buffer Holds Active Instruction Window
Superscalar Register Renaming
Slide 37
Memory Dependencies
In-Order Memory Queue
Conservative O-o-O Load Execution
Address Speculation
Memory Dependence Prediction (Alpha 21264)
Speculative Loads / Stores
Speculative Store Buffer
Slides 45-46
Acknowledgements

CS 152 Computer Architecture and Engineering
Lecture 14 - Advanced Superscalars
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
3/17/2009 CS152-Spring’09

Last time in Lecture 13
• Register renaming removes WAR, WAW hazards
• Instruction execution divided into four major stages:
  – Instruction Fetch, Decode/Rename, Execute/Complete, Commit
• Control hazards are serious impediment to superscalar performance
• Dynamic branch predictors can be quite accurate (>95%) and avoid most control hazards
• Branch History Tables (BHTs) just predict direction (later in pipeline)
  – Just need a few bits per entry (2 bits gives hysteresis)
  – Need to decode instruction bits to determine whether this is a branch and what the target address is

Dynamic Branch Prediction: learning based on past behavior
Temporal correlation
  The way a branch resolves may be a good predictor of the way it will resolve at the next execution
Spatial correlation
  Several branches may resolve in a highly correlated manner (a preferred path of execution)

Branch Prediction Bits
• Assume 2 BP bits per instruction
• Change the prediction after two consecutive mistakes!
[State diagram: four states (take/right, take/wrong, ¬take/right, ¬take/wrong) with transitions labeled taken and ¬taken]
BP state: (predict take/¬take) x (last prediction right/wrong)

Branch History Table
4K-entry BHT, 2 bits/entry, ~80-90% correct predictions
[Figure: k bits of the Fetch PC (the two low-order 0 bits excluded) form the BHT index into a 2^k-entry BHT, 2 bits/entry, which outputs Taken/¬Taken?; the instruction from the I-Cache supplies the opcode (Branch?) and the offset, which is added to the PC to form the Target PC]

Exploiting Spatial Correlation
Yeh and Patt, 1992
History register, H, records the direction of the last N branches executed by the processor
    if (x[i] < 7) then
        y += 1;
    if (x[i] < 5) then
        c -= 4;
If first condition false, second condition also false

Two-Level Branch Predictor
Pentium Pro uses the result from the last two branches to select one of the four sets of BHT bits (~95% correct)
[Figure: k bits of the Fetch PC index the BHT; a 2-bit global branch history shift register (shift in Taken/¬Taken results of each branch) selects among the sets of BHT bits; output is Taken/¬Taken?]

Limitations of BHTs
Only predicts branch direction. Therefore, cannot redirect fetch stream until after branch target is determined.
UltraSPARC-III fetch pipeline:
  A  PC Generation/Mux
  P  Instruction Fetch Stage 1
  F  Instruction Fetch Stage 2
  B  Branch Address Calc/Begin Decode
  I  Complete Decode
  J  Steer Instructions to Functional units
  R  Register File Read
  E  Integer Execute
  Remainder of execute pipeline (+ another 6 stages)
[Figure annotations: Correctly predicted taken branch penalty; Jump Register penalty]

Branch Target Buffer
BP bits are stored with the predicted target address.
IF stage: If (BP=taken) then nPC=target else nPC=PC+4
later: check prediction, if wrong then kill the instruction and update BTB & BPb else update BPb
[Figure: k bits of the PC index a Branch Target Buffer (2^k entries) in parallel with IMEM; each entry holds BPb and a predicted target; outputs are BP and target]

Address Collisions
Assume a 128-entry BTB
Instruction Memory:
  1028  Add .....
  132   Jump 100
BTB entry: BPb = take, target = 236
What will be fetched after the instruction at 1028?
BTB prediction =
Correct target =
Is this a common occurrence?
Can we avoid these bubbles?

BTB is only for Control Instructions
BTB contains useful information for branch and jump instructions only
⇒ Do not update it for other instructions
For all other instructions the next PC is PC+4 !
How to achieve this effect without decoding the instruction?

Branch Target Buffer (BTB)
• Keep both the branch PC and target PC in the BTB
• PC+4 is fetched if match fails
• Only taken branches and jumps held in BTB
• Next PC determined before branch fetched and decoded
[Figure: 2^k-entry direct-mapped BTB (can also be associative), accessed in parallel with the I-Cache; k bits of the PC select an entry holding a valid bit, the entry PC, and the predicted target PC; the entry PC is compared with the PC (match) and, if valid, the target is used]

Consulting BTB Before Decoding
Instruction Memory:
  1028  Add .....
  132   Jump 100
BTB entry: entry PC = 132, BPb = take, target = 236
• The match for PC=1028 fails and 1028+4 is fetched ⇒ eliminates false predictions after ALU instructions
• BTB contains entries only for control transfer instructions ⇒ more room to store branch targets

Combining BTB and BHT
• BTB entries are considerably more expensive than BHT, but can redirect fetches at earlier stage in pipeline and can accelerate indirect branches (JR)
• BHT can hold many more entries and is more accurate
[Figure: the fetch pipeline stages (A PC Generation/Mux, P Instruction Fetch Stage 1, F Instruction Fetch Stage 2, B Branch Address Calc/Begin Decode, I Complete Decode, J Steer Instructions to Functional units, R Register File Read, E Integer Execute) annotated with the positions of the BTB and BHT]
BHT in later pipeline stage corrects when BTB misses a predicted taken branch
BTB/BHT only updated after branch resolves in E stage

Uses of Jump Register (JR)
• Switch statements (jump to address of matching case)
• Dynamic function call (jump to run-time function address)
• Subroutine returns (jump to return address)
How well does BTB work for each of these cases?

Subroutine Return Stack
Small structure to accelerate JR for subroutine returns, typically much more accurate than
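
The Branch Prediction Bits and tagged-BTB slides above can be illustrated with a minimal Python sketch. It is not code from the lecture: the class and method names (`TwoBitPredictor`, `BTB`, `next_pc`, `record_taken`) are hypothetical, the state transitions are one reading of "change the prediction after two consecutive mistakes", and the `pc % entries` indexing is an assumption chosen so that the slide's addresses 1028 and 132 fall into the same slot of a 128-entry BTB.

```python
class TwoBitPredictor:
    """One BHT entry: (predict take/not-take) x (last prediction right/wrong).

    Assumed transition rule: a first misprediction only marks the entry as
    'wrong'; a second consecutive misprediction flips the prediction.
    """

    def __init__(self):
        self.predict_taken = False  # current direction prediction
        self.last_wrong = False     # was the previous prediction a mistake?

    def predict(self):
        return self.predict_taken

    def update(self, taken):
        if taken == self.predict_taken:
            self.last_wrong = False      # correct: clear the mistake flag
        elif self.last_wrong:
            self.predict_taken = taken   # second mistake in a row: flip
            self.last_wrong = False
        else:
            self.last_wrong = True       # first mistake: keep the prediction


class BTB:
    """Direct-mapped branch target buffer, optionally tagged with the entry PC."""

    def __init__(self, entries=128, tagged=True):
        self.entries = entries
        self.tagged = tagged
        self.table = [None] * entries    # each slot: (entry_pc, target) or None

    def _index(self, pc):
        # Assumption for illustration: index with the low bits of the PC.
        return pc % self.entries

    def next_pc(self, pc):
        slot = self.table[self._index(pc)]
        if slot is not None:
            entry_pc, target = slot
            # An untagged BTB trusts whatever is in the slot; a tagged BTB
            # requires the stored entry PC to match the fetch PC.
            if not self.tagged or entry_pc == pc:
                return target
        return pc + 4                    # match fails: fetch PC+4

    def record_taken(self, pc, target):
        # Only taken branches and jumps are entered into the BTB.
        self.table[self._index(pc)] = (pc, target)


if __name__ == "__main__":
    untagged = BTB(entries=128, tagged=False)
    untagged.record_taken(132, 236)
    print(untagged.next_pc(1028))  # 236: the Add at 1028 is mispredicted as a jump

    tagged = BTB(entries=128, tagged=True)
    tagged.record_taken(132, 236)
    print(tagged.next_pc(1028))    # 1032: match fails, PC+4 is fetched
```

With these assumptions, a loop branch that is taken many times and then falls through once keeps its "taken" prediction for the next loop entry, which is the hysteresis the 2-bit scheme is meant to provide.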

