DOC PREVIEW
Branch Prediction

This preview shows page 1-2-3-4 out of 11 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CMU 18‐447S’10 L12‐1© 2010J. C. HoeJ. F. Martínez18‐447 Lecture 12:Branch PredictionJames C. HoeDept of ECE, CMUFebruary 24, 2010Announcements: 1 week to Spring break!!Lb2d thi kLab2 due this week Lab3 starts next weekHandouts: CMU 18‐447S’10 L12‐2© 2010J. C. HoeJ. F. Martínezt0t1t2t3t4t5t0t1t2t3t4t5t0t1t2t3t4t5t0t1t2t3t4t5Control Speculation: PC+4IFPCInstiInstjInstkInstlInsthIFPCInstiInstjInstkInstlInsthIDIFPC+4first opportunity to decode Insthshould we correct now?IFPC+4IFPCInstiInstjInstkInstlInsthID ALUIDIFPC+8Insthbranch condition and targetIFPC+4IFPCInstiInstjInstkInstlInsthID ALUIDIFPC+8ALUIDIFtargetMEMevaluated in ALUWhen a branch resolves‐ branch target (Instk)is fetched‐ all instructions fetched sinceinsth (so called “wrong‐path”instructions) must be flushedInsthis a branchCMU 18‐447S’10 L12‐3© 2010J. C. HoeJ. F. MartínezPerformance Impact correct guess  no penalty ~86% of the time incorrect guess  2 bubbles Assume‐ no data hazards‐ 20% control flow instructions‐ 70% of control flow instructions are taken‐ IPC = 1 / [ 1 + (0.20*0.7) * 2 ] = = 1 / [ 1 + 0.14 * 2 ] = 1 / 1.28 = 0.78penalty fora wrong guessprobability of a wrong guessCan we reduce either of the two penalty terms?CMU 18‐447S’10 L12‐4© 2010J. C. HoeJ. F. MartínezMaking a Better Guess For ALU instructions‐ can’t do better than guessing nextPC=PC+4ill ik i PC bf h‐ still tricky since must guess nextPC before the current instruction is fetched For Branch/Jump instructions‐ why not always guess in the taken direction since 70% correct‐ again, must guess nextPC before the branch instruction is fetched (but branch target is encoded in the instruction)fetched (but branch target is encoded in the instruction) Must make a guess based only on the current fetch PC !!! Fortunately,‐ PC‐offset branch/jump target is static‐ We are allowed to be wrong some of the timeCMU 18‐447S’10 L12‐5© 2010J. C. HoeJ. F. MartínezBranch Target Buffer (Oracle) BTB (Oracle)‐ a giant table indexed by PCreturns the guess for nextPC‐returns the guess for nextPC When encountering a PC for the first time, store in BTB‐ PC + 4 if ALU/LD/ST‐ PC+offset if Branch or Jump‐??if JRBTBPC‐??if JR Effectively guessing branches are always takenIPC = 1 / [ 1 + (0.20*0.3) * 2 ] = 0.89InstructionMemoryCMU 18‐447S’10 L12‐6© 2010J. C. HoeJ. F. MartínezBTB (Reality)BTB idxPCBTB with2NentriesunusedN‐bitnPC “Hash” PC into a 2Nentry table On collision, BTB returns something meaningless and possibly (since 80% of entries all hold PC+4) wrong How big should this table be?CMU 18‐447S’10 L12‐7© 2010J. C. HoeJ. F. MartínezTagged BTBBTB idxtagBTBtagtablePC+41 0nextPC=Only store branch instructions (save 80% storage)Update tag and BTB for the new branch after each collision CMU 18‐447S’10 L12‐8© 2010J. C. HoeJ. F. MartínezEven Better Guess We can get 100% correct on non‐branch instructionsCan we do better than 70% on branch instructions?Can we do better than 70% on branch instructions?‐ We get 90% right on backward branch (dynamic)‐ We only get 50% right on forward branch (dynamic)What pattern can we leverage on forward branches? a given static branch instruction is likely to be highly biased in one direction (either taken or not takenbiased in one direction (either taken or not taken‐ 80~90% correct if we always guessed the same outcome as the last time the same branch was executed‐ IPC = 1 / [ 1 + (0.20*0.15) * 2 ] = 0.94CMU 18‐447S’10 L12‐9© 2010J. C. HoeJ. F. MartínezBranch History Table and Target BufferBTB idxtagBTBN‐bittagtablePC+4BHTtaken?1 0PC+4nextPC=The 1‐bit BHT entry is updated with the true outcome after each ex ecution of a branchCMU 18‐447S’10 L12‐10© 2010J. C. HoeJ. F. MartínezBranch Prediction State Machineactuallytakenpredicttakenpredictnottakenactuallytakenactuallytakenactuallynot takenactuallynot takenCMU 18‐447S’10 L12‐11© 2010J. C. HoeJ. F. Martínez2‐Bit Saturation Counterpred predactuallytakenactually!takentaken11taken10actuallytakenactually!takenactuallytakenpred!taken01pred!taken00actually!takenactually!takenactuallytakenCMU 18‐447S’10 L12‐12© 2010J. C. HoeJ. F. Martínez2‐Bit Hysteresis Counterpredpredactuallytakenactually!taken“weaklytaken”“stronglypredtakenpredtakenactuallytakenactually!takenactually!takenactuallytakenstronglytaken”“strongly!taken”pred!takenpred!takenactually!taken!takenactuallytakenChange prediction after 2 consecutive mistakes“weakly!taken”!takenCMU 18‐447S’10 L12‐13© 2010J. C. HoeJ. F. MartínezState‐Machine‐Based Predictors 2‐bit predictor can get >90% correct ‐ IPC = 1 / [ 1 + (0.20*0.10) * 2 ] = 0.96any“reasonable”2bit predictor does about the same‐any reasonable 2‐bit predictor does about the same Adding more bits to counters does not help much more Major branch behaviors exploited‐ almost always do the same thing again and again (>80%)•1‐bit and 2‐bit predictors equally effective‐ occasionally do the opposite once (5~10%)•2 misprediction with a 1‐bit predictor•2 misprediction with a 1‐bit predictor•1 misprediction with a 2‐bit predictor‐ miscellaneous (<10%)•some could be captured with more elaborate predictors•what does Amdahl’s law say about this? (be careful!!)CMU 18‐447S’10 L12‐14© 2010J. C. HoeJ. F. MartínezPath History Branch outcome can be correlated to other branches Equntott, SPEC92if (aa==2) ;; B1aa=0;if (bb==2) ;; B2bb=0;if (aa!=bb) {;; B3….}}If B1 is not taken (i.e. aa==0@B3) and B2 is not taken (i.e. bb=0@B3) then B3 is certainly takenHow do you capture this information?CMU 18‐447S’10 L12‐15© 2010J. C. HoeJ. F. MartínezGshare Branch Prediction [McFarling]BTB idxNbittagBTBN‐bittagtablePC+4BHTtaken?xorM‐bitBHSR1 0nextPC=Global BHSR (Branch History Shift Register) tracks the outcomes of the last M branch instructionsCMU 18‐447S’10 L12‐16© 2010J. C. HoeJ. F. MartínezReturn Address Stack The targets of register‐indirect jumps have little locality‐ history‐based predictors don’t work‐ but a simple “stack” captures the usage pattern of function call and return very well Return Address Stack (RAS)‐ the return address is pushed when a link instruction (e.g., JAL) is executed‐ when the PC of a return instruction (e.g., JR)


Branch Prediction

Download Branch Prediction
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Branch Prediction and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Branch Prediction 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?