CMU 18‐447S’10 L12‐1© 2010J. C. HoeJ. F. Martínez18‐447 Lecture 12:Branch PredictionJames C. HoeDept of ECE, CMUFebruary 24, 2010Announcements: 1 week to Spring break!!Lb2d thi kLab2 due this week Lab3 starts next weekHandouts: CMU 18‐447S’10 L12‐2© 2010J. C. HoeJ. F. Martínezt0t1t2t3t4t5t0t1t2t3t4t5t0t1t2t3t4t5t0t1t2t3t4t5Control Speculation: PC+4IFPCInstiInstjInstkInstlInsthIFPCInstiInstjInstkInstlInsthIDIFPC+4first opportunity to decode Insthshould we correct now?IFPC+4IFPCInstiInstjInstkInstlInsthID ALUIDIFPC+8Insthbranch condition and targetIFPC+4IFPCInstiInstjInstkInstlInsthID ALUIDIFPC+8ALUIDIFtargetMEMevaluated in ALUWhen a branch resolves‐ branch target (Instk)is fetched‐ all instructions fetched sinceinsth (so called “wrong‐path”instructions) must be flushedInsthis a branchCMU 18‐447S’10 L12‐3© 2010J. C. HoeJ. F. MartínezPerformance Impact correct guess no penalty ~86% of the time incorrect guess 2 bubbles Assume‐ no data hazards‐ 20% control flow instructions‐ 70% of control flow instructions are taken‐ IPC = 1 / [ 1 + (0.20*0.7) * 2 ] = = 1 / [ 1 + 0.14 * 2 ] = 1 / 1.28 = 0.78penalty fora wrong guessprobability of a wrong guessCan we reduce either of the two penalty terms?CMU 18‐447S’10 L12‐4© 2010J. C. HoeJ. F. MartínezMaking a Better Guess For ALU instructions‐ can’t do better than guessing nextPC=PC+4ill ik i PC bf h‐ still tricky since must guess nextPC before the current instruction is fetched For Branch/Jump instructions‐ why not always guess in the taken direction since 70% correct‐ again, must guess nextPC before the branch instruction is fetched (but branch target is encoded in the instruction)fetched (but branch target is encoded in the instruction) Must make a guess based only on the current fetch PC !!! Fortunately,‐ PC‐offset branch/jump target is static‐ We are allowed to be wrong some of the timeCMU 18‐447S’10 L12‐5© 2010J. C. HoeJ. F. MartínezBranch Target Buffer (Oracle) BTB (Oracle)‐ a giant table indexed by PCreturns the guess for nextPC‐returns the guess for nextPC When encountering a PC for the first time, store in BTB‐ PC + 4 if ALU/LD/ST‐ PC+offset if Branch or Jump‐??if JRBTBPC‐??if JR Effectively guessing branches are always takenIPC = 1 / [ 1 + (0.20*0.3) * 2 ] = 0.89InstructionMemoryCMU 18‐447S’10 L12‐6© 2010J. C. HoeJ. F. MartínezBTB (Reality)BTB idxPCBTB with2NentriesunusedN‐bitnPC “Hash” PC into a 2Nentry table On collision, BTB returns something meaningless and possibly (since 80% of entries all hold PC+4) wrong How big should this table be?CMU 18‐447S’10 L12‐7© 2010J. C. HoeJ. F. MartínezTagged BTBBTB idxtagBTBtagtablePC+41 0nextPC=Only store branch instructions (save 80% storage)Update tag and BTB for the new branch after each collision CMU 18‐447S’10 L12‐8© 2010J. C. HoeJ. F. MartínezEven Better Guess We can get 100% correct on non‐branch instructionsCan we do better than 70% on branch instructions?Can we do better than 70% on branch instructions?‐ We get 90% right on backward branch (dynamic)‐ We only get 50% right on forward branch (dynamic)What pattern can we leverage on forward branches? a given static branch instruction is likely to be highly biased in one direction (either taken or not takenbiased in one direction (either taken or not taken‐ 80~90% correct if we always guessed the same outcome as the last time the same branch was executed‐ IPC = 1 / [ 1 + (0.20*0.15) * 2 ] = 0.94CMU 18‐447S’10 L12‐9© 2010J. C. HoeJ. F. MartínezBranch History Table and Target BufferBTB idxtagBTBN‐bittagtablePC+4BHTtaken?1 0PC+4nextPC=The 1‐bit BHT entry is updated with the true outcome after each ex ecution of a branchCMU 18‐447S’10 L12‐10© 2010J. C. HoeJ. F. MartínezBranch Prediction State Machineactuallytakenpredicttakenpredictnottakenactuallytakenactuallytakenactuallynot takenactuallynot takenCMU 18‐447S’10 L12‐11© 2010J. C. HoeJ. F. Martínez2‐Bit Saturation Counterpred predactuallytakenactually!takentaken11taken10actuallytakenactually!takenactuallytakenpred!taken01pred!taken00actually!takenactually!takenactuallytakenCMU 18‐447S’10 L12‐12© 2010J. C. HoeJ. F. Martínez2‐Bit Hysteresis Counterpredpredactuallytakenactually!taken“weaklytaken”“stronglypredtakenpredtakenactuallytakenactually!takenactually!takenactuallytakenstronglytaken”“strongly!taken”pred!takenpred!takenactually!taken!takenactuallytakenChange prediction after 2 consecutive mistakes“weakly!taken”!takenCMU 18‐447S’10 L12‐13© 2010J. C. HoeJ. F. MartínezState‐Machine‐Based Predictors 2‐bit predictor can get >90% correct ‐ IPC = 1 / [ 1 + (0.20*0.10) * 2 ] = 0.96any“reasonable”2bit predictor does about the same‐any reasonable 2‐bit predictor does about the same Adding more bits to counters does not help much more Major branch behaviors exploited‐ almost always do the same thing again and again (>80%)•1‐bit and 2‐bit predictors equally effective‐ occasionally do the opposite once (5~10%)•2 misprediction with a 1‐bit predictor•2 misprediction with a 1‐bit predictor•1 misprediction with a 2‐bit predictor‐ miscellaneous (<10%)•some could be captured with more elaborate predictors•what does Amdahl’s law say about this? (be careful!!)CMU 18‐447S’10 L12‐14© 2010J. C. HoeJ. F. MartínezPath History Branch outcome can be correlated to other branches Equntott, SPEC92if (aa==2) ;; B1aa=0;if (bb==2) ;; B2bb=0;if (aa!=bb) {;; B3….}}If B1 is not taken (i.e. aa==0@B3) and B2 is not taken (i.e. bb=0@B3) then B3 is certainly takenHow do you capture this information?CMU 18‐447S’10 L12‐15© 2010J. C. HoeJ. F. MartínezGshare Branch Prediction [McFarling]BTB idxNbittagBTBN‐bittagtablePC+4BHTtaken?xorM‐bitBHSR1 0nextPC=Global BHSR (Branch History Shift Register) tracks the outcomes of the last M branch instructionsCMU 18‐447S’10 L12‐16© 2010J. C. HoeJ. F. MartínezReturn Address Stack The targets of register‐indirect jumps have little locality‐ history‐based predictors don’t work‐ but a simple “stack” captures the usage pattern of function call and return very well Return Address Stack (RAS)‐ the return address is pushed when a link instruction (e.g., JAL) is executed‐ when the PC of a return instruction (e.g., JR)
or
We will never post anything without your permission.
Don't have an account? Sign up