Data Speculation
Adam Wierman, Daniel Neill
Carnegie Mellon School of Computer Science, Architecture

Papers discussed:
  - Lipasti and Shen, "Exceeding the dataflow limit", 1996
  - Sodani and Sohi, "Understanding the differences between value prediction and instruction reuse", 1998

A Taxonomy of Speculation
  What can we speculate on?
  Speculative Execution
    - Control Speculation
        - Branch Direction
        - Branch Target
    - Data Speculation
        - Data Location
        - Data Value
  Question: What makes speculation possible?

Value Locality
  How often does the same value result from the same instruction twice in a row?
  Question: Where does value locality occur?

    Instruction class                         Value locality?
    Single-cycle Arithmetic (e.g. addq $1 $2)   Somewhat
    Single-cycle Logical (e.g. bis $1 $2)       Yes
    Multi-cycle Arithmetic (e.g. mulq $1 $2)    No
    Register Move (e.g. cmov $1 $2)             Yes
    Integer Load (e.g. ldq $1 8($2))            Yes
    Store with base register update             No
    FP Load                                     Yes
    FP Multiply                                 Somewhat
    FP Add                                      Somewhat
    FP Move                                     Yes

Value Locality
  Question: Why is speculation useful?

    addq $1 $2 $3
    addq $3 $1 $4
    addq $3 $2 $5

  The second and third instructions both read $3, which the first one writes. Predicting $3 removes that dependence, so speculation lets all three run in parallel on a superscalar machine.

Exploiting Value Locality
  - Value Prediction (VP): predict the results of instructions based on previously seen results.
  - Instruction Reuse (IR): recognize that a computation chain has been previously performed and therefore need not be performed again.

Exploiting Value Locality
  [Figure: VP pipeline. Stages: Fetch, Decode, Issue, Execute, Commit. Labels: Predict Value; Verify; "if mispredicted".]
  [Figure: IR pipeline. Stages: Fetch, Decode, Issue, Execute, Commit. Labels: Check for previous use; Verify arguments are the same; "if reused".]

Value Prediction
Lipasti & Shen, 1996

Value prediction
  - Speculative prediction of register values.
      - Values are predicted during fetch and dispatch and forwarded to dependent instructions.
      - Dependent instructions can be issued and executed immediately.
      - Before committing a dependent instruction, we must verify the predictions. If a prediction is wrong, the dependent instruction must restart with the correct values.
  [Figure: VP pipeline, as above.]

Overview
  [Figure: two tables, both indexed by PC. The Classification Table (CT) maps PC to a prediction history and answers "Should I predict?". The Value Prediction Table (VPT) maps PC to a value history and supplies the predicted value. Together they produce the prediction.]

How to predict values
  Value Prediction Table (VPT)
    - Cache indexed by instruction address (PC).
    - Each entry maps to one or more 64-bit values.
    - Values are replaced (LRU) when an instruction is first encountered or when a prediction is incorrect.
    - 32 KB cache: 4K 8-byte entries.
  [Figure: CT and VPT overview, as above.]

Estimating prediction accuracy
  Classification Table (CT)
    - Cache indexed by instruction address (PC).
    - Each entry maps to a 2-bit saturating counter, incremented when the prediction is correct and decremented when it is wrong:
        0, 1 = don't use prediction
        2 = use prediction
        3 = use prediction and don't replace value if wrong
    - 1K entries are sufficient.
  [Figure: CT and VPT overview, as above.]

Verifying predictions
  - The predicted instruction executes normally.
  - A dependent instruction cannot commit until the predicted instruction has finished executing.
  - The computed result is compared to the predicted one. If they match, dependent instructions can commit.
  - If not, dependent instructions must reissue and execute with the computed value. Misprediction penalty: 1 cycle later than with no prediction.
  [Figure: VP pipeline, as above.]

Results
  - A realistic configuration on simulated current and near-future PowerPC gave 4.5-6.8% speedups.
      - 3-4x more speedup than devoting the extra space to cache.
      - Speedups vary between benchmarks (grep: 60%).
  - Potential speedups of up to 70% for idealized configurations.
      - Can exceed the dataflow limit (on an idealized machine).

Instruction Reuse
Sodani & Sohi, 1998

Instruction Reuse
  - Obtain the results of instructions from their previous executions.
  - If the previous results are still valid, don't execute the instruction again; just commit the results.
  - Non-speculative, early verification.
      - Previous results are read in parallel with fetch.
      - The reuse test runs in parallel with decode.
      - Only execute if the reuse test fails.
  [Figure: IR pipeline, as above.]

How to reuse instructions
  Reuse buffer
    - Cache indexed by instruction address (PC).
    - Stores the result of the instruction along with the information needed to establish reusability:
        - Operand register names
        - Pointer chain of dependent instructions
    - Assume 4K entries (each entry takes 4x as much space as a VPT entry, so compare to a 16K-entry VPT).
    - 4-way set associative.

Reuse Scheme
  - Dependent chain of results: each entry points to the previous instruction in the chain.
  - An entry is reusable if the entries on which it depends have been reused (can't reuse out of order).
  - The start of a chain is reusable if its valid bit is set; the entry is invalidated when its operand registers are overwritten.
  - Special handling of loads and stores.
  - An instruction will not be reused if:
      - its inputs are not ready for the reuse test (decode stage), or
      - its operand registers are different.

Results
  - Attempts to evaluate "realistic" and "comparable" schemes for VP and IR on a simulated MIPS architecture.
      - Are these really realistic? Assume oracle or test.
  - Net performance: VP is better on some benchmarks, IR on others. All speedups are typically 5-10%.
  - More interesting question: can the two schemes be combined?
  - Claim: 84-97% of redundant instructions are reusable.

Comparing VP and IR
  - Value Prediction (VP): predict the results of instructions based on previously seen results.
  - Instruction Reuse (IR): recognize that a computation chain has been previously performed and therefore need not be performed again.

Comparing VP and IR
  IR can't predict when:
    1. Inputs aren't ready
    2. Same result follows from ...
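
The Classification Table and Value Prediction Table from the Value Prediction slides can be sketched in a few lines of Python. This is a minimal illustration, not the Lipasti and Shen design. The class and method names are hypothetical. The tables are untagged and indexed by PC modulo the table size, which is an assumption. Each VPT entry keeps only the last value, where the slides allow one or more values with LRU replacement. The counter states and table sizes follow the slides.

```python
class ValuePredictor:
    """Toy last-value predictor: a VPT plus a CT of 2-bit saturating counters.

    Assumptions (not from the slides): untagged tables indexed by
    PC modulo table size, and a single stored value per VPT entry.
    """

    def __init__(self, vpt_entries=4096, ct_entries=1024):
        self.vpt_entries = vpt_entries
        self.ct_entries = ct_entries
        self.vpt = {}  # VPT index -> last value seen
        self.ct = {}   # CT index -> counter in 0..3

    def predict(self, pc):
        """Return a predicted value, or None when the CT says not to predict."""
        counter = self.ct.get(pc % self.ct_entries, 0)
        value = self.vpt.get(pc % self.vpt_entries)
        # Counter states 0 and 1: don't use the prediction; 2 and 3: use it.
        if counter >= 2 and value is not None:
            return value
        return None

    def update(self, pc, actual):
        """Train both tables with the computed result of the instruction at pc."""
        ci = pc % self.ct_entries
        vi = pc % self.vpt_entries
        counter = self.ct.get(ci, 0)
        stored = self.vpt.get(vi)
        if stored == actual:
            # Stored value matched the result: count up, saturating at 3.
            self.ct[ci] = min(counter + 1, 3)
        else:
            # First encounter, or a miss below state 3: replace the value.
            # In state 3 the stored value survives one wrong prediction.
            if stored is None or counter < 3:
                self.vpt[vi] = actual
            self.ct[ci] = max(counter - 1, 0)
```

In this sketch, an instruction that keeps producing the same value only starts being predicted after its counter reaches 2, and a single differing result in state 3 lowers the counter without evicting the stored value.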