Carnegie Mellon School of Computer Science
Architecture

Data Speculation
Adam Wierman, Daniel Neill

Lipasti and Shen. Exceeding the dataflow limit, 1996.
Sodani and Sohi. Understanding the differences between value prediction and instruction reuse, 1998.

A Taxonomy of Speculation
Question: What makes speculation possible?
What can we speculate on?
• Speculative Execution
– Control Speculation: Branch Direction, Branch Target
– Data Speculation: Data Location, Data Value

Value Locality
Question: Where does value locality occur?
How often does the same value result from the same instruction twice in a row?
– Single-cycle Arithmetic (e.g. addq $1 $2): Somewhat
– Single-cycle Logical (e.g. bis $1 $2): Yes
– Multi-cycle Arithmetic (e.g. mulq $1 $2): No
– Register Move (e.g. cmov $1 $2): Yes
– Integer Load (e.g. ldq $1 8($2)): Yes
– Store with base register update: No
– FP Load: Yes
– FP Multiply: Somewhat
– FP Add: Somewhat
– FP Move: Yes

Value Locality
Question: Why is speculation useful?
addq $1 $2 $3
addq $3 $1 $4
addq $3 $2 $5
Both later instructions read $3, which the first one writes. Speculation lets all three run in parallel on a superscalar machine.

Exploiting Value Locality
• Value Prediction (VP): “predict the results of instructions based on previously seen results”
• Instruction Reuse (IR): “recognize that a computation chain has been previously performed and therefore need not be performed again”

Exploiting Value Locality
[Figure: two pipelines, each Fetch, Decode, Issue, Execute, Commit. Value Prediction (VP) is annotated "Predict Value" and "Verify", with an "if mispredicted" path. Instruction Reuse (IR) is annotated "Check for previous use" and "Verify arguments are the same", with an "if reused" path.]

Value Prediction
(Lipasti & Shen, 1996)

Value prediction
• Speculative prediction of register values
– Values predicted during fetch and dispatch, forwarded to dependent instructions.
– Dependent instructions can be issued and executed immediately.
– Before committing a dependent instruction, we must verify the predictions. If wrong: must restart the dependent instruction with correct values.

Overview
[Figure: the PC indexes both the Classification Table (CT), which holds prediction history and answers "Should I predict?", and the Value Prediction Table (VPT), which holds value history and supplies the predicted value.]

How to predict values?
Value Prediction Table (VPT)
– Cache indexed by instruction address (PC)
– Mapped to one or more 64-bit values
– Values replaced (LRU) when instruction first encountered or when prediction incorrect.
– 32 KB cache: 4K 8-byte entries

Estimating prediction accuracy
Classification Table (CT)
– Cache indexed by instruction address (PC)
– Mapped to 2-bit saturating counter, incremented when correct and decremented when wrong.
    0, 1 = don’t use prediction
    2 = use prediction
    3 = use prediction and don’t replace value if wrong
– 1K entries sufficient

Verifying predictions
• Predicted instruction executes normally.
• Dependent instruction cannot commit until predicted instruction has finished executing.
• Computed result compared to predicted; if ok then dependent instructions can commit.
• If not, dependent instructions must reissue and execute with computed value. Miss penalty = 1 cycle later than no prediction.

Results
• Realistic configuration, on simulated (current and near-future) PowerPC gave 4.5-6.8% speedups.
– 3-4x more speedup than devoting extra space to cache.
• Speedups vary between benchmarks (grep: 60%)
• Potential speedups up to 70% for idealized configurations.
– Can exceed dataflow limit (on idealized machine).

Instruction Reuse
(Sodani & Sohi, 1998)

Instruction Reuse
• Obtain results of instructions from their previous executions.
– If previous results still valid, don’t execute the instruction again, just commit the results!
• Non-speculative, early verification
– Previous results read in parallel with fetch.
– Reuse test in parallel with decode.
– Only execute if reuse test fails.

How to reuse instructions?
• Reuse buffer
– Cache indexed by instruction address (PC)
– Stores result of instruction along with info needed for establishing reusability:
    Operand register names
    Pointer chain of dependent instructions
– Assume 4K entries (each entry takes 4x as much space as VPT: compare to 16K VP)
– 4-way set-associative.

Reuse Scheme
• Dependent chain of results (each points to previous instruction in chain)
– Entry is reusable if the entries on which it depends have been reused (can’t reuse out of order).
– Start of chain: reusable if “valid” bit set; invalidated when operand registers overwritten.
– Special handling of loads and stores.
• Instruction will not be reused if:
– Inputs not ready for reuse test (decode stage)
– Different operand registers

Results
• Attempts to evaluate “realistic” and “comparable” schemes for VP and IR on simulated MIPS architecture.
• Are these really realistic? Assume oracle or || test.
• Net performance: VP better on some benchmarks; IR better on some. All speedups typically 5-10%.
• More interesting question: can the two schemes be combined?
• Claim: 84-97% of redundant instructions reusable.

Comparing VP and IR
• Value Prediction (VP): “predict the results of instructions based on previously seen results”
• Instruction Reuse (IR): “recognize that a computation chain has been previously performed and
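
The reuse buffer from the instruction reuse slides can be modeled in a similar way. The sketch below is a simplification under stated assumptions: it keeps only the operand register names and the "valid" bit, uses an unbounded dictionary in place of the 4K-entry, 4-way set-associative cache, and leaves out the dependent-chain pointers and the special handling of loads and stores. `ReuseBuffer`, `ReuseEntry`, and `run_add` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ReuseEntry:
    src_regs: tuple      # operand register names
    result: int          # result of the previous execution
    valid: bool = True   # cleared when an operand register is overwritten


class ReuseBuffer:
    def __init__(self):
        # Assumption: a dict keyed by PC stands in for the real cache.
        self.entries = {}

    def lookup(self, pc, src_regs):
        """Reuse test: same operand registers and entry still valid."""
        entry = self.entries.get(pc)
        if entry is not None and entry.valid and entry.src_regs == tuple(src_regs):
            return entry.result
        return None

    def record(self, pc, src_regs, result):
        self.entries[pc] = ReuseEntry(tuple(src_regs), result)

    def on_register_write(self, reg):
        """Invalidate every entry that reads the overwritten register."""
        for entry in self.entries.values():
            if reg in entry.src_regs:
                entry.valid = False


def run_add(rb, regs, pc, src1, src2, dst):
    """Execute 'add src1 src2 dst', skipping the ALU when the result is reusable."""
    result = rb.lookup(pc, (src1, src2))
    reused = result is not None
    if not reused:
        result = regs[src1] + regs[src2]
        rb.record(pc, (src1, src2), result)
    rb.on_register_write(dst)   # also invalidates this entry if dst is an operand
    regs[dst] = result
    return reused
```

Invalidating on every write is conservative: an entry is discarded even when the register is rewritten with the same value, which is one motivation for the dependent-chain scheme that this sketch omits.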