CMU CS 15740 - Data Speculation

Carnegie Mellon, School of Computer Science
Architecture

Data Speculation
Adam Wierman, Daniel Neill

Lipasti and Shen. Exceeding the dataflow limit, 1996.
Sodani and Sohi. Understanding the differences between value prediction and instruction reuse, 1998.


A Taxonomy of Speculation

Question: What can we speculate on?

Speculative Execution
    Control Speculation
        Branch Direction
        Branch Target
    Data Speculation
        Data Location
        Data Value

Question: What makes speculation possible?


Value Locality

Question: Where does value locality occur?
How often does the same value result from the same instruction twice in a row?

Instruction type                             Value locality?
Single-cycle Arithmetic (e.g. addq $1 $2)    Somewhat
Single-cycle Logical (e.g. bis $1 $2)        Yes
Multi-cycle Arithmetic (e.g. mulq $1 $2)     No
Register Move (e.g. cmov $1 $2)              Yes
Integer Load (e.g. ldq $1 8($2))             Yes
Store with base register update              No
FP Load                                      Yes
FP Multiply                                  Somewhat
FP Add                                       Somewhat
FP Move                                      Yes


Value Locality

Question: Why is speculation useful?

addq $1 $2 $3
addq $3 $1 $4
addq $3 $2 $5

Speculation lets all these run in parallel on a superscalar machine.


Exploiting Value Locality

Value Prediction (VP): “predict the results of instructions based on previously seen results”
Instruction Reuse (IR): “recognize that a computation chain has been previously performed and therefore need not be performed again”


Exploiting Value Locality

[Figure: Value Prediction (VP) pipeline: Fetch, Decode, Issue, Execute, Commit. Annotations: "Predict Value", "Verify", "if mispredicted".]
[Figure: Instruction Reuse (IR) pipeline: Fetch, Decode, Issue, Execute, Commit. Annotations: "Check for previous use", "Verify arguments are the same", "if reused".]


Value Prediction
(Lipasti & Shen, 1996)


Value prediction

• Speculative prediction of register values
– Values predicted during fetch and dispatch, forwarded to dependent instructions.
– Dependent instructions can be issued and executed immediately.
– Before committing a dependent instruction, we must verify the predictions. If wrong: must restart dependent instruction w/ correct values.


Overview

[Figure: the PC indexes the Classification Table (CT, "Pred History") and the Value Prediction Table (VPT, "Value History"). Outputs: "Prediction" ("Should I predict?") and "Predicted Value".]


How to predict values?

Value Prediction Table (VPT)
– Cache indexed by instruction address (PC)
– Mapped to one or more 64-bit values
– Values replaced (LRU) when instruction first encountered or when prediction incorrect.
– 32 KB cache: 4K 8-byte entries


Estimating prediction accuracy

Classification Table (CT)
– Cache indexed by instruction address (PC)
– Mapped to 2-bit saturating counter, incremented when correct and decremented when wrong.
    0, 1 = don’t use prediction
    2 = use prediction
    3 = use prediction and don’t replace value if wrong
– 1K entries sufficient


Verifying predictions

• Predicted instruction executes normally.
• Dependent instruction cannot commit until predicted instruction has finished executing.
• Computed result compared to predicted; if ok then dependent instructions can commit.
• If not, dependent instructions must reissue and execute with computed value. Miss penalty = 1 cycle later than no prediction.


Results

• Realistic configuration, on simulated (current and near-future) PowerPC gave 4.5-6.8% speedups.
– 3-4x more speedup than devoting extra space to cache.
• Speedups vary between benchmarks (grep: 60%)
• Potential speedups up to 70% for idealized configurations.
– Can exceed dataflow limit (on idealized machine).


Instruction Reuse
(Sodani & Sohi, 1998)


Instruction Reuse

• Obtain results of instructions from their previous executions.
– If previous results still valid, don’t execute the instruction again, just commit the results!
• Non-speculative, early verification
– Previous results read in parallel with fetch.
– Reuse test in parallel with decode.
– Only execute if reuse test fails.


How to reuse instructions?

• Reuse buffer
– Cache indexed by instruction address (PC)
– Stores result of instruction along with info needed for establishing reusability:
    Operand register names
    Pointer chain of dependent instructions
– Assume 4K entries (each entry takes 4x as much space as VPT: compare to 16K VP)
– 4-way set-associative.


Reuse Scheme

• Dependent chain of results (each points to previous instruction in chain)
– Entry is reusable if the entries on which it depends have been reused (can’t reuse out of order).
– Start of chain: reusable if “valid” bit set; invalidated when operand registers overwritten.
– Special handling of loads and stores.
• Instruction will not be reused if:
– Inputs not ready for reuse test (decode stage)
– Different operand registers


Results

• Attempts to evaluate “realistic” and “comparable” schemes for VP and IR on simulated MIPS architecture.
• Are these really realistic? Assume oracle or || test.
• Net performance: VP better on some benchmarks; IR better on some. All speedups typically 5-10%.
• More interesting question: can the two schemes be combined?
• Claim: 84-97% of redundant instructions reusable.


Comparing VP and IR

Value Prediction (VP): “predict the results of instructions based on previously seen results”
Instruction Reuse (IR): “recognize that a computation chain has been previously performed and
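
The Value Locality slide asks how often the same instruction produces the same value twice in a row. One way to measure that on an execution trace is sketched below. The trace format (a list of `(pc, result)` pairs) and the function name `value_locality` are assumptions made for illustration; they do not come from the slides or the cited papers.

```python
from collections import defaultdict


def value_locality(trace):
    """Per-PC fraction of executions whose result equals the previous result.

    `trace` is an iterable of (pc, result) pairs in execution order
    (a hypothetical format chosen for this sketch).
    """
    last = {}                      # pc -> most recent result
    repeats = defaultdict(int)     # pc -> executions that repeated the last value
    chances = defaultdict(int)     # pc -> executions that had a previous value
    for pc, result in trace:
        if pc in last:
            chances[pc] += 1
            if last[pc] == result:
                repeats[pc] += 1
        last[pc] = result
    # Instructions seen only once have no "previous value" and are left out.
    return {pc: repeats[pc] / chances[pc] for pc in chances}


# A load that keeps returning 5 versus a counter that changes every time.
trace = [(0x10, 5), (0x14, 1), (0x10, 5), (0x14, 2), (0x10, 5), (0x14, 3)]
locality = value_locality(trace)
```

Here the instruction at `0x10` shows full value locality and the one at `0x14` shows none, which matches the "Yes" and "No" style of answer in the table on that slide.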

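
The VPT and CT slides can be read together as a small last-value predictor. The sketch below follows the counter states listed on the CT slide (0 and 1: don't predict, 2: predict, 3: predict and keep the old value on a miss). Several details are assumptions for illustration only: the tables are untagged and direct-mapped, each VPT entry holds a single value, instructions are 4 bytes wide, and the class and method names are invented here.

```python
class ValuePredictor:
    """Toy last-value predictor with a VPT and a CT (simplified assumptions)."""

    def __init__(self, vpt_entries=4096, ct_entries=1024):
        self.vpt = [None] * vpt_entries   # last value seen per VPT slot
        self.ct = [0] * ct_entries        # 2-bit saturating counters, 0..3

    def _slots(self, pc):
        word = pc >> 2                    # assumes 4-byte instructions
        return word % len(self.vpt), word % len(self.ct)

    def predict(self, pc):
        """Return the predicted value, or None if the CT says not to predict."""
        v, c = self._slots(pc)
        if self.ct[c] >= 2 and self.vpt[v] is not None:
            return self.vpt[v]
        return None

    def update(self, pc, actual):
        """Train both tables once the instruction has computed its real result."""
        v, c = self._slots(pc)
        correct = self.vpt[v] == actual
        state = self.ct[c]                # counter state before this update
        if correct:
            self.ct[c] = min(3, state + 1)
        else:
            self.ct[c] = max(0, state - 1)
            if state != 3:                # state 3: keep the old value on a miss
                self.vpt[v] = actual
        return correct


vp = ValuePredictor()
pc = 0x1000
for _ in range(3):                        # first miss installs 7, two hits reach state 2
    vp.update(pc, 7)
first_prediction = vp.predict(pc)
```

After three updates with the same value the counter reaches state 2, so `first_prediction` is the stored value. In a pipeline, `predict` would run around fetch and dispatch and `update` once the real result is known, as the "Verifying predictions" slide describes.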

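
The reuse test on the Instruction Reuse slides can be illustrated with a much smaller model. The sketch below keeps only part of what the slides list: entries indexed by PC, operand register names, a result, and a "valid" bit that is cleared when an operand register is overwritten. It leaves out the pointer chain of dependent instructions, the special handling of loads and stores, and the 4-way set-associative organization, so it is more conservative than the scheme the slides describe. All names (`ReuseBuffer`, `execute`, the instruction tuple format) are hypothetical.

```python
class ReuseBuffer:
    """Toy reuse buffer keyed by PC (unbounded dict, an assumption of this sketch)."""

    def __init__(self):
        self.entries = {}   # pc -> {"srcs": tuple, "result": int, "valid": bool}

    def lookup(self, pc, srcs):
        """Reuse test: same operand registers and no operand overwritten since."""
        entry = self.entries.get(pc)
        if entry and entry["valid"] and entry["srcs"] == tuple(srcs):
            return entry["result"]
        return None

    def record(self, pc, srcs, result):
        self.entries[pc] = {"srcs": tuple(srcs), "result": result, "valid": True}

    def register_written(self, reg):
        """Invalidate every entry that reads `reg`."""
        for entry in self.entries.values():
            if reg in entry["srcs"]:
                entry["valid"] = False


def execute(program, regs, rb):
    """Run (pc, dst, src_a, src_b) add instructions; return how many were reused."""
    reused = 0
    for pc, dst, a, b in program:
        result = rb.lookup(pc, (a, b))
        if result is None:
            result = regs[a] + regs[b]    # reuse test failed: execute normally
            rb.record(pc, (a, b), result)
        else:
            reused += 1                   # skip execution, commit the old result
        regs[dst] = result
        rb.register_written(dst)          # conservative: any write invalidates readers
    return reused


# Loop body: r3 = r1 + r2 (loop-invariant), r4 = r3 + r1, r5 = r5 + r1.
loop = [(0x0, 3, 1, 2), (0x4, 4, 3, 1), (0x8, 5, 5, 1)]
regs = {1: 10, 2: 20, 3: 0, 4: 0, 5: 0}
rb = ReuseBuffer()
reused_total = sum(execute(loop, regs, rb) for _ in range(3))
```

In this model only the loop-invariant add is reused on later iterations. The dependent add at `0x4` is invalidated each time `r3` is rewritten, even though the value is unchanged; that is the kind of case the dependence chain on the "Reuse Scheme" slide is meant to recover.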