MIT 6.375 - Runahead Processor


Runahead Processor
Finale Doshi and Ravi Palakodety

Outline
- Motivation
- High Level Description
- Microarchitecture
- Results
- Conclusions

Motivation

Where We Left Off…
- Lab 3 – building a 4-stage pipelined SMIPS processor
- Critical path for Load-α: Fetch → Decode → Execute → ReadDataCache → Writeback
- Data cache miss? Stall until the data returns from main memory.

A Baaad Example
- Instruction stream: Ld-α, Ld-β, Ld-γ, …
- If the latency from main memory to the cache is 100 cycles, then:
  - initiate the Ld-α request,
  - stall for 100 cycles,
  - initiate the Ld-β request,
  - stall for 100 cycles,
  - and so on…

Key Insight
- "Runahead" to see whether there are memory accesses in the near future.
- With an instruction sequence Ld-α, Ld-β:
  - initiate the memory request for Ld-α,
  - continue execution,
  - initiate the memory request for Ld-β.

High Level Description

DataCache Miss Occurs…
- Back up the register file.
- Keep running instructions.
- Use INV as the result of any ops that:
  - are DataCache misses, or
  - depend on calculations involving DataCache misses.

Data Returns…
- Once the cache is updated from MainMem:
  - restore the register file,
  - rerun the original "offending" instruction.

Follow the Rules
While in runahead mode, do NOT:
- update the DataCache,
- initiate memory requests that depend on INV addresses,
- branch when the predicate depends on INV data, or
- initiate memory requests that cause collisions in the DataCache.
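
To make the mechanism above concrete, here is a minimal behavioral sketch in Python. It is not the authors' SMIPS/Bluespec design: the class name RunaheadCPU, the INV sentinel object, and the way the cache and register file are modeled are illustrative assumptions. The sketch only mirrors the control decisions on the slides: back up the register file on a miss, poison dependent results with INV, obey the runahead rules, and restore state when the data returns.

INV = object()  # stand-in for the slides' INV marker (assumption)


class RunaheadCPU:
    """Toy model of the runahead policy, not the actual pipelined hardware."""

    def __init__(self, cached_addrs):
        self.regs = {}                   # architectural register file
        self.cache = set(cached_addrs)   # addresses currently in the DataCache
        self.backup = None               # register-file snapshot for runahead
        self.in_runahead = False
        self.offending = None            # the load that triggered runahead
        self.pending = []                # memory requests initiated so far

    def load(self, dst, addr):
        # Rule: never initiate a request whose address depends on INV data.
        if addr is INV:
            self.regs[dst] = INV
            return
        if addr in self.cache:
            self.regs[dst] = ("mem", addr)   # cache hit: the real value is available
            return
        # DataCache miss: initiate the request and enter (or stay in) runahead.
        self.pending.append(addr)
        if not self.in_runahead:
            self.backup = dict(self.regs)    # back up the register file
            self.offending = (dst, addr)
            self.in_runahead = True
        self.regs[dst] = INV                 # poison the result

    def store(self, addr):
        # Rule: do not update the DataCache while in runahead mode.
        # (Values are not tracked in this toy model, only cached addresses.)
        if not self.in_runahead and addr is not INV:
            self.cache.add(addr)

    def can_branch(self, pred):
        # Rule: do not branch when the predicate depends on INV data.
        return pred is not INV

    def data_returns(self):
        # MainMem fills the cache: restore the register file and rerun the
        # offending load, which is now a hit.
        self.cache.update(self.pending)
        self.pending.clear()
        self.regs, self.backup = self.backup, None
        self.in_runahead = False
        dst, addr = self.offending
        self.load(dst, addr)


# The Key Insight in action: Ld-α and Ld-β both miss, and both requests are
# in flight before the data returns, instead of two back-to-back stalls.
cpu = RunaheadCPU(cached_addrs=[])
cpu.load("r1", 0xA0)            # Ld-α misses; runahead begins, r1 = INV
cpu.load("r2", 0xB0)            # Ld-β misses during runahead; request still sent
assert cpu.pending == [0xA0, 0xB0]
cpu.data_returns()              # restore and rerun the offending Ld-α as a hit
assert cpu.regs["r1"] == ("mem", 0xA0)

In the real pipeline the snapshot, the INV poison bits, and re-fetching from the offending instruction are handled in hardware, and every instruction after the offending load is re-executed; the sketch reruns only the single load to keep the control flow visible.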
ORWhen all memory requests that are When all memory requests that are currently incurrently in--flight are processed?flight are processed?OutlineOutlineMotivationMotivationHigh Level DescriptionHigh Level DescriptionMicroarchitectureMicroarchitectureResultsResultsConclusionsConclusionsKey ParametersKey ParametersVary Latency of Main MemoryVary Latency of Main MemoryAs the latency increases, the impact of As the latency increases, the impact of runaheadrunaheadbecomes more significantbecomes more significantAt small latencies, the penalty for At small latencies, the penalty for entering/exiting entering/exiting runaheadrunaheadcan reduce can reduce performanceperformanceKey ParametersKey ParametersVary Size of Vary Size of FIFOsFIFOsAs the As the FIFOsFIFOsget larger, the processor is get larger, the processor is able to run further ahead and generate more able to run further ahead and generate more parallel memory requests.parallel memory requests.As the As the FIFOsFIFOsget larger, the penalty for get larger, the penalty for exiting exiting runaheadrunaheadbecomes more severe.becomes more severe.Testing StrategyTesting StrategyLatencies of 1, 20, and 100 cyclesLatencies of 1, 20, and 100 cyclesFifosFifosof length 2, 5, 8, 15of length 2, 5, 8, 15Standard benchmarks; focus on Standard benchmarks; focus on vvaddvvaddWe’ll focus on length 15 We’ll focus on length 15 fifosfifoshere since here since they allowed for the most extensive they allowed for the most extensive runaheadrunahead..ResultsResultsResultsResultsResultsResultsOutlineOutlineMotivationMotivationHigh Level DescriptionHigh Level DescriptionMicroarchitectureMicroarchitectureResultsResultsConclusionsConclusionsConclusionsConclusionsRunaheadRunaheadis good.is good.ConclusionsConclusionsRunaheadRunaheadis a cheap and simple way to is a cheap and simple way to improve IPS.improve IPS.The enter/exit The enter/exit runaheadrunaheadpenalty is small penalty is small enough that the IPS is always enough that the IPS is always comparable to the Lab 3 processor.comparable to the Lab 3 processor.The control structure is (fairly) The control structure is (fairly) straightforward, with most improvements straightforward, with most improvements done on the cache side.done on the cache side.ExtensionsExtensionsAggressive Branch PredictionAggressive Branch PredictionDon’t stall when branch predicate is Don’t stall when branch predicate is INVINVSave valid Save valid runaheadrunaheadcomputationscomputationsAggressive Aggressive

