1Threshold-BasedMarkovPrefetchersCarlosMarchaniTamerMohamedLerzanCelikkanatGeorgeAbiNaderRiceUniversity,DepartmentofElectricalandComputerEngineeringELEC525,Spring2006AbstractInthisreportwepresentanoveltechniqueforaMarkovprefetcher designed to hide memory latency. This pre-fetchermakesuseoftheinformationaboutpreviouscachemisses in a memory trace and effectively predicts futuremisseswithhighaccuracyandcoveragerate.Theanalysisofthememory tracesoftheexecutedbenchmarksrevealsthat our threshold-based prefetcher poses an optimizedmemory bandwidth overhead while not hurting the pre-fetcher coverage and effectively increasing its accuracy.Ourdataindicatethattheaveragelossincoverageis1%while the gain in accuracy is 13% and the reduction inmemory bandwidth overhead is 39%. The design alsoachieves more than an order of magnitude reduction inmispredictionsascomparedtopreviousresults.KeywordsHidingMemoryLatency,MarkovTablesBACKGROUNDAsmodernsuperscalarprocessorsgethigherclockspeedsanddeeperpipelinesthecostofmemorylatencybecomesanincreasingburdenonthesystemperformance.Prefetch-ersareamechanismthataimstohidethememorylatencyinmodernprocessorsbyfetchingdatafrommemorybeforethe processor actually requests it. The data is typicallyfetched following a model pre-established by the systemdesigneranditisstoredindedicatedbuffers.Severalresearchershaveworkedondifferentmodelsforaprefetcher design, including compiler-dependant prefetch-ers,strideprefetchers,streambuffersandcorrelation-basedprefetchers. Joseph and Grunwald studied Markov pre-fetchersasamodelforcorrelationprefetchers[1].Intheirwork,theyconsideredandsimulatedtheaboveprefetchersinadditiontoaMarkovmodel.TheyfoundthattheMarkovprefetcherprovidedthebestperformancefortheSPEC95applicationsthatwereconsidered.Nonetheless,theyrecog-nizedtheneedtoreducethebandwidthusageandthepos-sibilitytoimprovetheprefetcheraccuracy.The main idea behind utilizing a Markov table in a pre-fetcheristoaccountfortheprobabilitiesoftransitionsfromonestatetoanotherstate.Inthiscontext,astatereferstoacertain missaddress following a givenmissaddress. Thefrequencies of transition are used to populate a table asshown in table 1, and these frequencies are utilized as ameasureof probability forthe prediction of futuretransi-tionsfromareachedstate.Thetableiscontinuouslybeingupdatedandastrideprefetcherisalsousedinordertoen-hancetheperformanceofbenchmarksthatexhibitalotofstridedaccesses.Table 1: A Markov table populated by the transitions andfrequencyoftransitionoccurrenceforthesequenceofstates(cachemisses):A,B,C,D,C,E,A,B,CMissAddress(currentstate)NextMiss(state)AB[2] BC[1] CD[1] E[1] DC[1] EA[1] HYPOTHESISIn[1],theauthorsimplementedaMarkovdesignbasedonthemissoccurrencesoftheL1-Cache;theybuiltaMarkovtreebasedon thenextreportedmisses(states)ofagivenmiss.Infacttheywouldalwaystrackandprefetchnextfourstatesofamiss.Weproposetocreateandsimulateamoredynamicmodelthatadaptstothefrequencyofoccurrenceofnextstates. Theprefetcherwouldonlyprefetchamongthetracked nextstatesthesesmissesthatexceeded inthepastagiventhresholdofappearance.Inouropinion,pre-fetchingmissesbasedonpastfrequencyofoccurrenceus-ingourdynamicthresholdwouldimproveprefetcheraccu-racyandreducethememorybandwidth.Oursolutionassumesthattheprobabilityofoccurrenceofamissiscorrelatedtothehistoryofoccurrenceofthatmiss.Inotherwords,if“missB”occurredafter“missA”halfthetimeinthepastthenithasa50%chanceofoccurringnowandthusisusefultoprefetch.Toconfirmourassumptionweexaminedthedata-L1missstreamofseveralSPEC2000benchmarksandfoundgoodevidencetoourintuition.Figure1showsasampleofthebehaviorthatprovestheviabilityofourhypothesis.Inthe 2VPR benchmark, the frequency of occurrence of missesgrows as theprogram progresses, i.e. the missesthat ap-pearedinthepastarelikelytoappearinthefuture.Theirrelativefrequencyofoccurrenceismorelikelytostaycon-stantorchangeslowlythroughouttheprogram’slifethantochangebehaviordrastically.0246810T6 T7 T8 T9 T107FFF734C7FFF73A0100091DC7FFF75907FFF73C07FFF73207FFF75281001D72C1001DF881001C4D0Figure1:VPRsamplenextmissesfromL1missstreamshowsthenumberofoccurrencesof“next-miss”atprogressivetimeperiodsoftheprogram.Early examination of the
View Full Document