Rice ELEC 525 - Towards a More Efficient Trace Cache

Towards a More Efficient Trace Cache
Amit Saha, Jerry Yen, Rajnish Kumar
ELEC/COMP 525, April 24, 2001

Motivation
- Exploiting ILP
- Current limitations of instruction fetch mechanisms
[Figure from: "Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching" by Rotenberg, et al., 1996]

Hypothesis
A trace cache implemented by:
- Giving weights to entries based on past use and future usage prediction (branch prediction), and
- Using the weights for the line fill and replacement buffer logic
will enhance processor performance.

Architecture
- Trace Cache
  - 1024 or 32 entries
  - Max 3 blocks per entry
  - Max 16 instructions per entry

Branch Predictor
- Two Level Adaptive Branch Predictor
[Figure: block diagram with a Global History Register, a Pattern History Table, two Select stages, and Primary, Secondary, and Tertiary Branch Prediction outputs; width labels K, k-1, k-2, and 2]

Weight Parameters
- Number of basic blocks
- Non-contiguity of the line
- Zero-count in branch-prediction values
- Expected future use
- 2-bit hit counter
- Active-window-size field

Implementation
- Separate fields for different parameters
- Total weight of a trace cache line is the sum of:
  - Basic_block_count weight
  - Branch prediction values mapped to weights
  - Number of hits in the last x cycles, where x is active_window_size

Redundancy in Trace-Cache
- Line-fill-buffer logic changed:
  - If a block is the point of multiple entry, like B here, start a new trace cache line with B.
Implementation Example: [ABC] -> [ABC, DE] [ABC, DE, BCD] [BCD]

Methodology
- Baseline
- Increased execution resources
- Baseline with TC
- Baseline with modified TC
- Unmodified Trace Cache
- LRU replacement policy

Ideal case
[Figure: "Possible IPC Improvement with 1024 Entry Trace Cache". x-axis: SPEC2000 Benchmark (ammp, mcf*, vpr, mean); y-axis: IPC; series: Baseline, TC(1024)]

Small sized trace cache
[Figure: "IPC Improvement With 64 Entry Trace Cache". x-axis: SPEC2000 Benchmark (ammp, mcf*, vpr, mean); y-axis: IPC; series: Baseline, TC(1024), TC(64)]

./tests/ benchmarks ideal case
[Figure: "Possible IPC Improvement with 1024 Entry Trace Cache". x-axis: Test Benchmark (anagram, test-fmath, test-llong, test-lswlr, test-math, test-printf, mean); y-axis: IPC; series: Baseline, TC(1024)]

Small sized trace cache
[Figure: "IPC Improvement with Trace Cache". x-axis: Test Benchmark (anagram, test-fmath, test-llong, test-lswlr, test-math, test-printf, mean); y-axis: IPC; series: Baseline, TC(1024), TC(32)]

Various weights used
[Figure: "IPC Improvement with Trace Cache Using Various Weights". x-axis: Test Benchmark (anagram, test-fmath, test-llong, test-lswlr, test-math, test-printf, mean); y-axis: IPC; series: TC(32), TC(32) + Distance Weight, TC(32) + Weights]

Modified lfb logic
[Figure: "IPC Improvement with Modified Line-Fill Buffer Logic". x-axis: Test Benchmark (anagram, test-fmath, test-llong, test-lswlr, test-math, test-printf, mean); y-axis: IPC; series: TC(32)+Weights, TC(32)+Weights+Modified Line-Fill Buffer Logic]

In conclusion
- Fetch Q is the bottleneck
- Hypothesis partially valid
- Better results for Spec2000?
- Better combination of proposed weights?
- New weights?
- Same weights to work across multiple benchmarks?

Learning experience
- Difficult to increase IPC beyond what a base trace cache offers.
- How to proceed with such research projects
- Why man-months are so important in architecture research
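
The weight computation described on the Implementation slide (total weight as the sum of a basic-block-count weight, branch prediction values mapped to weights, and the number of hits in the last active_window_size cycles) can be illustrated with a minimal sketch. This is not the authors' simulator code. The names TraceLine, PRED_WEIGHT, line_weight, and choose_victim are hypothetical, and the mapping from 2-bit branch prediction counter values to weights is an assumption chosen only to make the example concrete; the slides do not give the actual mapping.

```python
from dataclasses import dataclass, field

# Assumed mapping from a 2-bit prediction counter value (0..3) to a weight.
# Strongly biased counters (0 or 3) are given more weight here; the slides
# do not specify the real mapping.
PRED_WEIGHT = {0: 2, 1: 1, 2: 1, 3: 2}


@dataclass
class TraceLine:
    """Hypothetical trace cache line with separate fields per weight parameter."""
    tag: int
    basic_block_count: int            # up to 3 basic blocks per entry
    branch_pred_counters: list        # one 2-bit counter value per branch
    hit_cycles: list = field(default_factory=list)  # cycles at which the line hit

    def record_hit(self, cycle):
        self.hit_cycles.append(cycle)


def line_weight(line, now, active_window_size):
    """Sum the three weight components named on the Implementation slide."""
    block_weight = line.basic_block_count
    pred_weight = sum(PRED_WEIGHT[c] for c in line.branch_pred_counters)
    # Count only hits that fall inside the last active_window_size cycles.
    recent_hits = sum(
        1 for c in line.hit_cycles if 0 <= now - c < active_window_size
    )
    return block_weight + pred_weight + recent_hits


def choose_victim(lines, now, active_window_size):
    """Pick the line with the lowest total weight as the replacement victim."""
    return min(lines, key=lambda l: line_weight(l, now, active_window_size))


if __name__ == "__main__":
    a = TraceLine(tag=0xA, basic_block_count=3, branch_pred_counters=[3, 0])
    a.record_hit(90)
    a.record_hit(95)
    b = TraceLine(tag=0xB, basic_block_count=1, branch_pred_counters=[1])
    b.record_hit(10)
    victim = choose_victim([a, b], now=100, active_window_size=20)
    print(hex(victim.tag))
```

In this sketch the lowest-weight line is evicted, which stands in for the LRU policy of the unmodified trace cache. Whether ties should fall back to LRU order is left open, since the slides do not say.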

