Trace Cache
Leon Gu, Dipti Motiani
15-740: Computer Architecture, Fall 2003
10/01/2003

Papers
- Eric Rotenberg, Steve Bennett, and James E. Smith. A Trace Cache Microarchitecture and Evaluation. IEEE Transactions on Computers, 48(2):111-120, February 1999.
- Bryan Black, Bohuslav Rychlik, and John Paul Shen. The Block-based Trace Cache. Proceedings of the 26th Annual International Symposium on Computer Architecture, pages 196-207, May 1999.
- Michael Sung. Design of Trace Cache for High Bandwidth Instruction Fetching. Master's Thesis, May 1998.
- Eric Rotenberg, Steve Bennett, and Jim Smith. Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching. April 1996.

Superscalar Processors
- Producer - Consumer
- Instruction Level Parallelism

Motivation
- Exploit ILP
- Fetch bottleneck
  - Instruction cache misses
  - Branch prediction accuracy
  - Branch prediction throughput
  - Noncontiguous instruction fetching
  - Fetch unit latency

Trace Cache
- A trace is a sequence of instructions starting at any point in the dynamic instruction stream.
- It is specified by a start address and the branch outcomes of its control transfer instructions.

Fetch Mechanism
- The trace cache is accessed in parallel with the instruction cache.
  - Hit → trace read into the issue buffer
  - Miss → fetch from the instruction cache
- Trace cache hit requires:
  - Fetch address match
  - Branch predictions match
- The trace cache is NOT on the critical path of instruction fetch.

Design Issues
- Trace length
- Sizing
- Indexing
- Branch throughput
- Fill mechanism
- Partial matches
- Associativity
- Replacement policy

Paper 1
- Presents a microarchitecture incorporating a trace cache
  - Control flow prediction and instruction supply at the trace level
- Evaluates the performance advantage
- Design issues: size and associativity

Microarchitecture
- Trace-level sequencing
- Instruction-level sequencing
- Next trace prediction
- Trace selection
- Hierarchical sequencing

Performance of Fetch Models

Fill Mechanism

Paper 1: Critique
- Power consumption
- Fill unit latency
- Duplication of instructions
- Liveness of traces
- Design issues

Paper 2
- Presents a block-based trace cache implementation
  - Fetch address renaming
  - Basic block cache
- Performance comparison between conventional and block-based trace caches

Motivation
- Trace cache storage efficiency
- Reduced indexing and associativity latency
- Flexibility of trace construction and prediction

Comparison: Conventional vs. Block-based

Questions
- Dependence on branch prediction
- Other mechanisms
  - Branch Address Cache
  - Collapsing Buffer
  - Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching
- Compiler techniques?

Discussion - Research
- Replace the instruction cache with a trace cache?
- Reduce duplication and fragmentation
- Dynamic direction prediction trace cache
  - Using Dynamic Branch Behavior for Power-Efficient Instruction Fetch. J. S. Hu, N. Vijaykrishnan, M. J. Irwin, and M. Kandemir
- Pentium 4: the execution trace cache stores 12K decoded µops
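The hit condition on the Fetch Mechanism slides — a hit only when both the fetch address and the recorded branch outcomes match the predictor's current predictions — can be sketched as a small simulation. This is a minimal illustration, assuming a direct-mapped trace cache indexed by fetch address; the class and method names (`TraceCache`, `lookup`, `fill`) are illustrative, not from the papers.

```python
class TraceCache:
    """Toy direct-mapped trace cache (illustrative sketch, not the
    papers' implementation)."""

    def __init__(self, num_sets=64):
        self.num_sets = num_sets
        # Each entry: (start_address, branch_outcomes, instructions)
        self.sets = [None] * num_sets

    def _index(self, fetch_addr):
        return fetch_addr % self.num_sets

    def lookup(self, fetch_addr, predictions):
        """Hit iff the stored trace's start address matches the fetch
        address AND its recorded branch outcomes match the current
        branch predictions; otherwise miss (fetch from the I-cache)."""
        entry = self.sets[self._index(fetch_addr)]
        if entry is None:
            return None  # miss: fall back to the instruction cache
        start, outcomes, insns = entry
        if start == fetch_addr and outcomes == tuple(predictions[:len(outcomes)]):
            return insns  # hit: the whole trace feeds the issue buffer
        return None

    def fill(self, fetch_addr, outcomes, insns):
        # Fill mechanism (simplified): latch a completed trace into the
        # indexed set, overwriting any previous occupant.
        self.sets[self._index(fetch_addr)] = (fetch_addr, tuple(outcomes), insns)


tc = TraceCache()
tc.fill(0x400, (True, False), ["i1", "i2", "br1", "i3", "br2", "i4"])
assert tc.lookup(0x400, [True, False]) is not None  # address + predictions match
assert tc.lookup(0x400, [False, False]) is None     # prediction mismatch -> miss
assert tc.lookup(0x404, [True, False]) is None      # address mismatch -> miss
```

Note how a prediction mismatch is treated as a miss even though the address matched — this is why, as the slides observe, trace cache hit rate depends directly on branch prediction, and why partial matching is listed among the design issues.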