CMU CS 15740 - Dead-Block Prediction & Dead-Block Correlating Prefetcher


Dead-Block Prediction & Dead-Block Correlating Prefetchers

An-Chow Lai
Electrical & Computer Engineering, Purdue University, West Lafayette, IN 47907
laia@ecn.purdue.edu

Cem Fide
Sun Microsystems, 901 San Antonio Rd, Palo Alto, CA 94303
cem.fide@eng.sun.com

Babak Falsafi
Electrical & Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213
[email protected]
http://www.ece.cmu.edu/-impetus

Abstract

Effective data prefetching requires accurate mechanisms to predict both "which" cache blocks to prefetch and "when" to prefetch them. This paper proposes the Dead-Block Predictors (DBPs), trace-based predictors that accurately identify "when" an L1 data cache block becomes evictable or "dead". Predicting a dead block significantly enhances prefetching lookahead and opportunity, and enables placing data directly into L1, obviating the need for auxiliary prefetch buffers. This paper also proposes Dead-Block Correlating Prefetchers (DBCPs), that use address correlation to predict "which" subsequent block to prefetch when a block becomes evictable. A DBCP enables effective data prefetching in a wide spectrum of pointer-intensive, integer, and floating-point applications. We use cycle-accurate simulation of an out-of-order superscalar processor and memory-intensive benchmarks to show that: (1) dead-block prediction enhances prefetching lookahead at least by an order of magnitude as compared to previous techniques, (2) a DBP can predict dead blocks on average with a coverage of 90% only mispredicting 4% of the time, (3) a DBCP offers an address prediction coverage of 86% only mispredicting 3% of the time, and (4) DBCPs improve performance by 62% on average and 282% at best in the benchmarks we studied.

1 Introduction

Increasing processor clock speeds along with microarchitectural innovation have led to a tremendous gap between processor and memory performance. Architects have primarily relied on deeper cache hierarchies, where each level trades off faster lookup speed for larger capacity, to reduce this performance gap. Conventional cache hierarchies employ a demand-fetch memory access model, in which data are fetched into higher levels upon processor requests. Unfortunately, the limited capacity in higher cache levels and the simple data placement mechanisms used in conventional hierarchies often result in high miss rates and reduce performance. While superscalar engines with non-blocking caches [19] allow overlapping the miss latency among the higher cache levels, limited available instruction-level parallelism and long access latencies to lower cache levels often expose the miss latency in many important classes of applications.

Many architects have additionally relied on the prefetch memory access model to mitigate the shortcomings of the demand-fetch model. Prefetching helps fetch data in advance to hide the memory latency by predicting future memory requests. While prefetching can be initiated in either hardware [17,5,10,3,15,4,6] or software [9,8,14,12], many researchers and vendors opt for hardware implementations for transparency and due to availability of runtime information which can significantly improve prefetching's effectiveness. Most previous proposals for hardware prefetchers target specific memory access patterns -- such as strided accesses [15,4,6] and accesses to linked data structures [17]. While effective for the targeted access patterns, these prefetchers have limited general applicability across a wide spectrum of applications.

There are a number of prefetcher proposals in the literature that target generalized memory access patterns [5,3] -- including strided accesses, and indirect accesses to linked data structures and arrays. These proposals primarily rely on miss address correlation [1] as a technique to predict and prefetch memory addresses.
These prefetchers, which we refer to as Miss Correlating Prefetchers (MCPs), record a history of prior L1 cache miss addresses, and correlate the history to a subsequent miss to trigger a prefetch.

Unfortunately, MCPs suffer from several key shortcomings. First, L1 cache misses are often clustered, especially in out-of-order engines with high-bandwidth L1 caches, significantly limiting the lookahead and opportunity for timely prefetching. Second, rather than predicting block evictability, these prefetchers place the (prefetched) data in small associative buffers, and look them up either in parallel with L1, thereby increasing L1's critical access path, or upon an L1 miss, thereby increasing the prefetch hit latency. Finally, miss address correlation has not been shown to offer both high prediction accuracy (i.e., correct predictions as a fraction of all predictions) and high coverage (i.e., correct predictions as a fraction of all misses) [5].

1063-6897/01 $10.00 © 2001 IEEE

This paper proposes the Dead-Block Predictors (DBPs) and the Dead-Block Correlating Prefetchers (DBCPs). A DBP is a novel hardware mechanism that predicts "when" a block in a data cache becomes evictable. In a recent paper [7], we proposed trace-based predictors that record a trace of shared memory references to predict a last reference to a cache block prior to an invalidation in a multiprocessor. Similarly, a DBP records a trace of memory references that accurately predicts the last reference to a block in an L1 data cache, prior to the block's eviction. A DBCP uses address correlation in conjunction with dead-block traces to predict a subsequent address upon a dead-block prediction. Accurate prediction of a block's evictability enables timely prefetching of data directly into an L1 data cache.
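
As a rough illustration of the idea described above, the following Python sketch models a dead-block correlating prefetcher in software. It is not the hardware design of this paper. It assumes a direct-mapped L1, represents a trace as the exact tuple of instruction PCs that touched a block since it was filled (a real predictor would need a compact encoding), and uses an unbounded table. The names ToyDBCP, access, frames, and table are hypothetical and chosen only for this example.

```python
class ToyDBCP:
    """Toy software model of dead-block prediction with address correlation.

    Assumptions (not taken from the paper): direct-mapped L1, traces stored
    as full PC tuples, and an unbounded correlation table.
    """

    def __init__(self, num_frames=4):
        # Each frame holds [resident block, trace of PCs since the fill].
        self.frames = [[None, ()] for _ in range(num_frames)]
        # (block, trace at eviction) -> block that replaced it in the frame.
        self.table = {}

    def access(self, pc, block):
        """Process one reference; return a block to prefetch, or None."""
        frame = self.frames[block % len(self.frames)]
        if frame[0] != block:
            # Miss: the resident block is evicted. Learn that its current
            # trace ended its lifetime and that `block` came next.
            if frame[0] is not None:
                self.table[(frame[0], frame[1])] = block
            frame[0], frame[1] = block, ()
        frame[1] = frame[1] + (pc,)
        # If this (block, trace) pair preceded an eviction before, predict
        # the block dead now and return the correlated successor.
        return self.table.get((frame[0], frame[1]))


# Blocks 0 and 4 conflict in frame 0; the access pattern repeats twice.
predictor = ToyDBCP(num_frames=4)
refs = [(10, 0), (11, 0), (20, 4), (21, 4)] * 2
predictions = [predictor.access(pc, blk) for pc, blk in refs]
print(predictions)  # [None, None, None, None, None, 4, None, 0]
```

In the second pass, the prediction is issued at the last reference to the resident block rather than at the following miss, which is the source of the additional lookahead that dead-block prediction aims for.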
We use a cycle-accurate simulation of an aggressive out-of-order superscalar processor and a spectrum of memory-intensive benchmarks to show the following:

• For critical cache misses (that are not fully overlapped by computation and incur stalls), on average 92% of the intervals from the last reference to a block until its eviction from L1 are longer than the L2 latency, indicating excellent lookahead opportunity for DBCP. In contrast, on average only 38% of the intervals between two sub-

