Locality vs. Criticality

Srikanth T. Srinivasan (Duke University), Roy Dz-ching Ju (Intel Corporation), Alvin R. Lebeck (Duke University), Chris Wilkerson (Intel Corporation)
Department of Computer Science, Duke University: {sri, alvy}@cs.duke.edu
Microprocessor Research Labs, Intel Corporation: {roy.ju, chris.wilkerson}@intel.com

Abstract

Current memory hierarchies exploit locality of references to reduce load latency and thereby improve processor performance. Locality-based schemes aim at reducing the number of cache misses and tend to ignore the nature of misses. This leads to a potential mismatch between load latency requirements and the latencies realized by a traditional memory system. To bridge this gap, we partition loads into critical and non-critical. A load that needs to complete early to prevent processor stalls is classified as critical, while a load that can tolerate a long latency is considered non-critical.

In this paper, we investigate whether it is worth violating locality to exploit information on criticality to improve processor performance. We present a dynamic critical load classification scheme and show that 40% performance improvements are possible on average if all critical loads are guaranteed to hit in the L1 cache. We then compare the two properties, locality and criticality, in the context of several cache organization and prefetching schemes. We find that the working set of critical loads is large, and hence practical cache organization schemes based on criticality are unable to reduce the critical load miss ratios enough to produce performance gains. Although criticality-based prefetching can help for some resource-constrained programs, its benefit over locality-based prefetching is small and may not be worth the added complexity.

1 Introduction

The widespread use of cache memory hierarchies [17] is a testament to their success as a cost-effective solution to help alleviate the growing disparity between processor cycle time and DRAM access time. By using a small, fast, but expensive SRAM cache to satisfy most accesses, cache hierarchies can provide access times close to processor speeds, while cheaper, slower DRAM provides large capacity to avoid I/O delays. Increases in transistor budgets enable most systems today to employ two or three levels in a cache hierarchy, with increasing capacity and access time as you move from the processor to main memory.

Caches work by exploiting spatial and temporal locality in a program's access pattern. Spatial locality is exploited by fetching a region of memory, called a cache block, rather than just the accessed data. Caches exploit temporal locality by retaining recently accessed cache blocks. Most cache management schemes try to exploit locality to increase the fraction of memory accesses satisfied by the cache (i.e., the cache hit ratio).

Although increasing the overall number of cache hits is usually desirable, recent research [19] shows that not all memory accesses are equal. In dynamically scheduled processors, the latency of some memory load operations can have a much larger influence on overall performance than other loads. Therefore, it may be possible to improve overall performance by decreasing the latency of these critical loads at the expense of increased latency for non-critical loads. The goal of this paper is to determine if criticality is a strong enough program property to warrant a change in memory hierarchy management techniques for practical implementations.
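The hardware classification scheme itself is presented in Section 2. As a rough illustration of the idea only, the C sketch below marks a load critical when it is still waiting for its data upon reaching the head of the reorder buffer (i.e., when it is about to stall retirement), and remembers that outcome per load PC so later instances of the same load can be predicted critical. The table size, the hash, and this particular stall-based heuristic are illustrative assumptions, not the paper's exact mechanism.

/* Sketch of a dynamic critical-load classifier (illustrative only).
 * Assumption: a load is flagged critical if its data is still
 * outstanding when it reaches the head of the reorder buffer.
 * The table remembers this per load PC so future instances of the
 * same static load can be treated as critical at issue time.
 */
#include <stdbool.h>
#include <stdint.h>

#define CRIT_TABLE_ENTRIES 1024               /* hypothetical table size */

static uint8_t crit_table[CRIT_TABLE_ENTRIES]; /* 1 = last seen critical */

static unsigned crit_index(uint64_t load_pc)
{
    /* simple PC hash; a real design would pick this more carefully */
    return (unsigned)(load_pc >> 2) & (CRIT_TABLE_ENTRIES - 1);
}

/* Called when a load reaches the head of the ROB. */
void classify_load(uint64_t load_pc, bool data_still_outstanding)
{
    crit_table[crit_index(load_pc)] = data_still_outstanding ? 1 : 0;
}

/* Consulted when a load issues, e.g., to steer its block into a
 * criticality-managed structure. */
bool predicted_critical(uint64_t load_pc)
{
    return crit_table[crit_index(load_pc)] != 0;
}

Section 2 describes the hardware technique actually used in the paper and evaluates its performance potential.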
Specifically, we investigate whether practical criticality-based approaches can equal or surpass the performance of existing locality-based techniques. The previous analysis [19] relied on a sophisticated simulator with rollback to determine load criticality. A practical implementation requires hardware support for on-line computation of load criticality, augmented by realistic load latency reduction techniques that can exploit this information.

In this paper, we propose a hardware implementation to determine load criticality and validate that there is potential to improve performance over a traditional cache hierarchy. We then compare locality-based cache organization and prefetching techniques with corresponding criticality-based schemes. Our criticality-based cache organization scheme uses a critical cache, which is functionally similar to a conventional victim cache [10] but holds only blocks that were touched by a critical load. We also examine multi-line prefetching [9, 12] based on both locality and criticality. We use SimpleScalar [1] to evaluate the above techniques for a set of SPEC 2000 integer and Olden [16] benchmarks.

The primary result from our simulations is that load criticality is not a sufficiently strong property to warrant cache management changes. Although criticality can be used as a filter for making prefetching decisions, it does not appear to be worth the added complexity. In particular, our results reveal the following:

- There is potential to exploit criticality. If all the hardware-identified critical loads could be satisfied by the L1 cache, it would produce an average 40% improvement over a traditional memory hierarchy. This compares to an average 12% improvement if an equal percentage of loads are randomly chosen to hit in the L1 cache.

- Managing cache content based on locality outperforms criticality-based techniques. The working set of critical loads is large and, because of spatial locality between non-critical and critical loads, a locality-based cache provides lower critical load miss ratios than a criticality-based cache.

- Criticality-based prefetching into the L2 cache achieves an average speedup of 4%, compared to 2% for locality-based prefetching, across several benchmarks with resource constraint problems. However, if resource constraints are not a problem, the most aggressive locality-based prefetching scheme does best.

The remainder of this paper is organized as follows. Section 2 provides background on load criticality, presents our hardware technique for determining load criticality, and evaluates the potential for improved performance. We evaluate cache organization schemes in Section 3, and Section 4 investigates prefetching.
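For concreteness, the C sketch below shows one plausible structure for the critical cache described above: a small fully associative buffer, probed alongside the L1, that is filled only with blocks touched by loads the classifier marked critical. The capacity, block size, and LRU bookkeeping are illustrative assumptions rather than the authors' exact design.

/* Sketch of a critical cache (illustrative, not the paper's exact design):
 * a small fully associative buffer filled only with blocks touched by
 * critical loads, probed in parallel with the L1 data cache.
 */
#include <stdbool.h>
#include <stdint.h>

#define CRITICAL_CACHE_BLOCKS 16   /* hypothetical capacity */
#define BLOCK_OFFSET_BITS     6    /* 64-byte blocks assumed */

typedef struct {
    uint64_t tag;       /* block address (addr >> BLOCK_OFFSET_BITS) */
    bool     valid;
    uint32_t last_use;  /* timestamp for LRU replacement */
} cc_entry_t;

static cc_entry_t cc[CRITICAL_CACHE_BLOCKS];
static uint32_t   cc_clock;

/* Probe the critical cache; returns true on a hit. */
bool cc_lookup(uint64_t addr)
{
    uint64_t tag = addr >> BLOCK_OFFSET_BITS;
    for (int i = 0; i < CRITICAL_CACHE_BLOCKS; i++) {
        if (cc[i].valid && cc[i].tag == tag) {
            cc[i].last_use = ++cc_clock;
            return true;
        }
    }
    return false;
}

/* Fill only blocks touched by critical loads; evict the LRU entry. */
void cc_fill_if_critical(uint64_t addr, bool load_was_critical)
{
    if (!load_was_critical)
        return;                     /* non-critical loads bypass the buffer */
    uint64_t tag = addr >> BLOCK_OFFSET_BITS;
    int victim = 0;
    for (int i = 0; i < CRITICAL_CACHE_BLOCKS; i++) {
        if (!cc[i].valid) { victim = i; break; }
        if (cc[i].last_use < cc[victim].last_use) victim = i;
    }
    cc[victim].tag      = tag;
    cc[victim].valid    = true;
    cc[victim].last_use = ++cc_clock;
}

A conventional victim cache would fill the same kind of structure on every eviction without the criticality filter; the comparison between the two organizations is the subject of Section 3.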

