DOC PREVIEW
CMU CS 15745 - Cache-Conscious Structure Layout

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Cache-Conscious Structure Layout Trishul M. Chilimbi Mark D. Hill Computer Sciences Department Computer Sciences Department University of Wisconsin University of Wisconsin 1210 West Dayton St. 1210 West Dayton St. Madison, WI 53706 Madison, WI 53706 [email protected] [email protected] James R. Larus Microsoft Research One Microsoft Way Redmond, WA 98052 [email protected] ABSTRACT Hardware trends have produced an increasing disparity between processor speeds and memory access times. While a variety of tech- niques for tolerating or reducing memory latency have been pro- posed, these are rarely successful for pointer-manipulating programs. This paper explores a complementary approach that attacks the source (poor reference locality) of the problem rather than its mani- festation (memory latency). It demonstrates that careful data orga- nization and layout provides an essential mechanism to improve the cache locality of pointer-manipulating programs and consequently, their performance. It explores two placement technique-luster- ing and colorinet improve cache performance by increasing a pointer structure’s spatial and temporal locality, and by reducing cache-conflicts. To reduce the cost of applying these techniques, this paper dis- cusses two strategies-cache-conscious reorganization and cache- conscious allocation--and describes two semi-automatic tools- ccmorph and ccmalloc-that use these strategies to produce cache-conscious pointer structure layouts. ccmorph is a transpar- ent tree reorganizer that utilizes topology information to cluster and color the structure. ccmalloc is a cache-conscious heap allocator that attempts to co-locate contemporaneously accessed data ele- ments in the same physical cache block. Our evaluations, with microbenchmarks, several small benchmarks, and a couple of large real-world applications, demonstrate that the cache-conscious structure layouts produced by ccmorph and ccmalloc offer large performance benefit-n most cases, significantly outper- forming state-of-the-art prefetching. Keywords Cache-conscious data placement, clustering, coloring, cache-con- scious allocation, cache-conscious reorganization 1. INTRODUCTION The speed of microprocessors has increased 60% per year for almost two decades. Yet, over the same period, the time to access main memory only decreased at 10% per year [32]. The unfortu- nate, but inevitable, consequence of these trends is a large, and ever-increasing, processor-memory gap. Until recently, memory caches have been the ubiquitous response to this problem [SO, 431. In the beginning, a single cache sufficed, but the increasing gap Permission to make digital or hard copies of all or part of this work for personal or classroom usa is granted without fee provided that copies are not made or distributed for profit or commercial advan- tage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on sarvers or to redistribute to lists, requires prior specific permission and/or a fee. SIGPLAN ‘99 IPLDI) 5/99 Atlanta, GA, USA 0 1999 ACM l-58113-083.X/99/0004...85.00 (now almost two orders of magnitude) requires a hierarchy of caches, which introduces further disparities in memory-access costs. Many hardware and software technique-uch as prefetching [29, 9, 26, 381, multithreading [25, 441, non-blocking caches [20], dynamic instruction scheduling, and speculative execution+ry to reduce or tolerate memory latency. Even so, many programs’ per- formance is dominated by memory references. Moreover, high and variable memory access costs undercut the fundamental random- access memory (RAM) model that most programmers use to under- stand and design data structures and algorithms. Over the same period, application workloads have also changed. Predominately scientific applications have broadened into a richer workload. With this shift came a change in data structure, from arrays to a richer mix of pointer-based structures. Not surprisingly, techniques for reducing or tolerating memory latency in scientific applications are often ineffective for pointer manipulating programs [7, 331. In addition, many techniques are fundamentally limited by their focus on the manifestation of the problem (memory latency), rather than its cause (poor reference locality). In general, software reference locality can be improved either by changing a program’s data access pattern or its data organization and layout. The first approach has been successfully applied to improve the cache locality of scientific programs that manipulate dense matrices [S2, 10, 161. Two properties of array structures are essential to this work: uniform, random access to elements, and a number theoretic basis for statically analyzing data dependencies. These properties allow compilers to analyze array accesses com- pletely and to reorder them in a way that increases cache locality (loop transformations) without affecting a program’s result. Unfortunately, pointer structures share neither property. However, they possess another, extremely powerful property of locational transparency: elements in a structure can be placed at different memory (and cache) locations without changing a program’s semantics. Careti placement of structure elements provides the mechanism to improve the cache locality of pointer manipulating programs and, consequently, their performance. This paper describes and provides an analytic framework for two placement technique~lustering and colorin@t improve cache perfor- mance on uniprocessor systems by increasing a data structure’s spa- tial and temporal locality, and reducing cache conflicts. Applying these techniques may require detailed knowledge of a program’s code and data structures, architectural familiarity, and considerable programmer effort. To reduce this cost, we discuss two strategies-cache-conscious reorganization, and cache-con- scious allocation-for applying these placement techniques to pro- duce cache-conscious pointer structure layouts, and describe two semi-automatic tools-ccmorph and ccmalloc-that embody these strategies. Measurements demonstrate that cache-conscious data layouts produced by ccmorph and ccmalloc offer large performance benefits-in most cases,


View Full Document

CMU CS 15745 - Cache-Conscious Structure Layout

Documents in this Course
Lecture

Lecture

14 pages

Lecture

Lecture

19 pages

Lecture

Lecture

8 pages

Lecture

Lecture

5 pages

Lecture

Lecture

6 pages

lecture

lecture

17 pages

Lecture 3

Lecture 3

12 pages

Lecture

Lecture

17 pages

Lecture

Lecture

18 pages

lecture

lecture

14 pages

lecture

lecture

8 pages

lecture

lecture

5 pages

Lecture

Lecture

19 pages

lecture

lecture

10 pages

Lecture

Lecture

20 pages

Lecture

Lecture

8 pages

Lecture

Lecture

7 pages

lecture

lecture

59 pages

Lecture

Lecture

10 pages

Task 2

Task 2

2 pages

Handout

Handout

18 pages

Load more
Download Cache-Conscious Structure Layout
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Cache-Conscious Structure Layout and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Cache-Conscious Structure Layout 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?