DOC PREVIEW
CMU CS 15745 - Exploiting Data Reference Locality

This preview shows page 1-2-23-24 out of 24 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 24 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 24 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 24 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 24 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 24 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Efficient Representations and Abstractions for Quantifying and Exploiting Data Reference LocalityTrishul Chilimbi15-745 Optimizing CompilersSpring 2006Sean McLaughlinMemory optimizations are important!OutlineBackgroundDefining localityMeasuring localityExploiting localityDefining locality (textbook)temporal locality - programs reference data items that were recently referenced themselvesspacial locality - programs reference data items that are close to recently referenced itemsNote: definitions give no metricimproving localityClustering: Put items frequently accessed together on the same page in memoryClustering II: Align items accessed together so they land in different cache linesPre-fetching: Load data from a lower memory layer to a higher if its use is expected in the near futureThe good newsControl flow graphs and program paths (Larus) capture dynamic control flow, allowing for good instruction cache behavior. Aggregate load/store analysis can yield decent page-level clustering.The bad newsCaches are too small for simple page clustering to be effective.Aggregate data access information is not sufficient for cache-level layout.Static analysis too complex on modern architectures, so use a traceAccess traces are too large to analyze quickly.Need for sequences, rather than individual accesses, prevent statistical sampling.Problem need data reference abstractions to identify and measure locality (analogous to hot program paths) need efficient data reference representation (analogous to Whole Program Paths)OutlineBackgroundDefining localityMeasuring localityExploiting localityDefining locality (informally)the most recently used data is likely to be accessed again in the near futureGood locality implies a large skew in the reference distribution.90/10 rule for dataLocality in terms of hottest load/store instructionsLocality in terms of data addressesDefining locality (formally)To be exploitable by cache opts, data references must exhibit reference localityexhibit regularityregular + ref locality = exploitable localityAbstraction: data streamsA data stream is a subsequence that exhibits regularityA hot data stream also covers a large amount of the data referencesWe formally define exploitable locality in terms of hot data streamsmeasuring localityWe want to measure locality, as it can identify opt targetsStandard “definitions” are vaguemeasuring localityInherent exploitable spatial locality = weighted average of spatial regularity across hot data streams (weight=magnitude)Inherent exploitable temporal locality = average HDS temporal regularity Realized exploitable locality = cache block packing efficiency = min/actual cache blocks needed to store streamExploiting localityHot data streams + locality metric = improved data reference localityidentify suboptimal programs focus opts on particular streamsidentify salient optimizationsExploiting localitymeasures can be used to determine what combination of clustering and prefetching will be most effective, eg. hot streams with poor temporal locality are served by prefetching (not clustering)streams with poor packing efficiency can be helped by clusteringResultsQuestions How is it possible that optimizing a program with such a fine grain detail helps other runs?The *measurement* of locality is wrt a particular trace of a program, even for inherent locality. Can this be made more general?QuestionsProblems with the scheme?Runtime improvements?How do these memory opts interact with the scalar opts of an aggressive compiler?What about programs with sensitive input behavior? (cf. generational GC, which often behaves well, but also works terribly in some instances)The


View Full Document

CMU CS 15745 - Exploiting Data Reference Locality

Documents in this Course
Lecture

Lecture

14 pages

Lecture

Lecture

19 pages

Lecture

Lecture

8 pages

Lecture

Lecture

5 pages

Lecture

Lecture

6 pages

lecture

lecture

17 pages

Lecture 3

Lecture 3

12 pages

Lecture

Lecture

17 pages

Lecture

Lecture

18 pages

lecture

lecture

14 pages

lecture

lecture

8 pages

lecture

lecture

5 pages

Lecture

Lecture

19 pages

lecture

lecture

10 pages

Lecture

Lecture

20 pages

Lecture

Lecture

8 pages

Lecture

Lecture

7 pages

lecture

lecture

59 pages

Lecture

Lecture

10 pages

Task 2

Task 2

2 pages

Handout

Handout

18 pages

Load more
Download Exploiting Data Reference Locality
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Exploiting Data Reference Locality and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Exploiting Data Reference Locality 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?