CMU CS 15745 - Exploiting Data Reference Locality - D2345077

Home> Schools> Carnegie Mellon University> Computer Science (CS) > CS 15745> Exploiting Data Reference Locality

DOC PREVIEW

CMU CS 15745 - Exploiting Data Reference Locality

School name Carnegie Mellon University

Course Cs 15745- Optimizing Compilers for Modern Architectures

Pages 24

This preview shows page 1-2-23-24 out of 24 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Efficient Representations and Abstractions for Quantifying and Exploiting Data Reference LocalityTrishul Chilimbi15-745 Optimizing CompilersSpring 2006Sean McLaughlinMemory optimizations are important!OutlineBackgroundDefining localityMeasuring localityExploiting localityDefining locality (textbook)temporal locality - programs reference data items that were recently referenced themselvesspacial locality - programs reference data items that are close to recently referenced itemsNote: definitions give no metricimproving localityClustering: Put items frequently accessed together on the same page in memoryClustering II: Align items accessed together so they land in different cache linesPre-fetching: Load data from a lower memory layer to a higher if its use is expected in the near futureThe good newsControl flow graphs and program paths (Larus) capture dynamic control flow, allowing for good instruction cache behavior. Aggregate load/store analysis can yield decent page-level clustering.The bad newsCaches are too small for simple page clustering to be effective.Aggregate data access information is not sufficient for cache-level layout.Static analysis too complex on modern architectures, so use a traceAccess traces are too large to analyze quickly.Need for sequences, rather than individual accesses, prevent statistical sampling.Problem need data reference abstractions to identify and measure locality (analogous to hot program paths) need efficient data reference representation (analogous to Whole Program Paths)OutlineBackgroundDefining localityMeasuring localityExploiting localityDefining locality (informally)the most recently used data is likely to be accessed again in the near futureGood locality implies a large skew in the reference distribution.90/10 rule for dataLocality in terms of hottest load/store instructionsLocality in terms of data addressesDefining locality (formally)To be exploitable by cache opts, data references must exhibit reference localityexhibit regularityregular + ref locality = exploitable localityAbstraction: data streamsA data stream is a subsequence that exhibits regularityA hot data stream also covers a large amount of the data referencesWe formally define exploitable locality in terms of hot data streamsmeasuring localityWe want to measure locality, as it can identify opt targetsStandard “definitions” are vaguemeasuring localityInherent exploitable spatial locality = weighted average of spatial regularity across hot data streams (weight=magnitude)Inherent exploitable temporal locality = average HDS temporal regularity Realized exploitable locality = cache block packing efficiency = min/actual cache blocks needed to store streamExploiting localityHot data streams + locality metric = improved data reference localityidentify suboptimal programs focus opts on particular streamsidentify salient optimizationsExploiting localitymeasures can be used to determine what combination of clustering and prefetching will be most effective, eg. hot streams with poor temporal locality are served by prefetching (not clustering)streams with poor packing efficiency can be helped by clusteringResultsQuestions How is it possible that optimizing a program with such a fine grain detail helps other runs?The *measurement* of locality is wrt a particular trace of a program, even for inherent locality. Can this be made more general?QuestionsProblems with the scheme?Runtime improvements?How do these memory opts interact with the scalar opts of an aggressive compiler?What about programs with sensitive input behavior? (cf. generational GC, which often behaves well, but also works terribly in some instances)The

View Full Document