New version page

CMU CS 15745 - Load-Reuse Analysis: Design and Evaluation

Documents in this Course
Lecture

Lecture

14 pages

Lecture

Lecture

19 pages

Lecture

Lecture

8 pages

Lecture

Lecture

5 pages

Lecture

Lecture

6 pages

lecture

lecture

17 pages

Lecture 3

Lecture 3

12 pages

Lecture

Lecture

17 pages

Lecture

Lecture

18 pages

lecture

lecture

14 pages

lecture

lecture

8 pages

lecture

lecture

5 pages

Lecture

Lecture

19 pages

lecture

lecture

10 pages

Lecture

Lecture

20 pages

Lecture

Lecture

8 pages

Lecture

Lecture

7 pages

lecture

lecture

59 pages

Lecture

Lecture

10 pages

Task 2

Task 2

2 pages

Handout

Handout

18 pages

Load more
Upgrade to remove ads

This preview shows page 1-2-3-4 out of 13 pages.

Save
View Full Document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Upgrade to remove ads
Unformatted text preview:

Load-Reuse Analysis: Design and Evaluation Rastislav Bodik Rajiv Gupta Mary Lou Soffa Dept. of Computer Science University of Pittsburgh Pittsburgh, PA 15260 {bodik, gupta, sof fa}@cs .pitt . edu Abstract Load-reuse analysis finds instructions that repeatedly access the same memory location. This location can be promoted to a register, eliminating redundant loads by reusing the results of prior memory accesses. This paper develops a load-reuse analysis and designs a method for evaluating its precision. In designing the analysis, we aspire for completeness-the goal of exposing all reuse that can be harvested by a subsequent program transformation. For register promotion, a suitable transformation is partial redundancy elimination (PRE). To approach the ideal goal of PRE-completeness, the load-reuse analysis is phrased as a data- flow problem on a program representation that is path-sensitive, as it detects reuse even when it originates in a different instruction along each control flow path. Furthermore, the analysis is compre- hensive, as it treats scalar, array and pointer-based loads uniformly. In evaluating the analysis, we compare it with an ideal analysis. By observing the run-time stream of memory references, we col- lect all PRE-exploitable reuse and treat it as the ideal analysis per- formance. To compare the (static) load-reuse analysis with the (dynamic) ideal reuse, we use an estimator algorithm that com- putes, given a dam-flow solution and a program profile, the dy- namic amount of reuse detected by the analysis. We developed a family of estimators that differ in how well they bound the profil- ing error inherent in the edge profile. By bounding the error, the estimators offer a precise and practical method for determining the run-time optimization benefit. Our experiments show that about 55% of loads executed in Spec95 exhibit reuse. Of those, our analysis exposes about 80%. Keywords: profile-guided optimizations, register promotion, pro- gram representations, data-flow analysis. 1 Introduction Without comparison, caches are the best hardware defense against the von Neumann memory bottleneck. Capitalizing on data local- ity, caches win by reusing recent memory accesses. How can com- pilers benefit from these reuse opportunities? In the ideal case, the Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies am not made or distributed for profit or commercial advan- tage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servars or to redistribute to lists, requires prior specific permission and/or e fee. SlGPLAN ‘99 IPLDII 5199 Atlanta, GA, USA (9 1999 ACM l-581 13s083-X/99/0004...$5.00 compiler promotes repeatedly accessed memory locations to reg- isters. Register promotion is the best compiler solution for reduc- ing the memory traffic. By removing redundant loads, it decreases the dynamic operation count and shortens instruction schedules. This paper focuses on compile-time detection of load reuse that is amenable to register promotion. We measure the amount of load reuse in programs, and design and evaluate an analysis for reuse detection. Register promotion entails three subproblems. First, load-reuse nnalysis finds loads and stores that access the same address, to- gether with the execution paths along which the reuse exists. In the example below, if ai always equals a4 along path pi , then load a4 can benefit from reuse along ~1. Similarly for path ~2. Second, alias analysis verifies that the detected reuse is not disrupted by intervening stores. Below, if as is never equal to a4, then register promotion of a4 is safe. Finally, a program transformation stores the prior memory access in a register and replaces the redundant load with a register reference. In the example, register promotion is not immediately applicable because load a4 is not redundant on all paths. Suchpartial reuse can be compensated by hoisting a copy of the load along path pa. Commonly, the hoisting is formulated as partial redundancy elimination (PRE) [26,28,35]. @Load-reuse analysis: is al = a4 along path pl? is a2 = a4 along path ~27 @Alias analysis: is a0 I= a4 7 path pl path p2 path p3 @Program transformation: hoist load a4 along path p3 Detecting reuse is profitable even when register promotion is pre- vented (due to aliasing or lack of registers). In such a case, the PRE transformation step can employ alternative, albeit less effective, reuse mechanisms. When promotion is unsafe due to interfering stores, the redundant load can be replaced with a data-speculative loud, which works as a register reference when the kill did not oc- cur, but as a load when it did [6,20,23,38]. When registers are not available, load reuse can be exploited using software cache con- trol [20,23,33]. By directing which loaded values remain in the cache and which bypass it, the compiler can improve the subopti- mal hardware cache replacement strategy. This paper focuses on the first component of register promotion, load-reuse analysis. Because an optimization is only as powerful as its analysis, improving the precision of the analysis is of high 64significance. The second component, alias analysis, has a different aim: while load-reuse analysis detects memory references that must go to the same location, alias analysis finds those that may, thus identifying killing stores. Recent research indicates that, for reg- ister promotion, a simple alias analysis may be sufficient [ 18,271. The third component, PRE transformation, was explored in [lo], where we describe how to effect a complete removal of all detected reuse. In this paper we concentrate on increasing the amount of detected reuse. Design. The design of the load-reuse analysis emphasizes scal- ability and completeness. Scalability is achieved by developing a sparse, SSA-based program representation, which grows moder- ately with the program size. The analysis is PRE-complete if it detects all reuse that the PRE transformation can exploit. Aiming for PRE-completeness is not a narrow goal, as PRE covers most scalar transformations based on data-flow analysis. It generalizes common-subexpression elim- ination, loop-invariant code motion, and


View Full Document
Download Load-Reuse Analysis: Design and Evaluation
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Load-Reuse Analysis: Design and Evaluation and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Load-Reuse Analysis: Design and Evaluation 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?