This preview shows page 1-2-3-4 out of 11 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Optimal Spilling for CISC Machines with Few Registers Andrew W. Appel Princeton University [email protected] Lal George Lucent Technologies Bell Laboratories george @ research, bell-labs.corn ABSTRACT Many graph-coloring register-allocation algorithms don't work well for machines with few registers. Heuristics for live-range split- ting are complex or suboptimal; heuristics for register assignment rarely factor the presence of fancy addressing modes; these prob- lems are more severe the fewer registers there are to work with. We show how to optimally split live ranges and optimally use address- ing modes, where the optimality condition measures dynamically weighted loads and stores but not register-register moves. Our al- gorithm uses integer linear programming but is much more efficient than previous ILP-based approaches to register allocation. We then show a variant of Park and Moon's optimistic coalescing algorithm that does a very good (though not provably optimal) job of remov- ing the register-register moves. The result is Pentium code that is 9.5% faster than code generated by SSA-based splitting with iter- ated register coalescing. 1. INTRODUCTION. Register allocation by graph coloring has been a big success for machines with 30 or more registers. The instruction selector gener- ates code using an unlimited supply of temporaries; liveness analy- sis constructs an interference graph with an edge between any two temporaries that are live at the same time (and thus cannot be al- located to the same register); a graph coloring algorithm finds a K-coloring of the interference graph (where K is the number of registers on the machine). If the graph is not K-colorable, then some nodes are spilled: the temporaries are implemented in mem- ory instead of registers, with a cost for loading them and storing them when necessary. Graph coloring is NP-complete, but simple algorithms can often do well. An important improvement to this algorithm was the idea that the live range of a temporary should be split into smaller pieces, with move instructions connecting the pieces. This relaxes the inter- ference constraints a bit, making the graph more likely to be K- colorable. The graph-coloring register allocator should coalesce two temporaries that are related by a move instruction if this can be done without increasing the number of spills. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advan- tage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PLDI 2001 6/01 Snowbird, Utah, USA © 2001 ACM ISBN 1-58113-414.2/01/06..,$5.00 Unfortunately, this approach has not worked well for machines like the Pentium, which have K = 6 allocable registers (there are 8 regis- ters but usually two are dedicated to specific purposes). What hap- pens is that there will typically be many nodes with degree much greater than K, and there is an enormous amount of spilling. Of course, with few registers there will inevitably be spilling, as the live variables cannot all be kept in registers; but if a variable is spilled because it has a long live range, then it stays spilled even (for example) in some loop where it is frequently used. On our test suite of 600 basic-block clusters comprising 163,355 instruc- tions, iterated register coalescing produces 84 spill instructions for a 32-register machine, but 22,123 spill instructions for an 8-register machine. This is about 14% of all instructions, which is worth the trouble to improve. In the last few years some researchers have taken a completely dif- ferent approach to register allocation: formulate the problem as an integer linear program (ILP) and solve it exactly with a general- purpose ILP solver. ILP is NP-complete, but approaches that com- bine the simplex algorithm with branch-and-bound can be success- ful on some problems. Unfortunately, the work to date in optimal register allocation via ILP has not quite been practical: Goodwin's optimal register allocator can take hundreds of seconds to solve for a large procedure [11, 12]. Goodwin has formulated "near-optimal register allocation (NORA)" as an ILP; our solution can be viewed as a different approach to near-optimal register allocation. A two-phase approach. Our new approach decomposes the regis- ter allocation problem into two parts: spilling, then register assign- ment. Instead of asking, "at program point p, should variable v be in register r?" we first ask, "at program point p, should variable v be in a register or in memory?" Clearly, this is a simpler ques- tion, and in fact we can formulate an integer linear program (ILP) that solves it optimally and efficiently (tens of milliseconds). This phase of register allocation finds the optimal set of splits and spills. Not only does our algorithm compute where to insert loads and stores to implement spills, but it also optimally selects addressing modes for CISC instructions that can get operands directly from memory. For example, the add instruction on the Pentium takes two operands s and d, and computes d +- d + s. The operands can be in registers or in memory, but they cannot both be in memory. On a modern implementation of the instruction set, the instruction mix] ~-- mix] +s is no faster than the sequence of instructions r +- m[x]; r ~ r+x; mix] +- r. However, the latter sequence requires an explicit temporary r, and if there are many other live values at this point, some other value will have to be spilled; the former sequence wouldn't require the spilling of some other value. Therefore, it is 243important to make use of the CISC instructions. The second phase is to allocate the unspilled variables to registers in a way that leaves as few as possible register-register moves in the program. This is difficult to do optimally, but we will show an efficient algorithm can get very good results. In judging our decomposition into two phases, there are three im- portant questions to ask: 1. When we decompose the problem into two subproblems (spilling and coloring) and solve each subproblem optimally, does that lead to an


View Full Document

UCLA COMSCI 239 - Optimal Spilling

Download Optimal Spilling
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Optimal Spilling and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Optimal Spilling 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?