Optimizing General Compiler OptimizationProblem: Optimizing optimizations…but can we do better?Motivation for this paperBig IdeaFull vs. Fractional Factorial DesignOrthogonal ArraysOrthogonal ArraysAlgorithm – Step 1Step 1.1: Selecting the initial setAlgorithm – Step 1(cont.)Algorithm – Step 2Algorithm – Step 3Final ResultComparing with existing compiler switchesComparing resultsConclusionFuture workOptimizing General Compiler OptimizationM. Haneda, P.M.W. Knijnenburg,and H.A.G. WijshoffProblem: Optimizing optimizationsA compiler usually has many optimization settings (e.g. peephole, delayed-branch, etc)gcc 3.3 has 54 optimization optionsgcc 4 has over 100 possible settingsVery little is known about how these options affect each otherCompiler writers typically include switches that bundle together many optimization options gcc –O1, -O2, -O3…but can we do better?It is possible to perform better than these predefined optimization settings, but doing so requires extensive knowledge of the code as well as the available optimization options How do we define one set of options that would work well with a large variety of programs?Motivation for this paperSince there are too many optimization settings, an exhaustive search would cost too muchgcc 3: 2^50 different combinations!We want to define a systematic method to find the optimal settings, with a reduced search spaceIdeally, we would like to do this with minimal knowledge of what the options actually will doBig IdeaWe want to find the biggest subsets of compiler options that positively interact with each otherOnce we obtain these subsets, we will try to combine them together, under the condition that they do not negatively affect each otherWe will select our ultimate optimal compiler setting from the result of these set combinationsFull vs. Fractional Factorial DesignFull Factorial Design: explores the entire search space, with every possible combinationGiven k options, this will take O(2^k) time Fractional Factorial Design: explores a reduced search space, that is representative of the full search spaceThis can be done using orthogonal arraysOrthogonal ArraysAn Orthogonal Array is a matrix of 0’s and 1’s. The rows represent the experiments to be performed.The columns represent the factors that the experiment tries to analyzeAny option is equally likely to be turned on/off.Given a particular experiment with a particular option turned on, all the other options are still equally likely to be turned on/offOrthogonal ArraysAlgorithm – Step 1Finding maximum subsets of positively interacting optionsStep 1.1: Find a set of options that give the best overall improvementFor any single optimization setting i, compute the average speedup for all the settings in the search space in which i is turned onSelect M of the highest average improvement settingsStep 1.1: Selecting the initial setAlgorithm – Step 1(cont.)Step 1.2: Iteratively add new options to the already obtained sets, to get a maximum set of positively reinforcing optimizationsEx: If using options A and B together produces a more optimal setting than just using A, then add B If using {A, B} and C together produces a more optimal setting than {A, B}, then add C to {A, B}Algorithm – Step 2Take the sets that we already have and try to combine them together, assuming that they do not negatively influence each other. This is done to maximize the number of settings turned on for each setExample:If {A, B, C} and {D, E} do not counteract each other, then we can combine them into {A, B, C, D, E}Otherwise, leave them separateAlgorithm – Step 3Take the resulting sets from step 2, and select the one with the best overall improvement.The result would be the ideal combination of optimization settings, according to this methodology.Final ResultComparing with existing compiler switchesComparing resultsThe compiler setting obtained by this methodology outpeforms –O1, -O2, and –O3 on almost all the SPECint95 benchmarks-O3 performs better on li (39.2% vs. 38.4%)The new setting delivers the best performance for perl (18.4% vs. 10.5%)ConclusionThe paper introduced a systematic way of combining compiler optimization settingsUsed a reduced search space, constructed as an orthogonal array Can be done with no knowledge of actual optionsCan be done independently of architectureCan be applied to a wide variety of applicationsFuture workUsing the same methodology to find a good optimization setting for a particular domain of applicationsApplying the methodology to newer versions of the gcc compiler, such as gcc
View Full Document