CMU CS 15745 - Learning A Better Compiler

Learning A Better Compiler
Predicting Unroll Factors using Supervised Classification
and
Integrating CPU and L2 Cache Voltage Scaling using Machine Learning

Predicting Unroll Factors
• Loop unrolling is sensitive to the unroll factor
• Current solution: expert design
  – Difficult: hand-tuned heuristics
  – Must be rewritten frequently
• Predict parameters with machine learning
  – Easy: data collection takes ~1 week
    • No human time
  – Algorithm does not change with compiler

Loop Unrolling
• Combines multiple iterations of the loop body
• Fewer iterations → less branching
• Allows other transformations:
  – Exposes adjacent memory locations
  – Allows instruction reordering across iterations

Unroll Factors
• How many iterations to combine?
• Too few?
  – Provides little benefit
• Too large?
  – Increased cache pressure
  – Increased live range → register pressure

Optimal Unroll Factors
[Figure not available in the extracted text]

Classification Problems
• Input a vector of features
  – E.g. nest depth, # of branches, # of ops
• Output a class
  – E.g. unroll factor, 1-8
• No prior knowledge required
  – Meaning of features/classes
  – Relevance of features
  – Relationships between features

Nearest Neighbors
• Paper describes Kernel Density Estimator
• All dimensions normalized to [0,1]
• Given a test point p:
  – Consider training points "close" to p
    • Within fixed distance, e.g. 0.3
  – Majority vote among qualifying training points

Nearest Neighbors
[Figure not available in the extracted text]

Support Vector Machine
• Assume two classes, easily generalized
• Transform data
  – Make classes linearly separable
• Find line to maximize separation margin
• For test point:
  – Perform transformation
  – Classify based on learned line

Maximal Margin
[Figure not available in the extracted text]

Non-Linear SVM
[Figure not available in the extracted text]

Some Features
• # operands
• Live range size
• Critical path length
• # operations
• Known tripcount
• # floating point ops
• Loop nest level
• # branches
• # memory ops
• Instruction fan-in in DAG
• # instructions
• Language: C, Fortran
• # implicit instructions
• & more (38 total)

Results: No Software Parallelism
[Figure not available in the extracted text]

Results: With Software Parallelism
[Figure not available in the extracted text]

Big Idea: Easy Maintenance
• Performance improvements modest
  – Sometimes worse, sometimes much better
  – Usually little change
• Requires no re-tuning to change compiler
  – Gathering data takes ~1 week, no human time
• General mechanism
  – Can be applied to all parameters
  – No model of system needed
• Can be applied to new transformations where expert knowledge is unavailable

Integrated CPU and L2 Cache Voltage Scaling using Machine Learning

Dynamic Voltage Control
• Monitor system
• When activity is low, reduce power
  – Also reduces computational capacity
  – May need more energy if work takes longer

Multiple Clock Domains
• Adjust separate components independently
• Better performance/power
  – E.g. a CPU-bound application may be able to decrease power to memory and cache without affecting performance
• More complex DVM policy

Motivation
• Applications go through phases
• Frequency/voltages should change too
• Focus on core, L2 cache
  – Consume large fraction of total power
• Best policy may change over time
  – On battery: conserve power
  – Plugged in: maximize performance

Learning a DVM Policy
• Compiler automatically instruments code
  – Insert sampling code to record performance counters
  – Instrument code only to gather data
• Use machine learning to create policy
• Implement policy in microcontroller

ML Parameters
• Features
  – Clock cycles per instruction
  – L2 accesses per instruction
  – Memory accesses per instruction
• Select voltage to minimize:
  – Total energy
  – Energy*delay

Machine Learning Algorithm
• Automatically learn set of if-then rules
  – E.g.: If (L2PI >= 1) and (CPI <= 0) then f_cache = 1GHz
• Compact, expressive
• Can be implemented in hardware
[Figure not available in the extracted text]

Results
• Compared to independently managing core and L2:
  – Saves 22% on average, 46% max
• Learns effective rules from few features
• Compiler modifications instrument code
• Learned policy offline
• Implemented policy in microcontroller

Conclusion
• Machine learning derives models from data automatically
• Allows easy maintenance of heuristics
• Creates models that are more effective than
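
Illustration: loop unrolling
The Loop Unrolling slide says unrolling combines several iterations of the loop body so that fewer loop-condition checks are executed. The sketch below shows the idea by hand on a simple sum, using an unroll factor of 4. It is only an illustration: the function names are hypothetical, and a real compiler applies this transformation to its intermediate representation rather than to Python source.

```python
def sum_rolled(values):
    """Reference loop: processes one element per iteration."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Same reduction with the loop body replicated 4 times (unroll factor 4)."""
    n = len(values)
    total = 0
    i = 0
    # Main unrolled loop: one loop-condition check per four elements.
    while i + 4 <= n:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Remainder loop: handles the 0 to 3 elements left over when the
    # trip count is not a multiple of the unroll factor.
    while i < n:
        total += values[i]
        i += 1
    return total
```

The four adjacent accesses in the unrolled body are the "adjacent memory locations" the slide mentions, and the remainder loop is part of the code-size cost that grows with the unroll factor.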

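
Illustration: fixed-radius neighbor vote
The Nearest Neighbors slide describes normalizing every feature to [0,1], collecting the training points within a fixed distance of the test point (e.g. 0.3), and taking a majority vote. A minimal sketch of that procedure follows. The training data, the helper names, and the fallback to the single closest point when no neighbor qualifies are all assumptions made for illustration, not details taken from the slides or the paper.

```python
import math
from collections import Counter


def make_scaler(rows):
    """Build a function that min-max scales a feature row to [0, 1]."""
    lo = [min(col) for col in zip(*rows)]
    hi = [max(col) for col in zip(*rows)]

    def scale(row):
        # A constant column carries no information; map it to 0.0.
        return [(x - l) / (h - l) if h > l else 0.0
                for x, l, h in zip(row, lo, hi)]

    return scale


def predict(train_x, train_y, query, radius=0.3):
    """Majority vote among training points within `radius` of `query`."""
    dists = [math.dist(x, query) for x in train_x]
    votes = Counter(y for d, y in zip(dists, train_y) if d <= radius)
    if votes:
        return votes.most_common(1)[0][0]
    # Assumed fallback: no point qualifies, so use the closest one.
    return train_y[dists.index(min(dists))]


# Hypothetical loops: [nest depth, # branches, # ops] -> best unroll factor.
raw_features = [[1, 0, 10], [1, 1, 12], [2, 4, 60], [2, 5, 64], [1, 0, 11]]
unroll_factors = [8, 8, 1, 1, 4]

scale = make_scaler(raw_features)
train_x = [scale(row) for row in raw_features]

small_loop = predict(train_x, unroll_factors, scale([1, 0, 10]))
branchy_loop = predict(train_x, unroll_factors, scale([2, 5, 62]))
```

With this toy data the small, branch-free loop lands among neighbors that mostly prefer a large unroll factor, while the large, branchy loop lands among neighbors that prefer no unrolling.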

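
Illustration: if-then rule policy
The Machine Learning Algorithm slide says the learned policy is a compact set of if-then rules over per-instruction counters, which is why it can be implemented in a microcontroller or in hardware. The sketch below shows one way such a rule list could be evaluated, with the first matching rule winning. The thresholds, frequencies, default value, and all names are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Sample:
    cpi: float   # clock cycles per instruction
    l2pi: float  # L2 accesses per instruction
    mpi: float   # memory accesses per instruction


@dataclass(frozen=True)
class Rule:
    condition: Callable[[Sample], bool]
    f_cache_ghz: float  # cache frequency chosen when the rule fires


# Hypothetical rule set; a learner would derive these from sampled data.
RULES = [
    Rule(lambda s: s.l2pi >= 0.05 and s.cpi <= 1.5, 1.0),
    Rule(lambda s: s.l2pi < 0.01, 0.5),
]
DEFAULT_F_CACHE_GHZ = 0.75  # assumed setting when no rule matches


def select_cache_frequency(sample, rules=RULES, default=DEFAULT_F_CACHE_GHZ):
    """Return the frequency of the first rule whose condition holds."""
    for rule in rules:
        if rule.condition(sample):
            return rule.f_cache_ghz
    return default


cache_heavy = select_cache_frequency(Sample(cpi=1.0, l2pi=0.10, mpi=0.0))
cache_idle = select_cache_frequency(Sample(cpi=2.0, l2pi=0.001, mpi=0.0))
```

Each rule is a handful of comparisons against fixed thresholds, so evaluating the policy needs no arithmetic beyond compare operations.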