CMU CS 15745 - Learning A Better Compiler - D1183003

School name Carnegie Mellon University

Course CS 15745 - Optimizing Compilers for Modern Architectures

Pages 26

Learning A Better Compiler
Predicting Unroll Factors using Supervised Classification
and
Integrating CPU and L2 Cache Voltage Scaling using Machine Learning

Predicting Unroll Factors
• Loop unrolling is sensitive to the unroll factor
• Current solution: expert design
  – Difficult: hand-tuned heuristics
  – Must be rewritten frequently
• Predict parameters with machine learning
  – Easy: data collection takes ~1 week
    • No human time
  – Algorithm does not change with the compiler

Loop Unrolling
• Combines multiple iterations of the loop body
• Fewer iterations → less branching
• Allows other transformations:
  – Exposes adjacent memory locations
  – Allows instruction reordering across iterations

Unroll Factors
• How many iterations to combine?
• Too few?
  – Provides little benefit
• Too large?
  – Increased cache pressure
  – Increased live ranges → register pressure

Optimal Unroll Factors
[Figure not recovered from the original slide]

Classification Problems
• Input: a vector of features
  – E.g. nest depth, # of branches, # of ops
• Output: a class
  – E.g. unroll factor, 1-8
• No prior knowledge required
  – Meaning of features/classes
  – Relevance of features
  – Relationships between features

Nearest Neighbors
• Paper describes Kernel Density Estimator
• All dimensions normalized to [0,1]
• Given a test point p:
  – Consider training points “close” to p
    • Within a fixed distance, e.g. 0.3
  – Majority vote among qualifying training points

Nearest Neighbors
[Figure not recovered from the original slide]

Support Vector Machine
• Assume two classes, easily generalized
• Transform data
  – Make classes linearly separable
• Find the line that maximizes the separation margin
• For a test point:
  – Perform transformation
  – Classify based on learned line

Maximal Margin
[Figure not recovered from the original slide]

Non-Linear SVM
[Figure not recovered from the original slide]

Some Features
• # operands
• Live range size
• Critical path length
• # operations
• Known tripcount
• # floating point ops
• Loop nest level
• # branches
• # memory ops
• Instruction fan-in in DAG
• # instructions
• Language: C, Fortran
• # memory ops
• # implicit instructions
• & more (38 total)

Results: No Software Parallelism
[Figure not recovered from the original slide]

Results: With Software Parallelism
[Figure not recovered from the original slide]

Big Idea: Easy Maintenance
• Performance improvements modest
  – Sometimes worse, sometimes much better
  – Usually little change
• Requires no re-tuning to change compiler
  – Gathering data takes ~1 week, no human time
• General mechanism
  – Can be applied to all parameters
  – No model of system needed
• Can be applied to new transformations where expert knowledge is unavailable

Integrated CPU and L2 Cache Voltage Scaling using Machine Learning

Dynamic Voltage Control
• Monitor system
• When activity is low, reduce power
  – Also reduces computational capacity
  – May need more energy if work takes longer

Multiple Clock Domains
• Adjust separate components independently
• Better performance/power
  – E.g. a CPU-bound application may be able to decrease power to memory and cache without affecting performance
• More complex DVM policy

Motivation
• Applications go through phases
• Frequency/voltages should change too
• Focus on core, L2 cache
  – Consume a large fraction of total power
• Best policy may change over time
  – On battery: conserve power
  – Plugged in: maximize performance

Learning a DVM Policy
• Compiler automatically instruments code
  – Insert sampling code to record performance counters
  – Instrument code only to gather data
• Use machine learning to create policy
• Implement policy in microcontroller

ML Parameters
• Features
  – Clock cycles per instruction
  – L2 accesses per instruction
  – Memory accesses per instruction
• Select voltage to minimize:
  – Total energy
  – Energy*delay

Machine Learning Algorithm
• Automatically learn a set of if-then rules
  – E.g.: If (L2PI >= 1) and (CPI <= 0) then f_cache = 1GHz
• Compact, expressive
• Can be implemented in hardware
[Figure not recovered from the original slide]

Results
• Compared to independently managing core and L2:
  – Saves 22% on average, 46% max
• Learns effective rules from few features
• Compiler modifications instrument code
• Learned policy offline
• Implemented policy in microcontroller

Conclusion
• Machine learning derives models from data automatically
• Allows easy maintenance of heuristics
• Creates models that are more effective than
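
As an illustration of the Loop Unrolling and Unroll Factors slides, the following is a minimal sketch of what unrolling by a factor of 4 does to a simple summation loop. It is written in Python only for readability; a compiler would apply the same transformation to machine-level loops. The function names and the factor of 4 are assumptions chosen for the example, not taken from the slides.

```python
def sum_rolled(values):
    """Baseline loop: one element and one loop test per iteration."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Same computation with the loop body unrolled by a factor of 4 (illustrative)."""
    n = len(values)
    total = 0
    i = 0
    main_end = n - n % 4  # largest multiple of 4 that fits in n
    # Main loop: four copies of the body per loop test, so fewer branches.
    while i < main_end:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Remainder loop: handles the 0-3 leftover elements.
    while i < n:
        total += values[i]
        i += 1
    return total
```

The four adjacent accesses in the main loop are what expose neighboring memory locations and give a scheduler room to reorder work across iterations. A larger factor lengthens the body further, which is where the cache and register pressure mentioned on the Unroll Factors slide comes from.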
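
The voting scheme on the Nearest Neighbors slide can be sketched roughly as follows. This is a simplified illustration, not the classifier from the paper: it assumes the feature vectors are already normalized to [0,1], uses Euclidean distance, and falls back to a hypothetical default class when no training point lies within the radius. The function name and the `default` parameter are assumptions.

```python
import math
from collections import Counter


def predict_unroll_factor(train_points, train_labels, query, radius=0.3, default=1):
    """Majority vote among training points within `radius` of `query`.

    Assumes every feature has already been normalized to [0, 1].
    `default` is a hypothetical fallback used when no point qualifies.
    """
    votes = Counter(
        label
        for point, label in zip(train_points, train_labels)
        if math.dist(point, query) <= radius
    )
    if not votes:
        return default
    # most_common(1) returns the label with the highest vote count.
    return votes.most_common(1)[0][0]
```

For example, with training points `[(0.1, 0.1), (0.15, 0.2), (0.2, 0.1), (0.9, 0.9)]` labeled `[4, 4, 2, 8]`, a query at `(0.12, 0.12)` has three neighbors within 0.3, two of which vote for an unroll factor of 4.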
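
The if-then rule policy described on the Machine Learning Algorithm slide could be represented as an ordered rule list, as in the sketch below. The thresholds, frequencies, default value, and all names here are hypothetical placeholders chosen for illustration; they are not the rules learned in the original work.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """One if-then rule: a condition on sampled counters and a cache frequency."""
    condition: Callable[[Dict[str, float]], bool]
    f_cache_ghz: float


# Hypothetical rules; a real policy would be learned offline from sampled data.
RULES: List[Rule] = [
    Rule(lambda s: s["l2pi"] >= 1.0 and s["cpi"] <= 2.0, 1.0),
    Rule(lambda s: s["l2pi"] < 0.1, 0.5),
]
DEFAULT_F_CACHE_GHZ = 0.75  # assumed fallback when no rule matches


def select_cache_frequency(sample, rules=RULES, default=DEFAULT_F_CACHE_GHZ):
    """Return the frequency of the first rule whose condition matches `sample`."""
    for rule in rules:
        if rule.condition(sample):
            return rule.f_cache_ghz
    return default
```

A first-match rule list like this is compact and needs only comparisons, which fits the slide's point that such a policy can be implemented in a microcontroller or in hardware.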
