DOC PREVIEW
CMU CS 15745 - Optimizing ML with Run-Time Code Generation

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Optimizing ML with Run-Time Code Generation*AbstractPeter LeeMark LeoneSchool of Computer ScienceCarnegie Mellon UniversityPittsburgh, Pennsylvania 15213-3891petel@cs. emu. edumleone~cs. cmu. eduWe describe the design and implementation of a compilerthat automatically translates ordinary programs written ina subset of ML into code that generates native code at runtime. Run-time code generation canmake useof values andinvariants that cannot be exploited at compile time, yieldingcode that is often superior to statically optimal code. Butthe cost of optimizing and generating code at run time canbe prohibitive. Redemonstrate howcompile-time special-ization can reduce the cost of run-time code generation byan order of magnitude without greatly tiecting code quality.Several benchmark programs are examined, which exhibit anaverage cost of only six cycles per instruction generated atrun time.1 IntroductionIn this paper, we describe our experience with a prototypesystem for run-time code generation. Our system, calledFABIUS, is a compiler that takes ordhmry programs writtenin a subset of ML and automatically compiles them intonative code that generates native code at run time. The dy-namically generated code is often much more efficient thanstatically generated code because it is optimized using run-time values. Furthermore,FABIUS optimizes the code thatdynamically generates code by using partial evaluation tech-niques, completely eliminating the need to manipulate anyintermediate representation of code at run time. This resultsin extremely low overhead: the average cost of run-time codegeneration is about six instructions executed per generatedinstruction.Although not every program benefits from run-time codegeneration, we have had little trouble finding realistic pro-grams that run significantly faster-sometimes by more than*This research was $ponsored in part by the Advanced ResearchProjects Agency CSTO under the title “The Fox Project: AdvancedLanguages for Systems Software,” ARPA Order No. C533, issued byESC/ENS under Contract No.F1962S-95-C-O050. The views andconclusions contained in this document are those of the authors andshould not he interpreted as representing the official policies, eitherexpressed or implied, of the Advanced Research Projects Agency orthe U.S. Government.Permissionto make digiWlwcf copy of PM or all of this workfor petsonalor classroomusa is granted without fee provided that oopiesare not madeor distributedfor profit or commercial advanta e, the copyright notice, the$tiffeof the ublication and itadate appear, an notice is given that\is y permissionof ACM, Ino.To copy otherwise, to republish,to‘?o% servers, or to redmtributeto lists, requires prior spsroifio permissionE30ra fee.a factor of four—when run-time code generation is used. Insome cases, theML programs compiled by FABIUS even out-perform corresponding C programs. For example, the BSDpacket filter interpreter, which the BSD operating-systemkernel uses for fast selection of network packets on behalf ofuser processes [29], runs nearly 40% faster on typical inputswhen translated from C! to ML and compiled by FABIUS.This paper describes our early experience with the FABIUScompiler, with a particular emphasis on low-level optimiza-tion and code-generation issues. We begin with a brief dk-cussion of previous approaches to run-time code generation.Then we describe, using a running example, some of thecompilation strategies implemented inFABIUS. Of course,it will not be possible to catalog all of our design decisionsin this paper, but the example should illuminate some keyproblems, insights, and implementation techniques. Next,we present the results (both good and bad) of evaluatingFABIUS using several benchmark programs. We focus ini-tially on two programs: matrix multiplication and networkpacket filtering. Besides showing good results, these pro-grams vividly illustrate the potential of this technology. Wethen discuss briefly the performance of several other bench-marks and mention the particular lessons learned from each.Finally, we describe current related work in detail, and con-clude with a number of directions for further research.2 Previous ApproachesLike the informal principle of the time-space tradeoff, thereis also a tradeoff between general-purpose and special-pur-pose programs. General-purpose programs are desirable andeven necessary for many applications. As a simple exam-ple, consider the choice between a general numerical-analysisroutine and one that has been written specially for sparse in-puts. The special version can be much faster, but might notwork well in cases where the inputs are dense. Since gener-ality usually incurs a penalty in run-time performance, pro-grammers still take the trouble to write efficient specializedprograms and use them whenever possible. Unfortunately,it is not always possible to predict ahead of time whetherthe specialized versions can be used.Of course, these tradeoffs are aa old as computer pro-gramming, and so is the idea of attacking them with general-purpose programs that generate special-purpose code at runtime. For example, in 1968 Ken Thompson implemented asearch algorithm that compiled a user-supplied regular ex-pression into an executable finite-state machine in the formof native code for the IBM 7094 [35’J. This program was bothPLDI‘S6!Y96 PA, USA01996 ACM o-69791-795-2’WO005...$505O137general (because it accepted any regul= expression) and fast(because it generated special-purpose code quickly).Run-time code generation also led to notable performanceimprovements in the areas of operating systems [4, 8, 26,33, 36], method dispatch in object-oriented systems [1, 9,13, 20}, instruction-set simulation [10], graphics [14, 31],and many other applications. With the emergence of highlydk.tributed and Web computing, more applications demandsoftware that is general-purpose, safe, and highly compos-able. This trend has prompted increased interest in run-time (and link-time) optimization and code generation as away to regain the performance advantage enjoyed by special-purpose, monolithic systems [5]. Additional arguments forrun-time code generation have been well-summarized by Kep-pel and colleagues [23, 24]2.1 General-Purpose Dynamic CompilationDespite the increasing attention, researchers have done rela-tively little to automate and optimize the process of run-timeoptimization and compilation. Indeed, existing approaches,although quite effective


View Full Document

CMU CS 15745 - Optimizing ML with Run-Time Code Generation

Documents in this Course
Lecture

Lecture

14 pages

Lecture

Lecture

19 pages

Lecture

Lecture

8 pages

Lecture

Lecture

5 pages

Lecture

Lecture

6 pages

lecture

lecture

17 pages

Lecture 3

Lecture 3

12 pages

Lecture

Lecture

17 pages

Lecture

Lecture

18 pages

lecture

lecture

14 pages

lecture

lecture

8 pages

lecture

lecture

5 pages

Lecture

Lecture

19 pages

lecture

lecture

10 pages

Lecture

Lecture

20 pages

Lecture

Lecture

8 pages

Lecture

Lecture

7 pages

lecture

lecture

59 pages

Lecture

Lecture

10 pages

Task 2

Task 2

2 pages

Handout

Handout

18 pages

Load more
Download Optimizing ML with Run-Time Code Generation
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Optimizing ML with Run-Time Code Generation and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Optimizing ML with Run-Time Code Generation 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?