What's a Compiler???What's a Language Translator???Example Language TranslatorsCompiler InputCompiler OutputCompiler TasksCompiler PhasesCompiler Phases -- Lexical analysisLexical analysis --- ExampleCompiler Phases -- Syntactic analysisSyntactic analysis --- ExampleSyntactic analysis --- ExampleCompiler Phases -- Semantic analysisSemantic analysis --- ExampleCompiler Phases -- IR GenerationIR Generation --- ExampleCompiler Phases -- Code OptimizationCompiler Phases -- Code GenerationMulti-pass CompilationMulti-pass Compilationldots Mix-and-Match CompilersExampleExample -- Lexical AnalysisExample -- Lexical Analysisldots Example -- Syntactic AnalysisExample -- Semantic AnalysisExample -- IR GenerationExample -- IR Generationldots Example -- Code OptimizationExample -- Machine Code GenerationInterpretationInterpretationldots Kinds of InterpretersKinds of Interpretersldots Actions in an InterpreterActions in an Interpreterldots Readings and ReferencesSummarySummaryldots Summaryldots520 —Spring 2008 —2CSc 520Principles of ProgrammingLanguages2: TranslatorsChristian [email protected] of Computer ScienceUniversity of ArizonaCopyrightc 2008 Christian Collberg[1]520 —Spring 2008 —2What’s a Compiler???At the very basic level a compiler translates a computerprogram from source code to some kind of executablecode:VMasmsourceCompilerx.classx.sx.javax.cerrorsOften the source code is simply a text file and theexecutable code is a resulting assembly languageprogram:gcc -S x.c reads the C source file x.cand generates an assembly code file x.s. Or the outputcan be avirtual machine code: javac x.javaproduces x.class.[2]520 —Spring 2008 —2What’s a Language Translator???A compiler is really a special case of alanguage translator.A translator is a program that transforms a “program” P1written in a language L1into a program P2written inanother language L2.Typically, we desire P1and P2to be semanticallyequivalent, i.e. they should behave identically.[3]520 —Spring 2008 —2Example Language Translatorssource language translator target languageLATEXlatex2html−→ htmlPostscriptps2ascii−→ textFORTRANf2c−→ CC++cfront−→ CCgcc−→ assembly.classSourceAgain−→ Javax86 binaryfx32−→ Alpha binary[4]520 —Spring 2008 —2Compiler InputText File Common on Unix.Syntax Tree A structure editor uses its knowledge of thesource language syntax to help the user edit & run theprogram. It can send a syntax tree to the compiler,relieving it of lexing & parsing.[5]520 —Spring 2008 —2Compiler OutputAssembly Code Unix compilers do this. Slow, but easy forthe compiler.Object Code .o-files on Unix. Faster, since we don’t have tocall the assembler.Executable Code Called aload-and-go-compiler.Abstract Machine Code Serves as input to aninterpreter.Fast turnaround time.C-code Good for portability.[6]520 —Spring 2008 —2Compiler TasksStatic Semantic Analysis Is the program (statically) correct? Ifnot, produce error messages to the user.Code Generation The compiler must produce code that canbe executed.Symbolic Debug Information The compiler should produce adescription of the source program needed by symbolicdebuggers. Tryman gdb .Cross References The compiler may produce cross-referencinginformation. Where are identifiers declared &referenced?Profiler Information Where does my program spend most ofits execution time? Tryman gprof .[7]520 —Spring 2008 —2Compiler PhasesSemantic AnalysisSyntactic AnalysisLexical AnalysisIntermediate CodeGenerationCode OptimizationMachine CodeGenerationA N A L Y S I S S Y N T H E S I S[8]520 —Spring 2008 —2Compiler Phases – Lexical analysisThe lexer reads the source file and divides the text intolexical units (tokens), such as:Reserved words BEGIN, IF,...identifiers x, StringTokenizer,...special characters +, ∗, −, ˆ,...numbers 30, 3.14,...comments (*text*),strings "text".Lexical errors (such as ’illegal character’, ’undelimitedcharacter string’, ’comment without end’) are reported.[9]520 —Spring 2008 —2Lexical analysis — ExampleLexical Analysis of EnglishThe sentenceThe boy’s cowbell won’t play.would be translated to the list of tokensthe, boy+possessive, cowbell, will, not, playLexical Analysis of JavaThe sentencex = 3.14*(9.0+y);would be translated to the list of tokens<ID,x>, EQ, <FLOAT,3.14>, STAR, LPAREN,<FLOAT,9.0>, PLUS, <ID,y>, RPAREN, SEMICOLON[10]520 —Spring 2008 —2Compiler Phases – Syntactic analysisSyntax analysis (parsing) determines the structure of asentence.The compiler reads the tokens produced during lexicalanalysis and builds anabstract syntax tree (AST),which reflects the hierarchical structure of the inputprogram.Syntactic errors are reported to the user:’missing ;’,’BEGIN without END’[11]520 —Spring 2008 —2Syntactic analysis — ExampleSyntactic Analysis of EnglishThe sentenceThe boy plays cowbell.would be parsed into the treeNPSVPDet N VNNPTheboy plays cowbellS=sentence, NP=noun phrase, VP=verb phrase,Det=determiner, N=noun, V=verb.[12]520 —Spring 2008 —2Syntactic analysis — ExampleSyntactic Analysis of JavaThe sentencex = 3.14*(9.0+y);would be parsed into the ASTAssignMULFLOAT3.14ADDFLOAT IDy9.0IDx[13]520 —Spring 2008 —2Compiler Phases – Semantic analysisThe AST produced during syntactic analysis isdecorated with attributes, which are then evaluated.The attributes can represent any kind of informationsuch as expression types.The compiler also collects information useful for futurephases.Semantic errors are reported:’identifier not declared’,’integer type expected’.[14]520 —Spring 2008 —2Semantic analysis — ExampleSemantic Analysis of EnglishIn the sentenceThe boy plays his cowbell.we determine thathis refers to the boy.Semantic Analysis of JavaIn the sentencestatic float luftballons = 10;void P() {int luftballons = 99;System.out.println(luftballons);}the compiler must determinewhich luftballons the print-statement refers to,that float luftballons=10 has the wrong type.[15]520 —Spring 2008 —2Compiler Phases – IR GenerationFrom the decorated AST this phase generatesintermediate code (IR).The IR adds an extra a level of abstraction between thehigh level AST and the very low level assembly code wewant to generate. This simplifies code generation.A carefully designed IR allows us to build compilers fora number of languages and machines, by mixing andmatching front-ends and back-ends.[16]520 —Spring 2008 —2IR
View Full Document