UW-Madison CS 536 - Lecture 04.4

CS 536 Spring 2007

The Structure of a Compiler

A compiler performs two major tasks:
• Analysis of the source program being compiled
• Synthesis of a target program

Almost all modern compilers are syntax-directed: the compilation process is driven by the syntactic structure of the source program.

A parser builds syntactic structure out of tokens, the elementary symbols of programming language syntax. Recognition of syntactic structure is a major part of the analysis task.

Semantic analysis examines the meaning (semantics) of the program. Semantic analysis plays a dual role.

It finishes the analysis task by performing a variety of correctness checks (for example, enforcing type and scope rules). Semantic analysis also begins the synthesis phase.

The synthesis phase may translate source programs into some intermediate representation (IR) or it may directly generate target code.

If an IR is generated, it then serves as input to a code generator component that produces the desired machine-language program. The IR may optionally be transformed by an optimizer so that a more efficient program may be generated.

[Figure: The Structure of a Syntax-Directed Compiler. Source Program (Character Stream) -> Scanner -> Tokens -> Parser -> Abstract Syntax Tree (AST) -> Type Checker -> Decorated AST -> Translator -> Intermediate Representation (IR) -> Optimizer -> IR -> Code Generator -> Target Machine Code. The figure also shows Symbol Tables.]

Scanner

The scanner reads the source program, character by character.
It groups individual characters into tokens (identifiers, integers, reserved words, delimiters, and so on). When necessary, the actual character string comprising the token is also passed along for use by the semantic phases.

The scanner:
• Puts the program into a compact and uniform format (a stream of tokens).
• Eliminates unneeded information (such as comments).
• Sometimes enters preliminary information into symbol tables (for example, to register the presence of a particular label or identifier).
• Optionally formats and lists the source program.

Building tokens is driven by token descriptions defined using regular expression notation.

Regular expressions are a formal notation able to describe the tokens used in modern programming languages. Moreover, they can drive the automatic generation of working scanners given only a specification of the tokens. Scanner generators (like Lex, Flex and JLex) are valuable compiler-building tools.

Parser

Given a syntax specification (as a context-free grammar, CFG), the parser reads tokens and groups them into language structures.

Parsers are typically created from a CFG using a parser generator (like Yacc, Bison or Java CUP).

The parser verifies correct syntax and may issue a syntax error message.

As syntactic structure is recognized, the parser usually builds an abstract syntax tree (AST), a concise representation of program structure, which guides semantic processing.

Type Checker (Semantic Analysis)

The type checker checks the static semantics of each AST node. It verifies that the construct is legal and meaningful (that all identifiers involved are declared, that types are correct, and so on).

If the construct is semantically correct, the type checker “decorates” the AST node, adding type or symbol table information to it. If a semantic error is discovered, a suitable error message is issued.

Type checking is purely dependent on the semantic rules of the source language.
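As an illustration of the type checker's "decorating" step, here is a minimal Python sketch. The node classes (IntLit, Ident, BinOp), the check function, the two-type language (int and bool) and the symbol table contents are all hypothetical, chosen only to keep the example short; they are not taken from the course project.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class IntLit:
    value: int
    type: Optional[str] = None  # filled in ("decorated") by the checker


@dataclass
class Ident:
    name: str
    type: Optional[str] = None


@dataclass
class BinOp:
    op: str  # "+" or "<" in this toy language
    left: "Expr"
    right: "Expr"
    type: Optional[str] = None


Expr = Union[IntLit, Ident, BinOp]


class SemanticError(Exception):
    """Raised when a construct violates the toy language's static rules."""


def check(node, symbols):
    """Decorate node and its children with types, or raise SemanticError."""
    if isinstance(node, IntLit):
        node.type = "int"
    elif isinstance(node, Ident):
        # Scope rule: every identifier must have been declared.
        if node.name not in symbols:
            raise SemanticError(f"undeclared identifier: {node.name}")
        node.type = symbols[node.name]
    elif isinstance(node, BinOp):
        left = check(node.left, symbols)
        right = check(node.right, symbols)
        # Type rule: both operators in this sketch take int operands.
        if left != "int" or right != "int":
            raise SemanticError(f"operands of {node.op} must be int")
        node.type = "bool" if node.op == "<" else "int"
    else:
        raise TypeError(f"unknown AST node: {node!r}")
    return node.type


# Hypothetical symbol table and the AST for the expression x + 1 < 10.
symbols = {"x": "int", "done": "bool"}
tree = BinOp("<", BinOp("+", Ident("x"), IntLit(1)), IntLit(10))
result_type = check(tree, symbols)
```

After the call, every node in tree carries a type field: the inner addition is decorated with int and the comparison with bool. The sketch consults only declared types and the language's typing rules.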
It is independent of the compiler’s target machine.

Translator (Program Synthesis)

If an AST node is semantically correct, it can be translated. Translation involves capturing the run-time “meaning” of a construct.

For example, an AST for a while loop contains two subtrees, one for the loop’s control expression, and the other for the loop’s body. Nothing in the AST shows that a while loop loops! This “meaning” is captured when a while loop’s AST is translated. In the IR, the notion of testing the value of the loop control expression, and conditionally executing the loop body becomes explicit.

The translator is dictated by the semantics of the source language. Little of the nature of the target machine need be made evident. Detailed information on the nature of the target machine (operations available, addressing, register characteristics, etc.) is reserved for the code generation phase.

In simple non-optimizing compilers (like our class project), the translator generates target code directly, without using an IR.

More elaborate compilers may first generate a high-level IR (that is source language oriented) and then subsequently translate it into a low-level IR (that is target machine oriented).
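The while-loop example from the translator discussion can be made concrete with a small Python sketch. The While class, the translate function and the IR notation (label, if_false ... goto, goto) are assumptions made for illustration, not the IR of any particular compiler; conditions and simple statements are kept as plain strings to keep the sketch short.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class While:
    cond: str   # control expression, kept as source text
    body: list  # statements: plain strings or nested While nodes


def translate(stmts, labels=None):
    """Flatten a statement list into a list of linear IR instructions."""
    if labels is None:
        labels = count(1)  # shared counter so every loop gets unique labels
    code = []
    for stmt in stmts:
        if isinstance(stmt, While):
            n = next(labels)
            code.append(f"label top{n}")
            # The test and the conditional exit are now explicit.
            code.append(f"if_false {stmt.cond} goto exit{n}")
            code.extend(translate(stmt.body, labels))
            # The back edge is what makes the loop actually loop.
            code.append(f"goto top{n}")
            code.append(f"label exit{n}")
        else:
            code.append(stmt)
    return code


loop = While("i < n", ["sum = sum + i", "i = i + 1"])
ir = translate([loop])
# ir == ["label top1", "if_false i < n goto exit1",
#        "sum = sum + i", "i = i + 1", "goto top1", "label exit1"]
```

The While node only records a condition and a body; the test, the conditional exit and the jump back to the top appear only in the translated output. In a compiler with two IR levels, a later pass would rewrite instructions like these into a form closer to the target machine.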
This two-level approach allows a cleaner separation of source and target dependencies.

Optimizer

The IR code generated by the translator is analyzed and transformed into functionally equivalent but improved IR code by the optimizer.

The term optimization is misleading: we don’t always produce the best possible translation of a program, even after optimization by the best of compilers.

Why?

Some optimizations are impossible to do in all circumstances because they involve an undecidable problem. Eliminating all unreachable (“dead”) code is, in general, impossible.

Other optimizations are too expensive to do in all cases. These involve NP-complete problems, believed to be inherently exponential. Optimally assigning registers to variables is an example of an NP-complete problem.

Optimization can be complex; it may involve numerous subphases, which may need to be applied more than once.

Optimizations may be turned off to speed translation. Nonetheless, a well-designed optimizer can significantly speed program execution by simplifying, moving or eliminating unneeded computations.

Code Generator

IR code produced by the translator is mapped into target machine code by the code generator. This phase uses detailed information about the target machine and includes machine-specific optimizations like register allocation and code scheduling.

Code generators can be quite complex since good target code requires consideration of many special cases.

Automatic generation of code generators is possible. The basic approach is to match a low-level IR to target instruction templates, choosing instructions which best match each IR instruction.

A well-known compiler using automatic code generation techniques is the GNU C compiler. GCC is a heavily optimizing compiler with machine description
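The template-matching idea behind automatically generated code generators can be sketched in Python as follows. This is a toy illustration only: the IR tuple layout, the TEMPLATES table, the select function and the mnemonics (addi, add, shl, mul) describe a made-up machine, and the sketch does not reflect how GCC or any real tool represents its templates.

```python
def is_const(operand):
    """In this sketch, integer operands are constants; strings are registers."""
    return isinstance(operand, int)


# Each template: (IR opcode, predicate on the two source operands, format).
# Templates are listed from most to least specific, so the first one that
# matches is treated as the best match for the instruction.
TEMPLATES = [
    ("add", lambda a, b: is_const(b), "addi {dst}, {a}, {b}"),
    ("add", lambda a, b: True, "add {dst}, {a}, {b}"),
    ("mul", lambda a, b: is_const(b) and b == 2, "shl {dst}, {a}, 1"),
    ("mul", lambda a, b: True, "mul {dst}, {a}, {b}"),
]


def select(instr):
    """Return target code for one IR instruction (op, dst, a, b)."""
    op, dst, a, b = instr
    for t_op, matches, fmt in TEMPLATES:
        if t_op == op and matches(a, b):
            return fmt.format(dst=dst, a=a, b=b)
    raise ValueError(f"no template for {instr!r}")


# Low-level IR for: t1 = x + 4; t2 = t1 * 2; t3 = t2 + y
ir = [("add", "t1", "x", 4), ("mul", "t2", "t1", 2), ("add", "t3", "t2", "y")]
asm = [select(instr) for instr in ir]
# asm == ["addi t1, x, 4", "shl t2, t1, 1", "add t3, t2, y"]
```

Ordering the table from specific to general is one simple way to pick the best match: the multiply by 2 is emitted as a shift, while the general mul template remains as the fallback.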

