The View from 35,000 FeetImplications • Must recognize legal (and illegal) programs • Must generate correct code • Must manage storage of all variables (and code) • Must agree with OS & linker on format for object code Big step up from assembly language—use higher level notations High-level View of a Compiler Source code Machine code Compiler ErrorsTraditional Two-pass Compiler Implications • Use an intermediate representation (IR) • Front end maps legal source code into IR • Back end maps IR into target machine code • Admits multiple front ends & multiple passes (better code) Typically, front end is O(n) or O(n log n), while back end is NPC Source code Front End Errors Machine code Back End IRCan we build n x m compilers with n+m components? • Must encode all language specific knowledge in each front end • Must encode all features in a single IR • Must encode all target specific knowledge in each back end Limited success in systems with very low-level IRs A Common Fallacy Fortran Scheme Java Smalltalk Front end Front end Front end Front end Back end Back end Target 2 Target 1 Target 3 Back endResponsibilities • Recognize legal (& illegal) programs • Report errors in a useful way • Produce IR & preliminary storage map • Shape the code for the back end • Much of front end construction can be automated The Front End Source code Scanner IR Parser Errors tokensThe Front End Scanner • Maps character stream into words—the basic unit of syntax • Produces pairs — a word & its part of speech x = x + y ; becomes <id,x> = <id,x> + <id,y> ; → word ≅ lexeme, part of speech ≅ token type → In casual speech, we call the pair a token • Typical tokens include number, identifier, +, –, new, while, if • Scanner eliminates white space (including comments) • Speed is important Source code Scanner IR Parser Errors tokensThe Front End Parser • Recognizes context-free syntax & reports errors • Guides context-sensitive (“semantic”) analysis (type checking) • Builds IR for source program Hand-coded parsers are fairly easy to build Most books advocate using automatic parser generators Source code Scanner IR Parser Errors tokensThe Front End Context-free syntax is specified with a grammar SheepNoise → baa SheepNoise | baa This grammar defines the set of noises that a sheep makes under normal circumstances It is written in a variant of Backus–Naur Form (BNF) Formally, a grammar G = (S,N,T,P) • S is the start symbol • N is a set of non-terminal symbols • T is a set of terminal symbols or words • P is a set of productions or rewrite rules (P : N → N ∪T ) (Example due to Dr. Scott K. Warren)Context-free syntax can be put to better use • This grammar defines simple expressions with addition & subtraction over “number” and “id” • This grammar, like many, falls in a class called “context-free grammars”, abbreviated CFG The Front End 1. goal → expr 2. expr → expr op term 3. | term 4. term → number 5. | id 6. op → + 7. | - S = goal T = { number, id, +, - } N = { goal, expr, term, op } P = { 1, 2, 3, 4, 5, 6, 7}Given a CFG, we can derive sentences by repeated substitution To recognize a valid sentence in some CFG, we reverse this process and build up a parse The Front End Production Result goal 1 expr 2 expr op term 5 expr op y 7 expr - y 2 expr op term - y 4 expr op 2 - y 6 expr + 2 - y 3 term + 2 - y 5 x + 2 - yThe Front End A parse can be represented by a tree (parse tree or syntax tree) x + 2 - y This contains a lot of unneeded information. term op term expr term expr goal expr op <id,x> <number,2> <id,y> + - 1. goal → expr 2. expr → expr op term 3. | term 4. term → number 5. | id 6. op → + 7. | -The Front End Compilers often use an abstract syntax tree This is much more concise ASTs are one kind of intermediate representation (IR) + - <id,x> <number,2> <id,y> The AST summarizes grammatical structure, without including detail about the derivationThe Back End Responsibilities • Translate IR into target machine code • Choose instructions to implement each IR operation • Decide which value to keep in registers • Ensure conformance with system interfaces Automation has been less successful in the back end Errors IR Register Allocation Instruction Selection Machine code Instruction Scheduling IR IRThe Back End Instruction Selection • Produce fast, compact code • Take advantage of target features such as addressing modes • Usually viewed as a pattern matching problem → ad hoc methods, pattern matching, dynamic programming This was the problem of the future in 1978 → Spurred by transition from PDP-11 to VAX-11 → Orthogonality of RISC simplified this problem Errors IR Register Allocation Instruction Selection Machine code Instruction Scheduling IR IRThe Back End Register Allocation • Have each value in a register when it is used • Manage a limited set of resources • Can change instruction choices & insert LOADs & STOREs • Optimal allocation is NP-Complete (1 or k registers) Compilers approximate solutions to NP-Complete problems Errors IR Register Allocation Instruction Selection Machine code Instruction Scheduling IR IRThe Back End Instruction Scheduling • Avoid hardware stalls and interlocks • Use all functional units productively • Can increase lifetime of variables (changing the allocation) Optimal scheduling is NP-Complete in nearly all cases Heuristic techniques are well developed Errors IR Register Allocation Instruction Selection Machine code Instruction Scheduling IR IRTraditional Three-pass Compiler Code Improvement (or Optimization) • Analyzes IR and rewrites (or transforms) IR • Primary goal is to reduce running time of the compiled code → May also improve space, power consumption, … • Must preserve “meaning” of the code → Measured by values of named variables Errors Source Code Middle End Front End Machine code Back End IR IRThe Optimizer (or Middle End) Typical Transformations • Discover & propagate some
View Full Document