DOC PREVIEW
UD CISC 672 - High-level View of a Compiler

This preview shows page 1-2-24-25 out of 25 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

The View from 35,000 FeetImplications • Must recognize legal (and illegal) programs • Must generate correct code • Must manage storage of all variables (and code) • Must agree with OS & linker on format for object code Big step up from assembly language—use higher level notations High-level View of a Compiler Source code Machine code Compiler ErrorsTraditional Two-pass Compiler Implications • Use an intermediate representation (IR) • Front end maps legal source code into IR • Back end maps IR into target machine code • Admits multiple front ends & multiple passes (better code) Typically, front end is O(n) or O(n log n), while back end is NPC Source code Front End Errors Machine code Back End IRCan we build n x m compilers with n+m components? • Must encode all language specific knowledge in each front end • Must encode all features in a single IR • Must encode all target specific knowledge in each back end Limited success in systems with very low-level IRs A Common Fallacy Fortran Scheme Java Smalltalk Front end Front end Front end Front end Back end Back end Target 2 Target 1 Target 3 Back endResponsibilities • Recognize legal (& illegal) programs • Report errors in a useful way • Produce IR & preliminary storage map • Shape the code for the back end • Much of front end construction can be automated The Front End Source code Scanner IR Parser Errors tokensThe Front End Scanner • Maps character stream into words—the basic unit of syntax • Produces pairs — a word & its part of speech x = x + y ; becomes <id,x> = <id,x> + <id,y> ; → word ≅ lexeme, part of speech ≅ token type → In casual speech, we call the pair a token • Typical tokens include number, identifier, +, –, new, while, if • Scanner eliminates white space (including comments) • Speed is important Source code Scanner IR Parser Errors tokensThe Front End Parser • Recognizes context-free syntax & reports errors • Guides context-sensitive (“semantic”) analysis (type checking) • Builds IR for source program Hand-coded parsers are fairly easy to build Most books advocate using automatic parser generators Source code Scanner IR Parser Errors tokensThe Front End Context-free syntax is specified with a grammar SheepNoise → baa SheepNoise | baa This grammar defines the set of noises that a sheep makes under normal circumstances It is written in a variant of Backus–Naur Form (BNF) Formally, a grammar G = (S,N,T,P) • S is the start symbol • N is a set of non-terminal symbols • T is a set of terminal symbols or words • P is a set of productions or rewrite rules (P : N → N ∪T ) (Example due to Dr. Scott K. Warren)Context-free syntax can be put to better use • This grammar defines simple expressions with addition & subtraction over “number” and “id” • This grammar, like many, falls in a class called “context-free grammars”, abbreviated CFG The Front End 1. goal → expr 2. expr → expr op term 3. | term 4. term → number 5. | id 6. op → + 7. | - S = goal T = { number, id, +, - } N = { goal, expr, term, op } P = { 1, 2, 3, 4, 5, 6, 7}Given a CFG, we can derive sentences by repeated substitution To recognize a valid sentence in some CFG, we reverse this process and build up a parse The Front End Production Result goal 1 expr 2 expr op term 5 expr op y 7 expr - y 2 expr op term - y 4 expr op 2 - y 6 expr + 2 - y 3 term + 2 - y 5 x + 2 - yThe Front End A parse can be represented by a tree (parse tree or syntax tree) x + 2 - y This contains a lot of unneeded information. term op term expr term expr goal expr op <id,x> <number,2> <id,y> + - 1. goal → expr 2. expr → expr op term 3. | term 4. term → number 5. | id 6. op → + 7. | -The Front End Compilers often use an abstract syntax tree This is much more concise ASTs are one kind of intermediate representation (IR) + - <id,x> <number,2> <id,y> The AST summarizes grammatical structure, without including detail about the derivationThe Back End Responsibilities • Translate IR into target machine code • Choose instructions to implement each IR operation • Decide which value to keep in registers • Ensure conformance with system interfaces Automation has been less successful in the back end Errors IR Register Allocation Instruction Selection Machine code Instruction Scheduling IR IRThe Back End Instruction Selection • Produce fast, compact code • Take advantage of target features such as addressing modes • Usually viewed as a pattern matching problem → ad hoc methods, pattern matching, dynamic programming This was the problem of the future in 1978 → Spurred by transition from PDP-11 to VAX-11 → Orthogonality of RISC simplified this problem Errors IR Register Allocation Instruction Selection Machine code Instruction Scheduling IR IRThe Back End Register Allocation • Have each value in a register when it is used • Manage a limited set of resources • Can change instruction choices & insert LOADs & STOREs • Optimal allocation is NP-Complete (1 or k registers) Compilers approximate solutions to NP-Complete problems Errors IR Register Allocation Instruction Selection Machine code Instruction Scheduling IR IRThe Back End Instruction Scheduling • Avoid hardware stalls and interlocks • Use all functional units productively • Can increase lifetime of variables (changing the allocation) Optimal scheduling is NP-Complete in nearly all cases Heuristic techniques are well developed Errors IR Register Allocation Instruction Selection Machine code Instruction Scheduling IR IRTraditional Three-pass Compiler Code Improvement (or Optimization) • Analyzes IR and rewrites (or transforms) IR • Primary goal is to reduce running time of the compiled code → May also improve space, power consumption, … • Must preserve “meaning” of the code → Measured by values of named variables Errors Source Code Middle End Front End Machine code Back End IR IRThe Optimizer (or Middle End) Typical Transformations • Discover & propagate some


View Full Document

UD CISC 672 - High-level View of a Compiler

Documents in this Course
Syllabus

Syllabus

18 pages

Load more
Download High-level View of a Compiler
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view High-level View of a Compiler and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view High-level View of a Compiler 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?