Unformatted text preview:

CSc 453 — Compilers and Systems Software2 : Teensy Simple IChristian CollbergDepartment of Computer ScienceUniversity of [email protected]° 2009 Christian CollbergAugust 23, 20091Introduction2 A simple language and compiler• To understand the overall organization of a modern compiler, we will design a minimal language,Teensy, and a compiler for this language, Simple.• Get the source from the file Simple1.zip from the class website. Then unzip Simple1.zip; cdPROGRAMS; make; make test.• Here’s a simple Teensy program:¨¥BEGINx = 5 ; y = 99 ; z = y + x + 9 ; PRINT z ;END§¦3 Teensy’s concrete grammarprogram → ’BEGIN’ stats ’END’stats → stat stats | ǫstat → ident ’=’ expr ’;’| ’PRINT’ expr ’;’expr → expr ’+’ expr | ident | intident → LETTER idpidp → LETTER idp | DIGIT idp | ǫint → DIGIT intpintp → DIGIT intp | ǫ14 Java classes in the Simple compiler• lexical analysis (Token, Lex),• parsing (Matcher, Parse),• abstract syntax tree construction (Parse, AST, PROGRAM, STAT, STATSEQ, ASSIGN, PRINT, EXPR, NULL,IDENT, INTLIT, BINOP),• semantic analysis (SyTab, Sem),• tree-walk interpretation (Eval), optimization (Opt),• intermediate code generation (IR, GenIR),• stack-code interpretation (Interpreter), and• machine-code generation (Mips).• The class Compiler ties it all together.5 Overview – Lexing and ParsingsourceBEGIN,PRINT,INTLIT/42,PLUS,INTLIT/3,SEMICOLON,END,EOFPRINTPROGRAMNULLINTLIT/42INTLIT/3PLUSLexBEGINENDPRINT 42+3;Parse6 Overview – Semantics & OptimizationGenCPRINTPROGRAMNULLINTLIT/45PRINTPROGRAMNULLINTLIT/42INTLIT/3PLUSint main(char** args) {}printf("%i\n",45);GenIROptASTSem45Evalgcc4527 Overview – IR & Code GenerationGenIR .datanewline:.asciiz "\n" .text .align 2 .globl mainmain: li $s0,45 move $a0,$s0 li $v0,1 syscall la $a0,newline li $v0,4 syscall li $v0,10 syscall6,42,0,4,5,7# HEADER 42,0# PRINT# PRINTLN# EXIT3,45, # PUSH 45Interpreter45spim45Mips8 Token.javapublic class Token {public final static int ILLEGAL = 0;public final static int PLUS = 1;public final static int INTLIT = 2;public final static int IDENT = 3;public final static int SEMICOLON = 4;public final static int EQUAL = 5;public final static int BEGIN = 6;public final static int END = 7;public final static int PRINT = 8;public final static int EOF = 9;public int kind;public String ident; public int value;public int position;}9 Lex.javachar ch; // lookahead characterboolean done = false; // reached end-of-filepublic Lex(String filename) throws IOException {str = new LineNumberReader(new FileReader(filename));get();}// read the next input charactervoid get() {...}Token scanNumber() {...return new Token(Token.INTLIT, ival);}310 Lex.java. . .Token scanName() {String ident = "";while ((!done) && Character.isLetterOrDigit(ch)) {ident+=ch; get();}if (ident.equals("BEGIN"))return new Token(Token.BEGIN);else if (ident.equals("END"))return new Token(Token.END);else if (ident.equals("PRINT"))return new Token(Token.PRINT);elsereturn new Token(Token.IDENT, ident);}11 Lex.java. . .public Token nextToken() {while ((!done) && ch <= ’ ’) get();if (done) return new Token(Token.EOF);switch (ch) {case ’+’: get(); return new Token(Token.PLUS);case ’;’: get(); return new Token(Token.SEMICOLON);case ’=’: get(); return new Token(Token.EQUAL);default: if (Character.isLetter(ch))return scanName();else if (Character.isDigit(ch))return scanNumber();else {get(); return new Token(Token.ILLEGAL);}}}12 ParsingThe Lexer takes an input Teensy source fileBEGINPRINT y;ENDand generates a stream of tokensBEGINPRINTIDENT: ySEMICOLONENDEOFwhich is then consumed by the parser.413 Parsing• The parser does two things:1. it checks that the input program conforms to the syntax of the language. If not, error messagesare generated.2. it generates an abstract syntax tree (AST), a tree representation of the input program.• Simple supports two parsers: Matcher only tests for syntax conformance, Parse also generates theAST.• The AST forms the basis for all further processing in Simple.14 Matcher.java. . .Lex scanner;Token currentToken;public Matcher (Lex scanner) {this.scanner = scanner;next();}void next() {currentToken = scanner.nextToken();}boolean lookahead(int tokenKind) {return currentToken.kind == tokenKind;}15 Matcher.java. . .void match(int tokenKind) {if (!lookahead(tokenKind)) {System.err.println("Parsing error");}next();}public void parse() {ENTER("parse");match(Token.BEGIN);stats();match(Token.END);match(Token.EOF);EXIT("parse");}516 Matcher.java. . .void stats() {if (lookahead(Token.IDENT)) {assign(); match(Token.SEMICOLON); stats();} else if (lookahead(Token.PRINT)) {print(); match(Token.SEMICOLON); stats();}}void assign() {match(Token.IDENT); match(Token.EQUAL); expr();}void print() {match(Token.PRINT); expr();}17 Matcher.java. . .void expr() {factor();while (lookahead(Token.PLUS)) {match(Token.PLUS);factor();}}void factor() {if (lookahead(Token.IDENT)) {match(Token.IDENT);} else if (lookahead(Token.INTLIT)) {match(Token.INTLIT);}}18 Parsing. . .> cat test23BEGINPRINT y;END> java Matcher test2ENTER parseENTER statsENTER printENTER exprENTER factorEXIT factorEXIT exprEXIT printENTER statsEXIT statsEXIT statsEXIT parse619 Concrete vs. Abstract Syntax• A language’s concrete syntax describes how it is written by the programmer – where whitspace goes,where semicolons are needed, etc.• Theabstract syntax describes the “logical structure” of the language – that while-statements have twoparts (the expresion and the loop body), that procedure calls consist of a sequence of actual parameters(expression) and the name of the called procedure, etc.• The abstract syntax also defines the nodes of the abstract syntax tree. For example, an AST while-nodewould have two children, one pointing to the expression subtree, the other to the loop body subtree.20 Teensy’s Abstract SyntaxPROGRAM → STATSEQSTATSEQ → STAT STATSEQ| NULLSTAT → ASSIGN| PRINTASSIGN → ident EXPRPRINT → EXPREXPR → BINOP | IDENT | INTLITBINOP → op EXPR EXPRIDENT → identINTLIT → int21 AST, EXPR, INTLIT¨¥public abstract c l a ss AST { }§¦¨¥public abstract cla s s EXPR extends AST { }§¦¨¥public c l a s s INTLIT extends EXPR {public int v a l ;public INTLIT ( in t va l ) { t h is . v a l = val ; }}§¦¨¥public c l a s s


View Full Document

UA CSC 453 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?