DOC PREVIEW
Columbia COMS W4115 - Abstract Syntax Trees

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Abstract Syntax TreesCOMS W4115Prof. Stephen A. EdwardsFall 2005Columbia UniversityDepartment of Computer ScienceParsing and Syntax TreesParsing decides if the program is part of the language.Not that useful: we want more than a yes/no answer.Like most, ANTLR parsers can include actions: pieces ofcode that run when a rule is matched.Top-down parsers: actions executed during parsing rules.Bottom-up parsers: actions executed when rule is“reduced.”ActionsSimple languages can be interpreted with parser actions.class CalcParser extends Parser;expr returns [int r]{ int a; r=0; }: r=mexpr ("+" a=mexpr { r += a; } )*EOF ;mexpr returns [int r]{ int a; r=0; }: r=atom ("*" a=atom { r*= a; } )*;atom returns [int r]{ r=0; }: i:INT{ r = Integer.parseInt(i.getText()); } ;ActionsIn a top-down parser, actions are executed during thematching routines.Actions can appear anywhere within a rule: before, during,or after a match.rule{ /*before*/ }: A { /*during*/ } B| C D{ /*after*/ } ;Bottom-up parsers restricted to running actions only aftera rule has matched.Implementing ActionsNice thing about top-down parsing: grammar is essentiallyimperative.Action code simply interleaved with rule-matching.Easy to understand what happens when.Implementing Actionsexpr returns [int r] { int a; r=0; }: r=mexpr ("+" a=mexpr { r += a; } )*EOF ;public final int expr() { // What ANTLR buildsint r;int a; r=0;r=mexpr();while ((LA(1)==PLUS)) { // ( )*match(PLUS); // "+"a=mexpr(); // a=mexprr += a; // { r += a; }}match(Token.EOF_TYPE);return r;}ActionsUsually, actions build a data structure that represents theprogram.Separates parsing from translation.Makes modification easier by minimizing interactions.Allows parts of the program to be analyzed in differentorders.ActionsBottom-up parsers can only build bottom-up datastructures.Children known first, parents later.→ Constructor for any object can require knowledge ofchildren, but not of parent.Context of an object only established later.Top-down parsers can build both kinds of data structures.What To Build?Typically, an Abstract Syntax Tree that represents theprogram.Represents the syntax of the program almost exactly, buteasier for later passes to deal with.Punctuation, whitespace, other irrelevant details omitted.Abstract vs. Concrete TreesLike scanning and parsing, objective is to discardirrelevant details.E.g., comma-separated lists are nice syntactically, butlater stages probably just want lists.AST structure almost a direct translation of the grammar.Abstract vs. Concrete Treesexpr : mexpr ("+" mexpr )*;mexpr : atom ("*" atom )*;atom : INT ;3 + 5*4exprmexpratomINT:3”+” mexpratomINT:5”+” atomINT:4+INT:3 *INT:5 INT:4Concrete Parse Tree Abstract Syntax TreeImplementing ASTsMost general implementation: ASTs are n-ary trees.Each node holds a token and pointers to its first child andnext sibling:ParentLast SiblingNodeNext SiblingFirst ChildExample of AST structureif>a b-=a b-=b aif> -= -=a b a b b aTypical AST OperationsCreate a new node; Append a subtree as a child.> -=a b a b+-=b a=> -= -=a b a b b aComment on Generic ASTsIs this general-purpose structure too general?Not very object-oriented: whole program represented withone type.Alternative: Heterogeneous ASTs: one class per object.class BinOp {int operator; Expr left, right;};class IfThen {Expr predicate; Stmt thenPart, elsePart;};Heterogeneous ASTsAdvantage: avoid switch statements when walking tree.Disadvantage: each analysis requires another method.class BinOp {int operator; Expr left, right;void typeCheck() { ... };void constantProp() { ... };void buildThreeAddr() { ... };};Analyses spread out across class files.Classes become littered with analysis code, additionalannotations.Comment on Generic ASTsANTLR offers a compromise:It can automatically generate tree-walking code.→ It generates the big switch statement.Each analysis can have its own file.Still have to modify each analysis if the AST changes.→ Choose the AST structure carefully.Building ASTsThe Obvious Way to Build ASTsclass ASTNode {ASTNode( Token t ) { ... }void appendChild( ASTNode c ) { ... }void appendSibling( ASTNode C) { ... }}stmt returns [ASTNode n]: ’if’ p=expr ’then’ t=stmt ’else’ e=stmt{ n = new ASTNode(new Token("IF"));n.appendChild(p);n.appendChild(t);n.appendChild(e); } ;The Obvious WayPutting code in actions that builds ASTs is traditional andworks just fine.But it’s tedious.Fortunately, ANTLR can automate this process.Building an AST Automatically withANTLRclass TigerParser extends Parser;options {buildAST=true;}By default, each matched token becomes an AST node.Each matched token or rule is made a sibling of the ASTfor the rule.After a token, ˆ makes the node a root of a subtree.After a token, ! prevents an AST node from being built.Automatic AST ConstructionRunningclass CalcParser extends Parser;options { buildAST=true; }expr : mexpr (’+’ mexpr)*EOF ;mexpr : atom (’*’ atom)*;atom : INT ;on2*3+4*5+6gives2*3+4*5+6EOFAST Construction with AnnotationsRunningclass CalcParser extends Parser;options { buildAST=true; }expr : mexpr (’+’ˆ mexpr)*EOF! ;mexpr : atom (’*’ˆ atom)*;atom : INT ;on2*3+4*5+6gives++6* *2345Choosing ASTStructureDesigning an AST StructureSequences of thingsRemoving unnecessary punctuationAdditional groupingHow many token types?Sequences of ThingsComma-separated lists are commonint gcd(int a, int b, int c)args : "(" ( arg ("," arg)*)? ")" ;A concrete parse tree:args(,,argint aargint bargint c)Drawbacks:Many unnecessary nodesBranching suggests recursionHarder for later routines to getthe data they wantSequences of ThingsBetter to choose a simpler structure for the tree.Punctuation irrelevant; build a simple list.int gcd(int a, int b, int c)args : "("! ( arg (","! arg)*)? ")"!{ #args = #([ARGS], args); } ;ARGSargint aargint bargint cWhat’s going on here?args : "("! ( arg (","! arg)*)? ")"!{ #args = #([ARGS], args); } ;Rule generates a sequence of arg nodes.Node generation supressed for punctuation (parens,commas).Action uses ANTLR’s terse syntax for building trees.{#args = #( [ARGS] , args ) ; } ;“set the args tree to a new tree whose root is a node oftype ARGS and whose child is the old args tree”What’s going on here?(int a, int b, int c)args : "("! ( arg (","! arg)*)? ")"!{ #args = #([ARGS], args); } ;#argsarg arg argintaintbintc#argsARGSarg arg argintaintbintcRemoving UnnecessaryPunctuationPunctuation makes the syntax readable, unambiguous.Information represented


View Full Document

Columbia COMS W4115 - Abstract Syntax Trees

Documents in this Course
YOLT

YOLT

13 pages

Lattakia

Lattakia

15 pages

EasyQL

EasyQL

14 pages

Photogram

Photogram

163 pages

Espresso

Espresso

27 pages

NumLang

NumLang

6 pages

EMPATH

EMPATH

14 pages

La Mesa

La Mesa

9 pages

JTemplate

JTemplate

238 pages

MATVEC

MATVEC

4 pages

TONEDEF

TONEDEF

14 pages

SASSi

SASSi

16 pages

JTemplate

JTemplate

39 pages

BATS

BATS

10 pages

Synapse

Synapse

11 pages

c.def

c.def

116 pages

TweaXML

TweaXML

108 pages

Load more
Download Abstract Syntax Trees
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Abstract Syntax Trees and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Abstract Syntax Trees 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?