Unformatted text preview:

class CalcParser extends Parser Not that useful we want more than a yes no answer Action code simply interleaved with rule matching Actions can appear anywhere within a rule before during or after a match Children known first parents later Constructor for any object can require knowledge of children but not of parent Separates parsing from translation Makes modification easier by minimizing interactions Top down parsers can build both kinds of data structures Context of an object only established later Bottom up parsers can only build bottom up data structures Usually actions build a data structure that represents the program Allows parts of the program to be analyzed in different orders Actions Actions Bottom up parsers restricted to running actions only after a rule has matched Easy to understand what happens when Nice thing about top down parsing grammar is essentially imperative In a top down parser actions are executed during the matching routines rule before A during B C D after Implementing Actions Bottom up parsers actions executed when rule is reduced Top down parsers actions executed during parsing rules a mexpr r a What ANTLR builds Punctuation whitespace other irrelevant details omitted Represents the syntax of the program almost exactly but easier for later passes to deal with Typically an Abstract Syntax Tree that represents the program What To Build public final int expr int r int a r 0 r mexpr while LA 1 PLUS match PLUS a mexpr r a match Token EOF TYPE return r expr returns int r int a r 0 r mexpr a mexpr r a EOF Implementing Actions atom returns int r r 0 i INT r Integer parseInt i getText mexpr returns int r int a r 0 r atom a atom r a expr returns int r int a r 0 r mexpr a mexpr r a EOF Simple languages can be interpreted with parser actions Parsing decides if the program is part of the language Like most ANTLR parsers can include actions pieces of code that run when a rule is matched Actions Parsing and Syntax Trees Actions Prof Stephen A Edwards Fall 2006 Columbia University Department of Computer Science COMS W4115 Abstract Syntax Trees Classes become littered with analysis code additional annotations Analyses spread out across class files Choose the AST structure carefully Still have to modify each analysis if the AST changes Each analysis can have its own file It generates the big switch statement a class BinOp int operator Expr left right void typeCheck void constantProp void buildThreeAddr b It can automatically generate tree walking code b a Disadvantage each analysis requires another method a b a b ANTLR offers a compromise a b Advantage avoid switch statements when walking tree b a a b Building ASTs class BinOp int operator Expr left right class IfThen Expr predicate Stmt thenPart elsePart Alternative Heterogeneous ASTs one class per object Not very object oriented whole program represented with one type Is this general purpose structure too general Next Sibling Create a new node Append a subtree as a child First Child Node Comment on Generic ASTs Last Sibling Parent Each node holds a token and pointers to its first child and next sibling Most general implementation ASTs are n ary trees Implementing ASTs Typical AST Operations Abstract Syntax Tree INT 5 INT 4 INT 3 Comment on Generic ASTs b INT 4 Concrete Parse Tree INT 3 INT 5 atom atom atom mexpr mexpr expr Heterogeneous ASTs a a b if a b a b b a if Example of AST structure AST structure almost a direct translation of the grammar 3 5 4 expr mexpr mexpr mexpr atom atom atom INT Like scanning and parsing objective is to discard irrelevant details E g comma separated lists are nice syntactically but later stages probably just want lists Abstract vs Concrete Trees Abstract vs Concrete Trees 5 int b c arg int arg a int arg args Harder for later routines to get the data they want Branching suggests recursion Many unnecessary nodes Drawbacks int b arg a int arg ARGS int c arg args arg arg args ARGS args int gcd int a int b int c Punctuation irrelevant build a simple list A concrete parse tree 4 How many token types 3 args arg arg 2 Additional grouping gives int gcd int a int b int c EOF Removing unnecessary punctuation 6 Better to choose a simpler structure for the tree Comma separated lists are common 5 Sequences of things Sequences of Things 3 Sequences of Things Choosing AST Structure After a token prevents an AST node from being built After a token makes the node a root of a subtree Each matched token or rule is made a sibling of the AST for the rule By default each matched token becomes an AST node class TigerParser extends Parser options buildAST true Building an AST Automatically with ANTLR Designing an AST Structure 2 gives 2 3 4 5 6 4 class CalcParser extends Parser options buildAST true expr mexpr mexpr EOF mexpr atom atom atom INT on 6 2 3 4 5 6 class CalcParser extends Parser options buildAST true expr mexpr mexpr EOF mexpr atom atom atom INT Running Running on AST Construction with Annotations Fortunately ANTLR can automate this process Automatic AST Construction stmt returns ASTNode n if p expr then t stmt else e stmt n new ASTNode new Token IF n appendChild p n appendChild t n appendChild e Putting code in actions that builds ASTs is traditional and works just fine class ASTNode ASTNode Token t void appendChild ASTNode c void appendSibling ASTNode C But it s tedious The Obvious Way The Obvious Way to Build ASTs args ARGS args int arg c c while do if then else one is enough Extra keywords Mark divisions between phrases Separators commas semicolons Grouping and precedence associativity overrides Parentheses foo When faced with a period the second rule can repeat itself or exit string dots dots options greedy false Setting greedy false makes each dots a single period f2 string dots dots f1 var string dots dots options greedy true foo func Setting greedy true makes dots as long as possible func The greedy flag decides whether repeating a rule takes precedence when an outer rule could also work f2 var vars Hint Use ANTLR s greedy option to disambiguate this func funcs defs The Greedy Option f1 func defs let function f1 function f2 var foo 42 in end Simplifies later static semantic checks Grouping let function f1 f2 Error var foo 42 splits group function f2 in end let function f1 f2 OK function f2 in end funcs func var type vars types For the assignment you need to build a node of type BINOP for every binary operator The text indicates the actual operator Arithmetic operators usually not that


View Full Document

Columbia COMS W4115 - Abstract Syntax Trees

Documents in this Course
YOLT

YOLT

13 pages

Lattakia

Lattakia

15 pages

EasyQL

EasyQL

14 pages

Photogram

Photogram

163 pages

Espresso

Espresso

27 pages

NumLang

NumLang

6 pages

EMPATH

EMPATH

14 pages

La Mesa

La Mesa

9 pages

JTemplate

JTemplate

238 pages

MATVEC

MATVEC

4 pages

TONEDEF

TONEDEF

14 pages

SASSi

SASSi

16 pages

JTemplate

JTemplate

39 pages

BATS

BATS

10 pages

Synapse

Synapse

11 pages

c.def

c.def

116 pages

TweaXML

TweaXML

108 pages

Load more
Loading Unlocking...
Login

Join to view Abstract Syntax Trees and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Abstract Syntax Trees and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?