Abstract Syntax Trees COMS W4115 Parsing and Syntax Trees Actions Parsing decides if the program is part of the language Simple languages can be interpreted with parser actions Not that useful we want more than a yes no answer class CalcParser extends Parser Like most ANTLR parsers can include actions pieces of code that run when a rule is matched expr returns int r int a r 0 r mexpr a mexpr r a EOF Top down parsers actions executed during parsing rules Prof Stephen A Edwards Fall 2004 Columbia University Department of Computer Science Bottom up parsers actions executed when rule is reduced mexpr returns int r int a r 0 r atom a atom r a atom returns int r r 0 i INT r Integer parseInt i getText Actions Implementing Actions Implementing Actions In a top down parser actions are executed during the matching routines Nice thing about top down parsing grammar is essentially imperative expr returns int r int a r 0 r mexpr a mexpr r a EOF Actions can appear anywhere within a rule before during or after a match Action code simply interleaved with rule matching public final int expr int r int a r 0 r mexpr while LA 1 PLUS match PLUS a mexpr r a match Token EOF TYPE return r Easy to understand what happens when rule before A during B C D after Bottom up parsers restricted to running actions only after a rule has matched What ANTLR builds a mexpr r a Actions Actions What To Build Usually actions build a data structure that represents the program Bottom up parsers can only build bottom up data structures Typically an Abstract Syntax Tree that represents the program Separates parsing from translation Children known first parents later Makes modification easier by minimizing interactions Constructor for any object can require knowledge of children but not of parent Represents the syntax of the program almost exactly but easier for later passes to deal with Allows parts of the program to be analyzed in different orders Context of an object only established later Top down parsers can build both kinds of data structures Punctuation whitespace other irrelevant details omitted Abstract vs Concrete Trees Abstract vs Concrete Trees Like scanning and parsing objective is to discard irrelevant details expr mexpr mexpr mexpr atom atom atom INT E g comma separated lists are nice syntactically but later stages probably just want lists Parent expr mexpr mexpr INT 3 INT 5 INT 4 Abstract Syntax Tree Comment on Generic ASTs Create a new node Append a subtree as a child Is this general purpose structure too general a b a b b a a b b a a b b a a a b b b a b b a a Heterogeneous ASTs Comment on Generic ASTs Advantage avoid switch statements when walking tree ANTLR offers a compromise Disadvantage each analysis requires another method It can automatically generate tree walking code class BinOp int operator Expr left right void typeCheck void constantProp void buildThreeAddr It generates the big switch statement Analyses spread out across class files Classes become littered with analysis code additional annotations Next Sibling Typical AST Operations if Node First Child INT 5 INT 4 Concrete Parse Tree Last Sibling INT 3 atom atom atom if Most general implementation ASTs are n ary trees Each node holds a token and pointers to its first child and next sibling 3 5 4 AST structure almost a direct translation of the grammar Example of AST structure Implementing ASTs Each analysis can have its own file Still have to modify each analysis if the AST changes Choose the AST structure carefully Not very object oriented whole program represented with one type Alternative Heterogeneous ASTs one class per object class BinOp int operator Expr left right class IfThen Expr predicate Stmt thenPart elsePart Building ASTs Building an AST Automatically with ANTLR The Obvious Way to Build ASTs The Obvious Way class ASTNode ASTNode Token t void appendChild ASTNode c void appendSibling ASTNode C Putting code in actions that builds ASTs is traditional and works just fine But it s tedious Fortunately ANTLR can automate this process stmt returns ASTNode n if p expr then t stmt else e stmt n new ASTNode new Token IF n appendChild p n appendChild t n appendChild e By default each matched token becomes an AST node Each matched token or rule is made a sibling of the AST for the rule After a token makes the node a root of a subtree After a token prevents an AST node from being built Automatic AST Construction AST Construction with Annotations Running Running class CalcParser extends Parser options buildAST true expr mexpr mexpr EOF mexpr atom atom atom INT class CalcParser extends Parser options buildAST true expr mexpr mexpr EOF mexpr atom atom atom INT on 6 2 3 4 5 6 on 2 3 4 5 6 gives 2 gives 3 4 5 6 class TigerParser extends Parser options buildAST true Choosing AST Structure EOF 2 3 4 5 Designing an AST Structure Sequences of Things Sequences of Things Sequences of things Comma separated lists are common Better to choose a simpler structure for the tree Removing unnecessary punctuation int gcd int a int b int c Punctuation irrelevant build a simple list Additional grouping args arg arg int gcd int a int b int c How many token types A concrete parse tree args arg arg args ARGS args Drawbacks args arg int Many unnecessary nodes arg arg a int b int c Branching suggests recursion Harder for later routines to get the data they want ARGS arg int arg a int b arg int c What s going on here What s going on here args arg arg args ARGS args int a int b int c Rule generates a sequence of arg nodes Node generation supressed for punctuation parens commas args ARGS args Punctuation makes the syntax readable unambiguous args arg arg args ARGS args Information represented by structure of the AST args Things typically omitted from an AST arg arg arg Action uses ANTLR s terse syntax for building trees Removing Unnecessary Punctuation a int args c int arg arg b int int Separators commas semicolons Mark divisions between phrases a int Parentheses Grouping and precedence associativity overrides ARGS arg set the args tree to a new tree whose root is a node of type ARGS and whose child is the old args tree b int c Extra keywords while do if then else one is enough Additional Grouping Grouping Grouping The Tiger language from Appel s book allows mutually recursive definitions only in uninterrupted sequences Convenient to group sequences of definitions in the AST Identifying and building sequences of definitions a little tricky in ANTLR let function f1 f2 OK
View Full Document
Unlocking...