CS 421 Lecture 9: Bottom-up Parsing Announcements OCaml self-help hints Lecture outline Bottom-up parsing ocamlyacc6/18/20091Announcements MP4 has been posted MiniJava lexer Reminder: midterm exam date – Thursday, July 26/18/20092OCaml self-help hints Consult the CS 421 resource guide: http://www.cs.uiuc.edu/class/su09/cs421/ Use “Tips for using OCaml top level” to speed up working with the interactive environment Consult the OCaml manual when you want a definitive answer about something May be technical, not “user-friendly” Ask on the newsgroup If you are having a problem, it’s likely somebody has run into it already, or they will in the future. Ask Google It probably knows…6/18/20093OCaml self-help hints Be careful about Data types, and type inference Operator precedence Common OCaml error messages: syntax error (underlined) unbound value use (underlined) Pattern matching is not exhaustive. Here is a counterexample: … This expression has type <type1> but is here used with <type2> Watch out especially for “unit” <whatever error> in <file>.ml at line <line> characters <chars>6/18/20094Top-down vs. bottom-up parsing Why is top-down called “top-down?” As we consume tokens, we build a parse tree. At any one time, we are filling in the children of a particular non-terminal.As soon as we decide which production to use, we can fill in the tree. In this sense, we are building the tree from the top (root) down(to the leaves). Nature and Computer Science disagree on this point6/18/20095Top-down parsing Example: Input: x + y * z E → id T T → ε | + E | * E6/18/20096Bottom-up parsing Works by creating small parse trees and joining them together into larger ones. Example: Input: x + y * z E → id T T → ε | + E | * EStart constructing trees, put them on stack: Construct tree x: {x} Add tree +: {x, +} Add tree y: {x, +, y} Add tree *: {x, +, y, *} Add tree z: {x, +, y, *, z}6/18/20097Bottom-up parsing (cont) Construct parse tree by merging: {x, +, y, *, z} Apply T → ε {x, +, y, *, z, T → ε} …6/18/20098How bottom-up parsing works Keep a stack of small parse trees. Based on what’s in this stack, and the next input token, take one of these actions: Shift: move lookahead token to stack Reduce A → α: if roots of trees on stack match α, replace those trees on stack by single tree with root A Accept: reduce when non-terminal is the start symbol, look-ahead is EOF Reject Bottom-up parsing is also called shift-reduce parsing6/18/20099Shift-reduce example 1 Example: Input: x ; y ; z L → L ; E | E E → idAction Stack InputS x ; y ; zR E→idx ; y ; z…6/18/200910Shift-reduce example 1 Example: L → L ; E | E E → idAction Stack Input6/18/200911Shift-reduce example 2 Example: Input: x + 10 * y E → E + T | T T → T * P | P P → id | intAction Stack InputS x + 10 * yR P→idx + 10 * y…6/18/200912Shift-reduce example 2 Example: E → E + T | T T → T * P | P P → id | intAction Stack Input6/18/200913Bottom-up parsing This is hard! How can we build a parser that works like this? Shift-reduce parsing is not usually done “by hand” Automated parser generator tools Generate parser code based on grammar specification Similar to ocamllex and regular expressions for lexing Ocaml’s parser generator is called ocamlyacc “yet another compiler-compiler”6/18/200914Using ocamlyacc Create grammar specification in a text file <grammar>.mly Execute ocamlyacc <grammar>.mly Produces code for parser in <grammar>.ml interface (including type declaration for tokens) in <grammar.mli>6/18/200915Parser code <grammar>.ml defines one parsing function per entry point Parsing function takes a lexing function (lexbuf -> token) and a lexbuf as arguments Aside: we’ll see more functions being passed around as arguments soon… Returns semantic attributeof corresponding entry point6/18/200916Example – expression grammar We will take a simple expression grammar and create a parser to parse inputs and produce abstract syntax Grammar: M → Exp eof Exp → Term | Term + Exp | Term - Exp Term → Factor | Factor * Term | Factor / Term Factor → id | ( Exp ) Abstract syntax(* file: expr.ml *)type expr =Plus of expr * expr| Minus of expr * expr| Mult of expr * expr| Div of expr * expr| Id of string6/18/200917Example – lexer(* file: exprlex.mll *)let numeric = [‘0’ – ‘9’]let letter = [‘a’ – ‘z’ ‘A’ – ‘Z’]rule tokenize = parse| “+” {Plus_token}| “-” {Minus_token}| “*” {Times_token}| “/” {Divide_token}| “(” {Left_parenthesis}| “)” {Right_parenthesis}| letter (letter | numeric | “_”)* as id {Id_token id}| [‘ ’ ‘\t’ ‘\n’] {tokenize lexbuf}| eof {EOL}6/18/200918Example – parser(* file: exprparse.mly *)%{ open Expr%}%token <string> Id_token%token Left_parenthesis Right_parenthesis%token Times_token Divide_token%token Plus_token Minus_token%token EOL%start main%type <expr> main%%...6/18/200919Example – parser (exprparse.mly)expr:term {$1}| term Plus_token expr {Plus($1,$3)}| term Minus_token expr {Minus($1,$3)}term:factor {$1}| factor Times_token term {Mult($1,$3)}| factor Divide_token term {Div($1,$3)}factor:Id_token {Id $1}| Left_parenthesis expr Right_parenthesis {$2}main:| expr EOL {$1}6/18/200920Example – using parser# #use “expr.ml”;;…# #use “expparse.ml”;;…# #use “exprlex.ml”;;…# let test s =let lexbuf = Lexing.from_string(s^”\n”) inmain tokenize lexbuf;;…# test “a + b”;;- : expr = Plus(Id “a”, Id “b”) 6/18/200921ocamlyacc input File format:%{ <header>%}<declarations>%%<rules>%%<trailer>6/18/200922ocamlyacc <header> Contains arbitrary OCaml code Typically used to give types and functions needed for the semantic actions of rules and to give specialized error recovery May be omitted <footer> is similar. Possibly used to call parser.6/18/200923ocamlyacc <declarations>%token symbol…symbol Declare given symbols as tokens%token <type>
View Full Document