Programming Languages and Compilers (CS 421)Parsing ProgramsRecursive Decent ParsingSlide 4Slide 5Sample GrammarTokens as OCaml TypesParse Trees as DatatypesSlide 9Slide 10Parsing Lists of TokensParsing an ExpressionSlide 13Parsing a Plus ExpressionSlide 15Slide 16Building Plus Expression Parse TreeParsing a Minus ExpressionSlide 19Parsing an Expression as a TermParsing Factor as IdParsing Factor as Parenthesized ExpressionSlide 23Error Cases( a + b ) * c - dSlide 26( a + b ) * c – da + b * c – dSlide 29( a + b * c - da + b ) * c - d *)Streams in Place of ListsProblems for Recursive-Descent ParsingSlide 34Pairwise Disjointedness TestExampleEliminating Left RecursionFactoring GrammarSlide 39Programming Languages and Compilers (CS 421)Elsa L Gunter2112 SC, UIUChttp://www.cs.uiuc.edu/class/sp07/cs421/Based in part on slides by Mattox Beckman, as updated by Vikram Adve and Gul AghaElsa L. Gunter Parsing Programs•Parsing is the process of tracing or constructing a parse tree for a given input string•Process usually broken into two phases:–Lexer: generating tokens from string or character stream–Parser: generating parse tree from token list or stream–Lexer called from parserElsa L. Gunter Recursive Decent Parsing•Recursive decent parsers are a class of parsers derived fairly directly from BNF grammar•A recursive descent parser traces out a parse tree in top-down order, corresponding to a left-most derivation (LL - left-to-right scanning, leftmost derivation)Elsa L. Gunter Recursive Decent Parsing•Each nonterminal in the grammar has a subprogram associated with it; the subprogram parses all phrases that the nonterminal can generate•Each nonterminal in right-hand side of a rule corresponds to a recursive call to the associated subprogramElsa L. Gunter Recursive Decent Parsing•Each subprogram must be able to decide how to begin parsing by looking at the left-most character in the string to be parsed–May do so directly, or indirectly by calling another parsing subprogram •Recursive descent parsers, like other top-down parsers, cannot be built from left-recursive grammarsElsa L. Gunter Sample Grammar<expr> ::= <term> | <term> + <expr> | <term> - <expr><term> ::= <factor> | <factor> * <term> | <factor> / <term><factor> ::= <id> | ( <expr> )Elsa L. Gunter Tokens as OCaml Types•+ - * / ( ) <id>•Becomes an OCaml datatypetype token = Id_token of string | Left_parenthesis | Right_parenthesis | Times_token | Divide_token | Plus_token | Minus_tokenElsa L. Gunter Parse Trees as Datatypes<expr> ::= <term> | <term> + <expr> | <term> - <expr>type expr = Term_as_Expr of term | Plus_Expr of (term * expr) | Minus_Expr of (term * expr)Elsa L. Gunter Parse Trees as Datatypes<term> ::= <factor> | <factor> * <term> | <factor> / <term>and term = Factor_as_Term of factor | Mult_Term of (factor * term) | Div_Term of (factor * term)Elsa L. Gunter Parse Trees as Datatypes<factor> ::= <id> | ( <expr> )and Factor = Id_as_Factor of string | Parenthesized_Expr_as_Factor of exprElsa L. Gunter Parsing Lists of Tokens•Will create three mutually recursive functions:–expr : token list -> (expr * token list)–term : token list -> (term * token list)–factor : token list -> (factor * token list)•Each parses what it can and gives back parse and remaining tokensElsa L. Gunter <expr> ::= <term> [( + | - ) <expr> ] let rec expr tokens = (match term tokens with ( term_parse , tokens_after_term) -> (match tokens_after_term with( Plus_token :: tokens_after_plus) ->Parsing an ExpressionElsa L. Gunter <expr> ::= <term> [( + | - ) <expr> ] let rec expr tokens = (match term tokens with ( term_parse , tokens_after_term) -> (match tokens_after_term with ( Plus_token :: tokens_after_plus) ->Parsing an ExpressionElsa L. Gunter <expr> ::= <term> [( + | - ) <expr> ] let rec expr tokens = (match term tokens with ( term_parse , tokens_after_term) -> (match tokens_after_term with ( Plus_token :: tokens_after_plus) ->Parsing a Plus ExpressionElsa L. Gunter Parsing a Plus Expression<expr> ::= <term> + <expr> (match expr tokens_after_plus with ( expr_parse , tokens_after_expr) ->( Plus_Expr ( term_parse , expr_parse ), tokens_after_expr))Elsa L. Gunter <expr> ::= <term> + <expr> (match expr tokens_after_plus with ( expr_parse , tokens_after_expr) ->( Plus_Expr ( term_parse , expr_parse ), tokens_after_expr))Parsing a Plus ExpressionElsa L. Gunter Building Plus Expression Parse Tree<expr> ::= <term> + <expr> (match expr tokens_after_plus with ( expr_parse , tokens_after_expr) ->( Plus_Expr ( term_parse , expr_parse ), tokens_after_expr))Elsa L. Gunter <expr> ::= <term> - <expr> | ( Minus_token :: tokens_after_minus) -> (match expr tokens_after_minus with ( expr_parse , tokens_after_expr) ->( Minus_Expr ( term_parse , expr_parse ), tokens_after_expr))Parsing a Minus ExpressionElsa L. Gunter Parsing a Minus Expression<expr> ::= <term> - <expr> | ( Minus_token :: tokens_after_minus) -> (match expr tokens_after_minus with ( expr_parse , tokens_after_expr) ->( Minus_Expr ( term_parse , expr_parse ), tokens_after_expr))Elsa L. Gunter <expr> ::= <term> | _ -> (Term_as_Expr term_parse , tokens_after_term)))•Code for term is same except for replacing addition with multiplication and subtraction with divisionParsing an Expression as a TermElsa L. Gunter Parsing Factor as Id<factor> ::= <id> and factor (Id_token id_name :: tokens) = ( Id_as_Factor id_name, tokens)Elsa L. Gunter <factor> ::= ( <expr> ) | factor ( Left_parenthesis :: tokens) = (match expr tokens with ( expr_parse , tokens_after_expr) ->Parsing Factor as Parenthesized ExpressionElsa L. Gunter <factor> ::= ( <expr> )(match tokens_after_exprwith Right_parenthesis :: tokens_after_rparen -> ( Parenthesized_Expr_as_Factor expr_parse , tokens_after_rparen)Parsing Factor as Parenthesized ExpressionElsa L. Gunter Error Cases•What if no matching right parenthesis? | _ -> raise (Failure "No matching rparen") ))•What if no leading id or left parenthesis? | _ -> raise (Failure "No id or lparen" ));;Elsa L. Gunter ( a + b ) * c - dexpr [Left_parenthesis, Id_token "a", Plus_token, Id_token "b",Right_parenthesis, Times_token, Id_token "c", Minus_token, Id_token "d"];Elsa L. Gunter ( a + b ) * c - d- : expr * token
View Full Document