CS 421 Lecture 7: Grammars and parsing Announcements MP2 review Lecture outline Context-free grammars Top-down, a.k.a. recursive descent, parsing6/14/20091Announcements TA office hours I2CS: Tue, Thu 4-5pm CDT On-campus: Wed 4-5pm CDT MP2 solutions posted6/14/20092MP2 review Problem 7flatten : ‘a list list -> ‘a listflatten [[1;2;3]; [4;5]; [8;2;3;4]];;let rec flatten lst = match lst with …6/14/20093MP2 review Problem 7flatten : ‘a list list -> ‘a listflatten [[1;2;3]; [4;5]; [8;2;3;4]];;let rec flatten lst = match lst with[] -> []| []::xs -> flatten xs| (x::xs)::ys -> x::(flatten (xs::ys));;6/14/20094Review: compiler front-end6/14/20095TokensSourceLexerASTParserIntro to grammars and languages Grammar Finite set of terminals Finite set of non-terminals Finite set of production rules Start symbol Language Set of strings recognized by a grammar6/14/20096Grammars: Chomsky hierarchy Unrestricted Recursively-enumerable languages Recognized by a Turing machine Context-sensitive Context-sensitive languages Recognized by a linear bounded automaton (LBA) Context-free Context-free languages Recognized by a push-down automaton (PDA) Regular Regular languages Recognized by a finite state automaton (FSA)6/14/20097Context-free grammar Given: Set of terminals (tokens) T Set of non-terminals (variables) V A cfgGis a set of productionsof the formA→X1…Xn(n≥ 0)whereA∈V, X1…Xn∈G= V ∪T One symbol designated as “start symbol”6/14/20098NotationA→X1…Xn Also written A::= X1…Xn When n= 0, write A→ ε Instead ofA→ When there is more than one production from A, sayA→X1…XnandA→Y1…Yn Instead write: A→X1…Xn|Y1…Yn6/14/20099Example Expressions Exp → intlit | variable | Exp + Exp | Exp * Exp Sentences include 3 x 3+x 3+x*y Tree representation6/14/200910Example Method definition:MethodDef → Type ident ‘(’ Args ‘)’ ‘{‘ Stmtlist ‘}’Args → ε | NonEmptyArgsNonEmptyArgs → Type ident | Type ident ‘,’ NonEmptyArgsStmtlist → ε | Stmt StmtlistType → ident | int | boolean Sentence:int fun(boolean b) { } Tree representation ??6/14/200911Syntax trees A (concrete)syntax tree is a tree whose internal nodes are labeled with non-terminals such that if a node is labeled A, its children are leabeledX1, … , Xnfor some production A→X1, … , Xn Sentences of a grammar are frontiersof the syntax tree whose root is the start symbol.6/14/200912More notation Backus-Naur Form (BNF) Symbol → expression Expression: terminals, symbols, | Extended BNF (EBNF) Symbol → “terminal” | ‘terminal’ | <symbol> | … ; RegExp-like extensions: exp*, exp+, exp?, etc.6/14/200913Example EBNF:A→X1…Xi(Y1…Yk)*Xi+1…XnA→X1…XiB Xi+1…XnB→ ε | Y1…YkBA→X1…Xi(Y1…Yk)+Xi+1…XnA→X1…XiB Xi+1…XnB→Y1…Yk| Y1…YkBA→X1…Xi(Y1…Yk)?Xi+1…XnA→X1…XiB Xi+1…XnB→ ε | Y1…Yk Args rule from previous example:Args → (Type ident (‘,’ Type ident)*)?6/14/200914Parsing From list of tokens, construct a syntax tree Simpler problem: Determine whether list of tokens is a sentence (“recognition”) Two types of parsers: top-down and bottom-up We will discuss recursive descent (top-down) and LR(1) (bottom-up) parsers Not all grammars can be parsed by any particular method Recursive descent is easier to use by hand LR(1) requires a generator LR(1) more powerful: can be applied to more grammars6/14/200915Top-down parsing by recursive descent Idea: Define a function parseA for each non-terminal A. Given token, decide which production from Ato apply, say A→X1, … , Xn. Go through X1, … , Xnin sequence, consuming tokens in X1, … , Xn, and recursively calling parsing function parseXifor non-terminals. Details: parseA : token list -> token list (almost) Each function will return a list of remaining tokens Error is reported if any of the Xiis a token that does not match the input token. Input is accepted if parse function returns empty list.6/14/200916parseA: actual typeparseA : token list -> (token list) optiontype ‘a option = None | Some ‘a6/14/200917Example 1 A → id | ‘(‘ A ‘)’ Define parseA : token list -> (token list) option ‘a option = None | Some ‘a parseA toklis matches first part of toklist and returns remainder of toklis, or None if syntax error.6/14/200918Example 1 (cont.) A → id | ‘(‘ A ‘)’type token = IDENT of string | LPAREN | RPARENlet rec parseA toklis = match toklis withIDENT x :: tls -> Some tls| LPAREN :: tls -> (match (parseA tls) withSome (h::tls') -> if h = RPAREN then Some tls‘else None| _ -> None)| _ -> None;;6/14/200919Example 2 A → id | ‘(‘ B ‘)’ B → int | Atype token = IDENT of string | LPAREN | RPAREN | INT of intlet rec parseA toklis = match toklis withIDENT x :: tls -> Some tls| LPAREN :: tls -> (match (parseB tls) withSome (h::tls') -> if h = RPAREN then Some tls'else None| _ -> None)| _ -> Noneand parseB toklis = match toklis withINT i :: tls -> Some tls| _ -> parseA toklis;;6/14/200920Example 3 Consider this grammar: A → id | ‘(‘ B ‘)’ B → A | A ‘+’ B Unfortunately, cannot parse using recursive descent This grammar, which has the same sentences: A → id | ‘(‘ B ‘)’ B → A C C → ‘+’ A C | εIsparsable by recursive descent6/14/200921Example 3 (cont.) Tree representation: ((x + y) + z)6/14/200922Example 3 (cont.)let rec parseA toklis = match toklis withIDENT x :: tls -> Some tls| LPAREN :: tls ->(match (parseB tls) withSome (h::tls’) -> if h = RPARENthen Some tls’else None| _ -> None)| _ -> Noneand parseB toklis = match parseA toklis withSome tls’ -> parseC tls’ | None -> Noneand parseC toklis = match toklis withPLUS :: tls’ -> (match parseA tls’ withSome tls’’ -> parseC tls’’| None -> None)| _ -> Some toklis;;6/14/200923Generating syntax trees – ex. 1b For simple grammar, A → id | ‘(‘ A ‘)’, define type for syntax trees:type cst = A1 of token * cst * token | A2 of token Parse function returns pair of remaining tokens and syntax tree created by this non-terminal:let rec parseA toklis = match toklis withIDENT x :: tls -> Some (tls, A2 (IDENT x))| LPAREN :: tls ->(match
View Full Document