1CMSC330Parsing2Last time• Context Free Grammars– Definitions– Derivations– Parse trees– Ambiguity– Binding and precedence3This lecture• Parsing– Recursive descent– FIRST sets• Rewriting grammars– Left factoring– Eliminating left recursion• Abstract syntax trees (ASTs)4Steps of compilation5Parsing• Many efficient techniques for parsing– i.e., turning strings into parse trees– Examples• LL(k), SLR(k), LR(k), LALR(k)…• Take CMSC 430 for more details• One simple technique: recursive descent parsing– This is a “top-down” parsing algorithm6Recursive Descent Parsing• Goal– Determine if we can produce the string to be parsed from the grammar's start symbol• Approach– Recursively replace nonterminal with RHS of production• At each step, we'll keep track of two facts– What tree node are we trying to match?– What is the lookahead (next token of the input string)? • Helps guide selection of production used to replace nonterminal7Recursive Descent Parsing (cont.)• At each step, 3 possible cases– If we’re trying to match a terminal• If the lookahead is that token, then succeed, advance the lookahead, and continue– If we’re trying to match a nonterminal• Pick which production to apply based on the lookahead– Otherwise fail with a parsing error8Parsing exampleE id = n | { L }L E ; L | – Here n is an integer and id is an identifier• One input might be– { x = 3; { y = 4; }; }– This would get turned into a list of tokens{ x = 3 ; { y = 4 ; } ; }– And we want to turn it into a parse tree9Parsing example (cont.)E id = n | { L }L E ; L | { x = 3 ; { y = 4 ; } ; }E{ L }E ; Lx = 3 E ; L{ L }E ;Ly = 4 lookahead10Recursive descent parsing • Key step– Choosing which production should be selected• Two approaches– Backtracking• Choose some production• If fails, try different production• Parse fails if all choices fail– Predictive parsing• Analyze grammar to find FIRST sets for productions• Compare with lookahead to decide which production to select• Parse fails if lookahead does not match FIRST11First Sets• Motivating example– The lookahead is x– Given grammar S xyz | abc• Select S xyz since 1st terminal in RHS matches x– Given grammar S A | B A x | y B z• Select S A, since A can derive string beginning with x• In general– Choose a production that can derive a sentential form beginning with the lookahead– Need to know what terminal may be first in any sentential form derived from a nonterminal / production12First sets• Definition– First( ), for any terminal or nonterminal , is the set of initial terminals of all strings that may expand to– We’ll use this to decide what production to apply• Examples– Given grammar S xyz | abc• First(xyz) = { x }, First(abc) = { a } • First(S) = First(xyz) U First(abc) = { x, a }– Given grammar S A | B A x | y B z• First(x) = { x }, First(y) = { y }, First(A) = { x, y }• First(z) = { z }, First(B) = { z }• First(S) = { x, y, z }13Calculating First( )• For a terminal a– First(a) = { a }• For a nonterminal N– If N , then add to First(N) – If N 12... n, then (note the iare all the symbols (terminal and non-terminal) on the right side of one single production):• Add First(12... n) to First(N), where First(12... n) is defined as– First(1) if ∉ First(1)– Otherwise (First(1) – ) U First(2... n)• If ∈ First(i) for all i, 1 ≤ i ≤ k, then add to First(N)14First() examplesE id = n | { L }L E ; L | First(id) = { id }First("=") = { "=" }First(n) = { n }First("{")= { "{" }First("}")= { "}" }First(";")= { ";" }First(E) = { id, "{" }First(L) = { id, "{", }E id = n | { L } | L E ; LFirst(id) = { id }First("=") = { "=" }First(n) = { n }First("{")= { "{" }First("}")= { "}" }First(";")= { ";" }First(E) = { id, "{", }First(L) = { id, "{", ";" }15Recursive Descent Parser Implementation• For terminals, create function match(a)– If lookahead is a it consumes the lookahead by advancing the lookahead to the next token, and returns– Otherwise fails with a parse error if lookahead is not a– In algorithm descriptions, consider parse_a, parse_term(a) to be aliases for match(a)• For each nonterminal N, create a function parse_N– Called when we’re trying to parse a part of the input which corresponds to (or can be derived from) N– parse_S for the start symbol S begins the parse16Parser Implementation (cont.)• The body of parse_N for a nonterminal N does the following– Let N 1| ... | kbe the productions of N• Here iis the entire right side of a production- a sequence ofterminals and nonterminals– Pick the production N isuch that the lookahead is in First(i)• It must be that First(i) First(j) = Ø for i j• If there is no such production, but N then return• Otherwise fail with a parse error– Suppose i= 12... n. Then call parse_1(); ... ; parse_n()to match the expected right-hand side, and return17Parser Implementation (cont.)• Parse is built on procedure calls• Procedures may be (mutually) recursive18Recursive Descent Parser• Given grammar S xyz | abc– First(xyz) = { x }, First(abc) = { a }• Parserparse_S( ) {if (lookahead == “x”) { match(“x”); match(“y”); match(“z”);// S xyz}else if (lookahead == “a”) {match(“a”); match(“b”); match(“c”);// S abc}else error( );}19Recursive Descent Parser• Given grammar S A | B A x | y B z– First(A) = { x, y }, First(B) = { z }• Parserparse_S( ) {if ((lookahead == “x”) || (lookahead == “y”)) parse_A( ); // S Aelse if (lookahead == “z”) parse_B( ); // S Belse error( );}20ExampleE id = n | { L }L E ; L | parse_E( ) {if (lookahead == “id”) {match(“id”); match(“=”); // E id = nmatch(“n”); }else if (lookahead == “{“) {match(“{“); parse_L( ); // E { L }match(“}”);}else error( );}parse_L( ) {if ((lookahead == “id”) ||(lookahead == “{”)) {parse_E( ); match(“;”);// L E ; Lparse_L( );}else ;// L }First(E) = { id, "{" }21Things to notice• If you draw the execution trace of the parser– You get the parse tree• Examples– Grammar S xyzS abc–
View Full Document