CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 1Chapter 4dChapter 4dBottom UpParsingCMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 2Right Sentential Forms• Recall the definition of a derivation and a rightmost derivation.• Each of the lines is a (right) sentential form• The parsing problem is finding the correct RHS in a right-sentential form to reduce to get the previous right-sentential form in the derivation1 E -> E+T2 E -> T3 T -> T*F4 E -> F5 F -> (E)6 F -> idEE+TE+T*FE+T*idE+F*idE+id*idT+id*idF+id*idid+id*idgenerationparsingCMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 3Bottom up parsing• A bottom up parser looks at a sentential form and selects a contiguous sequence of symbols that matches the RHS of a grammar rule, and replaces it with the LHS• There might be several choices, as in the sentential form E+T*F• Which one should we choose?EE+TE+T*FE+T*idE+F*idE+id*idT+id*idF+id*idid+id*id1 E -> E+T2 E -> T3 T -> T*F4 E -> F5 F -> (E)6 F -> idE + T * F1 32 4parsingCMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 4Bottom up parsing• If the wrong one is chosen, it leads to failure.• E.g.: replacing E+T with E in E+T*F yields E+F, which can not be further reduced using the given grammar.•We’ll define the handle of a sentential form as the RHS that should be rewritten to yield the next sentential form in the right most derivation.errorE*FE+T*FE+T*idE+F*idE+id*idT+id*idF+id*idid+id*id1 E -> E+T2 E -> T3 T -> T*F4 E -> F5 F -> (E)6 F -> idparsingCMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 5Sentential forms• Think of a sentential form as one of the entries in a derivation that begins with the start symbol and ends with a legal sentence.• So, it’s like a sentence but it may have some “unexpanded” non-terminals.• We can also think of it as a parse tree where some of the leaves are as yet unexpanded non-terminals.1 E -> E+T2 E -> T3 T -> T*F4 E -> F5 F -> (E)6 F -> idEE+TE+T*FE+T*idE+F*idE+id*idT+id*idF+id*idid+id*idgenerationparsingE+T*idFTECMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 6Handles• A handle of a sentential form is a substring α such that :– a matches the RHS of a production A -> α ; and– replacing α by the LHS A represents a step in thereverse of a rightmost derivation of s.• For this grammar, the rightmostderivation for the input abbcde isS => aABe => aAde => aAbcde => abbcde• The string aAbcde can be reduced in two ways:(1) aAbcde => aAde (using rule 2)(2) aAbcde => aAbcBe (using rule 4)• But (2) isn’t a rightmost derivation, so Abc is the only handle.• Note: the string to the right of a handle will only contain terminals (why?)1: S -> aABe2: A -> Abc 3: A -> b4: B -> da A b c d eCMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 7Phrases•A phrase is a subsequence of a sentential form that is eventually “reduced” to a single non-terminal.•A simple phrase is a phrase that is reduced in a single step.• The handle is the left-most simple phrase.E+T*idFTEFor this sentential form what are the• phrases• simple phrases• handleCMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 8Phrases, simple phrases and handles• Def: β is the handle of the right sentential form γ = αβw if and only if S =>*rm αAw => αβw• Def: β is a phrase of the right sentential form γ if and only if S =>* γ = α1Aα2 =>+ α1βα2• Def: β is a simple phrase of the right sentential form γif and only if S =>* γ = α1Aα2 => α1βα2• The handle of a right sentential form is its leftmost simple phrase• Given a parse tree, it is now easy to find the handle• Parsing can be thought of as handle pruningCMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 9Phrases, simple phrases and handlesEE+TE+T*FE+T*idE+F*idE+id*idT+id*idF+id*idid+id*idE -> E+TE -> TT -> T*FE -> FF -> (E)F -> idCMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 10On to parsing• Ok, so how do we manage when we don’t have the parse tree in front of us?• We’ll look at a shift-reduce parser, of the kind that yacc uses.• A shift-reduce parser has a queue of input tokens and an initially empty stack and takes one of four possible actions:– Accept: if the input queue is empty and the start symbol is the only thing on the stack.– Reduce: if there is a handle on the top of the stack, pop it off and replace it with the RHS– Shift: push the next input token onto the stack– Fail: if the input is empty and we can’t accept.• In general, we might have a choice of doing a shift or a reduce,or maybe in reducing using one of several rules.• The algorithm we next describe is deterministic.CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 11Shift-Reduce AlgorithmsA shift-reduce parser scans input, at each step, considers whether to:• Shift the next token to the top of the parse stack (along with some state info)• Reduce the stack by POPing several symbols off the stack (& their state info) and PUSHing the corresponding nonterminal (& state info)CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 12Shift-Reduce AlgorithmsThe stack is always of the formS1 X1 S2 X2 …Sn Xnbottomtopstate terminal ornon-terminal• A reduction step is triggered when we see the symbols corresponding to a rule’s RHS on the top of the stackS1 X1 …S5 X5 S6 T S7 * S8 FbottomtopT -> T*FS1 X1 …S5 X5 S6’ TCMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 13LR parser table• LR shift-reduce parsers can be efficiently implemented by precomputing a table to guide the processingMore on this Later . . . CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 14When to shift, when to reduce• The key problem in building a shift-reduce parser is deciding whether to shift or to reduce.– repeat: reduce if you see a handle on the top of the stack, shift otherwise– Succeed if we stop with only S on the stack and no input• A grammar may not be appropriate for a LR parser because there are conflicts which can not be resolved.• A conflict occurs when the parser cannot decide whether to:– shift or reduce the top of stack (a shift/reduce conflict), or – reduce the top of stack using one of two possible productions (a reduce/reduce conflict)• There are several varieties of LR parsers (LR(0), LR(1), SLR and LALR), with differences
View Full Document