CSc 453Compilers and Systems Software7 : Top-Down Parsing IIDepartment of Computer ScienceUniversity of [email protected]° 2009 Christian CollbergTop-Down ParsingThe parse tree is constructedFrom the topFrom left to right.The terminals are seen in order of appearance in the tokenstream (t2, t5, t6, t8, t9):1743t2t5t8t6t9Building The LL(1) Parser1Remove left recursion.2Left factor the grammar.3Construct transition diagrams for each production:ELSEIFExpr()THENStat() Stat()1Compute FIRST(A) for each grammar symbol A.2Compute FOLLOW(A) for each nonterminal A.4Simplify the transition diagrams.5Construct the recursive procedures.FIRST SetsFIRST(α) is the set of terminals that begin strings derivedfrom α.FIRST-sets help us solve problem 3.prog → decl | statstat→ if · ·· | id() | while · · ·decl → int id | real idFIRST(stat) = {if, id, while}FIRST(decl) = {int, real}FIRST Sets. . .PROCEDURE prog ();IF currtok ∈ {if, id, while} THEN stat();ELSIF currtok ∈ {int, real} THEN decl()ELSE syntax error ENDIF;END;PROCEDURE stat (); · · · END;PROCEDURE decl (); · · · END;FIRST(stat) = {if, id, while}FIRST(decl) = {int, real}FIRST Sets. . .FIRST(α) is the set of terminals that begin strings derivedfrom α, ie. FIRST(α) = {a| a a terminal, α∗⇒ aβ}.Example GrammarE → T E′E′→ +T E′| ǫT → F T′T′→ *F T′| ǫF → ( E ) | idFIRST sets:FIRST(E) = {(, id}FIRST(T ) = {(, id}FIRST(F ) = {(, id}FIRST(E′) = {+, ǫ}FIRST(T′) = {*, ǫ}Computing FIRST(A)REPEAT until no more changes:1.IF A is a terminal THEN FIRST(A) = {A}.2.IF A is a nonterminal, and thereis a production A → ǫ THEN ǫ is in FIRST(A).3.IF A is a nonterminal, and thereis a production A → Y1· · · YkTHENFOR i := 1 to k − 1 DOIF ǫ ∈ FIRST(Y1) ∧ · · · ∧ ǫ ∈ FIRST(Yi)and a (6= ǫ) ∈ FIRST(Yi +1) THEN a is in FIRST(A);IF ǫ ∈ FIRST(Y1) ∧ · · · ∧ ǫ ∈ FIRST(Yk) THENǫ is in FIRST(A);E → T E′E′→ +T E′| ǫT → F T′T′→ *F T′| ǫF → ( E ) | id1FIRST(+) = {+}, FIRST(*) = {*}, etc., because of point 1.2ǫ ∈ FIRST(E′), ǫ ∈ FIRST(T′), because of point 2.3(, id ∈ FIRST(F ), because of point 3.4(, id ∈ FIRST(T ), because (, id ∈ FIRST(F ), and T → F T′.5(, id ∈ FIRST(E ), because (, id ∈ FIRST(T ), and E → T E′.6+ ∈ FIRST(E′), * ∈ FIRST(T′), because E′→ + T E′andT′→ * F T′.FOLLOW SetsWe let $ symbolize end-of-input.FOLLOW(A) is the set of terminals that can follow right afterthe nonterminal A in some sentential form.FOLLOW(A) = {a| a a terminal, S∗⇒ αAaβ}.$ ∈ FOLLOW(A) if A is the rightmost symbol in a sententialform, i.e. S∗⇒ αA.FOLLOW Sets. . .E → T E′E′→ +T E′| ǫT → F T′T′→ *F T′| ǫF → ( E ) | idFOLLOW(E) = {), $}FOLLOW(T ) = {+, ), $}FOLLOW(F ) = {+, *, ), $}FOLLOW(F ) = {+, *, ), $}FOLLOW(E′) = {), $}}FOLLOW(T′) = {+, ), $}}FOLLOW Sets. . .E → T E′E′→ +T E′| ǫT → F T′T′→ *F T′| ǫF → ( E ) | id) is in FOLLOW(E ), becauseE ⇒ TE′⇒ FT′E′⇒ (E ) T′E′.+ is in FOLLOW(T ), becauseE ⇒ TE′⇒ T + TE′.FOLLOW Sets. . .E → T E′E′→ +T E′| ǫT → F T′T′→ *F T′| ǫF → ( E ) | id* is in FOLLOW(F ), becauseE ⇒ TE′⇒ FT′E′⇒ F * FT′E′.) is in FOLLOW(F ), becauseE ⇒ TE′⇒ FT′E′⇒ (E )T′E′⇒(TE′)T′E′⇒ (FT′E′)T′E′⇒(FE′)T′E′⇒ (F ) T′E′.Computing FOLLOW SetsLet S be the start symbol and $ the end-of-file marker.REPEAT until no more changes:1. Add $to FOLLOW(S).2. IF there is a production A → αBβ THENAdd everything in FIRST(β) (except ǫ) to FOLLOW(B).3. IF there is a production A → αB ORa production A → αBβ where ǫ ∈ FIRST(β) THENAdd everything in FOLLOW(A) to FOLLOW(B).Computing FOLLOW Sets. . .BBFIRST(β) ⊆ FOLLOW(B )FOLLOW(A) ⊆ FOLLOW(B)ββǫSAALL(1) GrammarsA grammar is LL(1) if we can construct a recursive descentparser that handles it (without using backtracking).LL(1) stands forThe input is scanned from L eft-to-right.The parse produces a L eftmost derivation.We have 1 -token lookahead.LL(1) Grammars. . .Formal Definition:A grammar is LL(1) iff for any two productionsA → α | βthe following conditions hold1FIRST(α) ∩ FIRST(β) = ∅2If β∗⇒ ǫ then FIRST(α) ∩ FOLLOW(A) = ∅Recursive Descent ParsersFOR each non-terminal A DOcreate initial and final states;FOR each production A → X1· · · XnDOcreate a path A’s initial to A’s final nodewith edges labeled X1· · · Xn:· · ·XnX2X1;Simplify the diagrams;FOR each transition diagram P DOCreate a procedure P that ‘‘traverses’’ the diagramguided by the input.Recursive Descent Parsers. . .non−terminalatokens2Brs1PROCEDURE P();IF currtok = a THEN curr tok := next token()ELSE syntaxerror; ENDIFIF currtok ∈ FIRST(s1) THEN code for parsing s1ELSIF currtok ∈ FIRST(s2) THEN code for parsing s2ELSE syntaxerror; ENDIFB();WHILE curr tok ∈ FIRST(r) DO code for parsing r ENDDOEND P;E → T E′E′→ +T E′| ǫT → F T′T′→ *F T′| ǫF → ( E ) | id*( )id+T E′F T′EE :E′:T :T′:F :ǫT E′F T′ǫ+( )id*E′:F T′T E′E :TǫǫEF :T′:ǫǫFT :( )id+*FF T′T E′E :EF :ǫTT :E′:T′:ǫ( )id+*FEF :ǫTǫFT :E :T( )id+*ǫEF :T :E :TFǫ+ǫE :TPROCEDURE E();LOOPT();IF curtok = + THENcur tok := next token()ELSE EXIT ENDIFENDLOOP;*ǫT :FPROCEDURE T();LOOPF();IF curtok = * THENcurtok := next token()ELSE EXIT ENDIFENDLOOP;( )idF :EPROCEDURE F();IF curtok = ( THENcurtok := next token();E();IF curtok = ) THENcurtok := next token()ELSE syntax error ENDIFELSIF cur tok = id THENcur tok := next token();ELSE syntax error ENDIFParsing ExpressionsHere’s the parser coded in Java.class ExprH {static String[] input = {"i","+","i","*","i","$"};static int current = 0;static boolean lookahead(String S) {return input[current].equals(S);}static void match(String S) throws Exception {if (input[current].equals(S)) current++;else throw new Exception();}static void E() throws Exception {T(); while(lookahead("+")) {match("+"); T();}}static void T() throws Exception {F(); while(lookahead("*")) {match("*"); F();}}static void F() throws Exception {if (lookahead("(")) {match("("); E(); match(")");} else match("i");}public static void main(String[] args) {try {E(); System.out.println("true");} catch (Exception e){System.out.println("false");}}Readings and ReferencesRead Louden, pp. 143–196.Or, the Dragon Book:Top-Down Parsing 181–190Error Recovery
View Full Document