CSc 453Compilers and Systems Software6 : Top-Down Parsing IDepartment of Computer ScienceUniversity of [email protected]° 2009 Christian CollbergOverviewCompiler PhaseserrorsASTasmSemanticAnalyserInterm.Code GenIRMachineCode GenIRASTLexertokenssourceerrorsParsererrorsOptimizeWe are here!GrammarsContext Free GrammarsCFGs are used to describe the syntax of programminglanguages. AproductionS → if E then S1else S2in a CFG says“If S1and S2are statements and E an expressionthen ‘ifE then S1else S2’ is a statement”.Notice that this production isrecursive ; it allowsif-statements to occur within if-statements.Context Free Grammars. . .S → if E then S1else S2if, then, and else are terminal symbols or tokens.S, S1, S2, and E are non-terminals . They are like “variables”,that represent the kinds of strings that the grammar definesasstatements or expressions, respectively.CFG Notationterminals:a, b, c, · · · , +, -, · · · , 0, 1, · · · , if, do.nonterminals:A, B, C, · · · , S, · · · , expr, stmt.grammar symbols:X , Y , Z, · · · (either terminals or nonterminals).strings of terminals:u, v , w · · · .strings of grammar symbols:α, β, γ, · · · (strings of terminals or nonterminals).productions:A → α1, A → α2, · · · , A → αk, orA → α1| α2· · · | αk.Derivations — Productions as Rewrite Rules1Start with the start symbol , S .2Pick any production S → α, eg. S → id := E .3We say that S derives id := E , or S ⇒ id := E . ‘id := E ’ isasentential form derived from S .4Repeat: pick a nonterminal A from the sentential form,replace with the RHS of a production A → α:S ⇒ id:= E ⇒ id := E +E ⇒id:= id + E ⇒ id := id+num. S∗⇒ id := id+num.S → id := E | if E then SE → E +E | id | numTerminologyA grammar is a 4-tuple(non-terminals, terminals, productions, start-symbol)or(N, Σ, P, S)A production is of the form α → β where α, β are taken fromNSΣ.Read α → β as “rewrite α with β”.Read ⇒ as “directly derives”.Readr⇒ as “directly derives using rule r”.Read∗⇒ as “derives in zero or more steps”.Derivations. . .αAβ ⇒ αγβ ifA → γ is a production, andα and β are strings of grammar symbols.⇒: Derives in one step.∗⇒:Derives in 0 or more steps.+⇒:Derives in 1 or more steps.⇒lm :Leftmost derivation.⇒rm:Rightmost derivation.L(G ): The language generated by grammar G. This is theset of strings w, such that there is a derivationS+⇒ w, where S is G ’s start-symbol.Derivations. . .The string of terminal symbols id:=id+num is generated by aleftmost derivation:S⇒lm id:= E⇒lm id := E +E⇒lm id:= id+E⇒lm id := id+numS+⇒ id:=id+numExample Grammar:S → id := E | if E then SE → E +E | id | numParse Trees. . .If one step of our derivation is· · · A · · · ⇒ · · · X Y Z · · ·(i.e, we used the rule A → XYZ) then we’ll get a parse(sub-)tree............AXYZProgram ⇒ BEGIN Stat END⇒ BEGIN ident := Expr END⇒ BEGIN "a" := Expr END⇒ BEGIN "a" := Expr + Expr END⇒ BEGIN "a" := 5 + Expr END⇒ BEGIN "a" := 5 + Expr * Expr END⇒ BEGIN "a" := 5 + 4 * Expr END⇒ BEGIN "a" := 5 + 4 * 3 END*identExprExprExprExpr ExprStat453ENDBEGINProgram:="a"+Top-Down ParsingTop-Down Backtracking ParserTop-down parsing involves building a parse tree for the inputstring by starting at the root and adding nodes in preorder.S → cAd A → ab|adacdSA dcSA dcacdSbaacTop-Down Backtracking Parser. . .If a backtracking top-down parser chooses the wrongproduction rule to expand a node it backs up over the input,and undoes some of the parse tree construction:acdSA dcaacdSA dcaacdSA dcabaacdSA dcoops!Grammar RewritingAmbiguous GrammarsA grammar is ambiguous if some string of tokens can producetwo (or more) different parse trees.E ::= E + E | E * E | number5 + 4 * 3E ⇒ E + E⇒ 5 +E⇒ 5 + E * E⇒ 5 + 4 * E⇒ 5 + 4 * 3+EE E34*E5EE ⇒ E * E⇒ E *3⇒ E + E * 3⇒ E + 4 * 3⇒ 5 + 4 * 3EEE E+5 4E3*Operator PrecedenceThe precedence of an operator is a measure of itsbinding power, i.e. how strongly it attracts its operands.Usually ∗ has higher precedence than +:4 + 5 ∗ 3means4 + (5 ∗ 3),not(4 + 5) ∗ 3.We say that ∗ binds harder than +.Operator AssociativityThe associativity of an operator describes how operators ofequal precedence are grouped.+ and − are usually left associative :4 − 2 + 3means(4 − 2) + 3 = 5,not4 − (2 + 3) = −1.We say that +associates to the left.^ associates to the right:2^3^4 = 2^(3^4).Expression GrammarsWe must write unambiguous expression grammars that reflectthe associativity and precedence of all operators.The next slide gives the algorithm for writing such grammars.Resulting Expression Grammar:expr ::= expr + term | termterm::= term * factor | factorfactor ::= ( expr ) | number1Create one non-terminal for each precedence level, for examplep1, p2, · · · , pn, where pnhas the highest precedence level.2For operator op at precedence level i construct the followingproduction if the operator is• left associative:pi::= piop pi +1| pi +1• right associative:pi::= pi +1op pi| pi +13Construct a production for nonterminal pn+1which representsprimary expressions such as identifiers, numbers,parenthesized expressions, etc:pn+1::= (p1) | num | idE ::= E + T | TT::= T * F | FF::= number5 + 4 * 3E ⇒ E + T⇒ T + T⇒ F + T⇒ 5 + T⇒ 5 + T * F⇒ 5 + F * F⇒ 5 + 4 * F⇒ 5 + 4 * 3E ⇒ E + T⇒ E + T * F⇒ E + T * 3⇒ E + F * 3⇒ E + 4 * 3⇒ T + 4 * 3⇒ F + 4 * 3⇒ 5 + 4 * 3+ETF5*TF4F3ETTop-Down ParsingRecursive Descent ParsingPROCEDURE S ();IF currtok = if THENmatch(if); E();match(then); S();ELSIF currtok = id THENmatch(id); match(:=); E();ELSE syntax error ENDIF;PROCEDURE E ();IF curr tok = id THEN match(id);ELSE IF curr tok = num THEN match(num);ELSE E(); match(+); E();ENDIF;S → id := E| ifE then SE → E +E| id | numRecursive Descent—Small Problem 1We may loop forever:PROCEDURE E ();IF · · ·ELSE E(); match(+); E();· · ·Recursive Descent—Small Problem 2What about productions that start out similarly:S → if E then S |ifE then S else SPROCEDURE S ();IF currtok = if THENmatch(if); E(); match(then); S();ELSIF currtok = if THENmatch(if); E(); match(then);S(); match(else); S();ELSIF · · · ENDIFRecursive Descent—Small Problem 3What if there are several possible “next” tokens:prog → decl | statstat→ if · · · | id() | while · · ·decl → int id | real idPROCEDURE prog ();IF currtok ∈
View Full Document