Unformatted text preview:

CSc 453Compilers and Systems Software6 : Top-Down Parsing IDepartment of Computer ScienceUniversity of [email protected]° 2009 Christian CollbergOverviewCompiler PhaseserrorsASTasmSemanticAnalyserInterm.Code GenIRMachineCode GenIRASTLexertokenssourceerrorsParsererrorsOptimizeWe are here!GrammarsContext Free GrammarsCFGs are used to describe the syntax of programminglanguages. AproductionS → if E then S1else S2in a CFG says“If S1and S2are statements and E an expressionthen ‘ifE then S1else S2’ is a statement”.Notice that this production isrecursive ; it allowsif-statements to occur within if-statements.Context Free Grammars. . .S → if E then S1else S2if, then, and else are terminal symbols or tokens.S, S1, S2, and E are non-terminals . They are like “variables”,that represent the kinds of strings that the grammar definesasstatements or expressions, respectively.CFG Notationterminals:a, b, c, · · · , +, -, · · · , 0, 1, · · · , if, do.nonterminals:A, B, C, · · · , S, · · · , expr, stmt.grammar symbols:X , Y , Z, · · · (either terminals or nonterminals).strings of terminals:u, v , w · · · .strings of grammar symbols:α, β, γ, · · · (strings of terminals or nonterminals).productions:A → α1, A → α2, · · · , A → αk, orA → α1| α2· · · | αk.Derivations — Productions as Rewrite Rules1Start with the start symbol , S .2Pick any production S → α, eg. S → id := E .3We say that S derives id := E , or S ⇒ id := E . ‘id := E ’ isasentential form derived from S .4Repeat: pick a nonterminal A from the sentential form,replace with the RHS of a production A → α:S ⇒ id:= E ⇒ id := E +E ⇒id:= id + E ⇒ id := id+num. S∗⇒ id := id+num.S → id := E | if E then SE → E +E | id | numTerminologyA grammar is a 4-tuple(non-terminals, terminals, productions, start-symbol)or(N, Σ, P, S)A production is of the form α → β where α, β are taken fromNSΣ.Read α → β as “rewrite α with β”.Read ⇒ as “directly derives”.Readr⇒ as “directly derives using rule r”.Read∗⇒ as “derives in zero or more steps”.Derivations. . .αAβ ⇒ αγβ ifA → γ is a production, andα and β are strings of grammar symbols.⇒: Derives in one step.∗⇒:Derives in 0 or more steps.+⇒:Derives in 1 or more steps.⇒lm :Leftmost derivation.⇒rm:Rightmost derivation.L(G ): The language generated by grammar G. This is theset of strings w, such that there is a derivationS+⇒ w, where S is G ’s start-symbol.Derivations. . .The string of terminal symbols id:=id+num is generated by aleftmost derivation:S⇒lm id:= E⇒lm id := E +E⇒lm id:= id+E⇒lm id := id+numS+⇒ id:=id+numExample Grammar:S → id := E | if E then SE → E +E | id | numParse Trees. . .If one step of our derivation is· · · A · · · ⇒ · · · X Y Z · · ·(i.e, we used the rule A → XYZ) then we’ll get a parse(sub-)tree............AXYZProgram ⇒ BEGIN Stat END⇒ BEGIN ident := Expr END⇒ BEGIN "a" := Expr END⇒ BEGIN "a" := Expr + Expr END⇒ BEGIN "a" := 5 + Expr END⇒ BEGIN "a" := 5 + Expr * Expr END⇒ BEGIN "a" := 5 + 4 * Expr END⇒ BEGIN "a" := 5 + 4 * 3 END*identExprExprExprExpr ExprStat453ENDBEGINProgram:="a"+Top-Down ParsingTop-Down Backtracking ParserTop-down parsing involves building a parse tree for the inputstring by starting at the root and adding nodes in preorder.S → cAd A → ab|adacdSA dcSA dcacdSbaacTop-Down Backtracking Parser. . .If a backtracking top-down parser chooses the wrongproduction rule to expand a node it backs up over the input,and undoes some of the parse tree construction:acdSA dcaacdSA dcaacdSA dcabaacdSA dcoops!Grammar RewritingAmbiguous GrammarsA grammar is ambiguous if some string of tokens can producetwo (or more) different parse trees.E ::= E + E | E * E | number5 + 4 * 3E ⇒ E + E⇒ 5 +E⇒ 5 + E * E⇒ 5 + 4 * E⇒ 5 + 4 * 3+EE E34*E5EE ⇒ E * E⇒ E *3⇒ E + E * 3⇒ E + 4 * 3⇒ 5 + 4 * 3EEE E+5 4E3*Operator PrecedenceThe precedence of an operator is a measure of itsbinding power, i.e. how strongly it attracts its operands.Usually ∗ has higher precedence than +:4 + 5 ∗ 3means4 + (5 ∗ 3),not(4 + 5) ∗ 3.We say that ∗ binds harder than +.Operator AssociativityThe associativity of an operator describes how operators ofequal precedence are grouped.+ and − are usually left associative :4 − 2 + 3means(4 − 2) + 3 = 5,not4 − (2 + 3) = −1.We say that +associates to the left.^ associates to the right:2^3^4 = 2^(3^4).Expression GrammarsWe must write unambiguous expression grammars that reflectthe associativity and precedence of all operators.The next slide gives the algorithm for writing such grammars.Resulting Expression Grammar:expr ::= expr + term | termterm::= term * factor | factorfactor ::= ( expr ) | number1Create one non-terminal for each precedence level, for examplep1, p2, · · · , pn, where pnhas the highest precedence level.2For operator op at precedence level i construct the followingproduction if the operator is• left associative:pi::= piop pi +1| pi +1• right associative:pi::= pi +1op pi| pi +13Construct a production for nonterminal pn+1which representsprimary expressions such as identifiers, numbers,parenthesized expressions, etc:pn+1::= (p1) | num | idE ::= E + T | TT::= T * F | FF::= number5 + 4 * 3E ⇒ E + T⇒ T + T⇒ F + T⇒ 5 + T⇒ 5 + T * F⇒ 5 + F * F⇒ 5 + 4 * F⇒ 5 + 4 * 3E ⇒ E + T⇒ E + T * F⇒ E + T * 3⇒ E + F * 3⇒ E + 4 * 3⇒ T + 4 * 3⇒ F + 4 * 3⇒ 5 + 4 * 3+ETF5*TF4F3ETTop-Down ParsingRecursive Descent ParsingPROCEDURE S ();IF currtok = if THENmatch(if); E();match(then); S();ELSIF currtok = id THENmatch(id); match(:=); E();ELSE syntax error ENDIF;PROCEDURE E ();IF curr tok = id THEN match(id);ELSE IF curr tok = num THEN match(num);ELSE E(); match(+); E();ENDIF;S → id := E| ifE then SE → E +E| id | numRecursive Descent—Small Problem 1We may loop forever:PROCEDURE E ();IF · · ·ELSE E(); match(+); E();· · ·Recursive Descent—Small Problem 2What about productions that start out similarly:S → if E then S |ifE then S else SPROCEDURE S ();IF currtok = if THENmatch(if); E(); match(then); S();ELSIF currtok = if THENmatch(if); E(); match(then);S(); match(else); S();ELSIF · · · ENDIFRecursive Descent—Small Problem 3What if there are several possible “next” tokens:prog → decl | statstat→ if · · · | id() | while · · ·decl → int id | real idPROCEDURE prog ();IF currtok ∈


View Full Document

UA CSC 453 - Study Notes

Download Study Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Study Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Study Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?