1CMSC 330: Organization of Programming LanguagesContext-Free Grammars CMSC 330 2Review• Why should we study CFGs?• What are the four parts of a CFG?• How do we tell if a string is accepted by a CFG?• What’s a parse tree?2CMSC 330 3ReviewA sentential form is a string of terminals and non-terminals produced from the start symbolInductively:– The start symbol–If A is a sentential form for a grammar, where (and (N|)*), and A is a production, then is a sentential form for the grammar• In this case, we say that A derives in one step, which is written as A CMSC 330 4Leftmost and Rightmost Derivation•Example: S a | SbS String: abaLeftmost Derivation Rightmost DerivationS SbS abS aba S SbS Sba abaAt every step, apply production At every step, apply productionto leftmost non-terminal to rightmost non-terminal• Both derivations happen to have the same parse tree• A parse tree has a unique leftmost and a unique rightmost derivation• Not every string has a unique parse tree• Parse trees don’t show the order productions are applied3CMSC 330 5Another Example (cont’d)S a | SbS•Is ababa in this language?A leftmost derivationS SbS abS abSbS ababS ababaAnother leftmost derivationS SbS SbSbS abSbS ababS ababaCMSC 330 6Ambiguity•A string is ambiguous for a grammar if it has more than one parse tree– Equivalent to more than one leftmost (or more than one rightmost) derivation• A grammar is ambiguous if it generates an ambiguous string– It’s can be hard to see this with manual inspection• Exercise: can you create an unambiguous grammar for S a | SbS ?4CMSC 330 7Are these Grammars Ambiguous?(1) S aS | TT bT | UU cU | (2) S T | TT Tx | Tx | x | x(3) S SS | () | (S)CMSC 330 8Ambiguity of Grammar (Example 3)•2 different parse trees for the same string: ()()()• 2 distinct leftmost derivations :S ⇒ SS ⇒ SSS ⇒()SS ⇒()()S ⇒()()()S ⇒ SS ⇒ ()S ⇒()SS ⇒()()S ⇒()()()• We need unambiguous grammars to manage programming language semantics5CMSC 330 9More on Leftmost/Rightmost Derivations• Is the following derivation leftmost or rightmost?S aS aT aU acU ac– There’s at most one non-terminal in each sentential form, so there's no choice between left or right non-terminals to expand• How about the following derivation?–S SbS SbSbS SbabS ababS ababaCMSC 330 10Tips for Designing Grammars1. Use recursive productions to generate an arbitrary number of symbolsA xA | Zero or more x’sA yA | y One or more y’s2. Use separate non-terminals to generate disjoint parts of a language, and then combine in a productionG = S ABA aA | B bB | L(G) = a*b*6CMSC 330 11Tips for Designing Grammars (cont’d)3. To generate languages with matching, balanced, or related numbers of symbols, write productions which generate strings from the middle{anbn| n 0} (not a regular language!)S aSb | Example: S aSb aaSbb aabb{anb2n| n 0}S aSbb | CMSC 330 12Tips for Designing Grammars (cont’d){anbm| m 2n, n 0}S aSbb | B | B bB | bThe following grammar also works:S aSbb | BB bB | How about the following?S aSbb | bS | 7CMSC 330 13Tips for Designing Grammars (cont’d){anbman+m| n 0, m 0}Rewrite as anbmaman, which now has matching superscripts (two pairs)Would this grammar work?S aSa | BB bBa | baCorrected:S aSa | BB bBa | The outer ananare generated first,then the inner bmamDoesn’t allow m = 0CMSC 330 14Tips for Designing Grammars (cont’d)4. For a language that’s the union of other languages, use separate nonterminals for each part of the union and then combine{ an(bm|cm) | m > n 0}Can be rewritten as{ anbm| m > n 0} { ancm| m > n 0}8CMSC 330 15Tips for Designing Grammars (cont’d){ anbm| m > n 0} { ancm| m > n 0}S T | UT aTb | Tb | b T generates the first setU aUc | Uc | c U generates the second set• What’s the parse tree forstring abbb?• Ambiguous!CMSC 330 16Tips for Designing Grammars (cont’d){ anbm| m > n 0} { ancm| m > n 0}Will this fix the ambiguity?S T | UT aTb | bT | bU aUc | cU | c• It's not amgiguous, but it can generate invalid strings such as babb9CMSC 330 17Tips for Designing Grammars (cont’d){ anbm| m > n 0} { ancm| m > n 0}Unambiguous versionS T | VT aTb | UU Ub | bV aVc | WW Wc | cCMSC 330 18CFGs for Languages• Recall that our goal is to describe programming languages with CFGs• We had the following example which describes limited arithmetic expressionsE a | b | c | E+E | E-E | E*E | (E)• What’s wrong with using this grammar?– It’s ambiguous!10CMSC 330 19Example: a-b-cE E-E a-E a-E-E a-b-E a-b-cE E-E E-E-E a-E-E a-b-E a-b-cCorresponds to a-(b-c)Corresponds to (a-b)-cCMSC 330 20The Issue: Associativity• Ambiguity is bad here because if the compiler needs to generate code for this expression, it doesn’t know what the programmer intended• So what do we mean when we write a-b-c?– In mathematics, this only has one possible meaning– It’s (a-b)-c, since subtraction is left-associative– a-(b-c) would be the meaning if subtraction was right-associative11CMSC 330 21Another Example: If-Then-Else<stmt> ::= <assignment> | <if-stmt> | ...<if-stmt> ::= if (<expr>) <stmt> |if (<expr>) <stmt> else <stmt>– (Here <>’s are used to denote nonterminals and ::= for productions)• Consider the following program fragment:if (x > y)if (x < z)a = 1;else a = 2;– Note: Ignore newlinesCMSC 330 22Parse Tree #1•Elsebelongs to inner if12CMSC 330 23Parse Tree •Elsebelongs to outer ifCMSC 330 24Fixing the Expression Grammar• Idea: Require that the right operand of all of the operators is not a bare expressionE E+T | E-T | E*T | TT a | b | c | (E)• Now there's only one parse tree for a-b-c– Exercise: Give a derivationfor the string a-(b-c)13CMSC 330 25What if We Wanted Right-Associativity?• Left-recursive productions are used for left-associative operators• Right-recursive productions are used for right-associative operators• Left:E E+T | E-T | E*T | TT a | b | c | (E)• Right:E T+E | T-E | T*E | TT a | b | c | (E)CMSC 330
View Full Document