COS 320 CompilersThe Front EndParsing with CFGsContext-Free GrammarsSlide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11Slide 12Slide 13Slide 14Slide 15Parse TreesSlide 17Slide 18Slide 19Ambiguous GrammarsSlide 21Slide 22Slide 23Building ParsersRecursive Descent ParsingSlide 26Slide 27Slide 28Slide 29Slide 30problemsolutioneliminating left-recursion in generalSlide 34Constructing RD ParsersSlide 36Constructing Predictive ParsersSlide 38Slide 39Slide 40Constricting Predictive ParsersIterative AnalysisSlide 43Computing Nullable SetsComputing First SetsComputing Follow Setsbuilding a predictive parserSlide 48Slide 49Slide 50Slide 51Slide 52Slide 53Slide 54Slide 55Slide 56Slide 57Slide 58Slide 59Slide 60Slide 61Slide 62Slide 63Slide 64Slide 65predictive parsing tablesanother trickSlide 68summaryCOS 320CompilersDavid WalkerThe Front End•Lexical Analysis: Create sequence of tokens from characters (Chap 2)•Syntax Analysis: Create abstract syntax tree from sequence of tokens (Chap 3)•Type Checking: Check program for well-formedness constraintsLexer Parserstream ofcharactersstream oftokensabstractsyntaxTypeCheckerParsing with CFGs•Context-free grammars are (often) given by BNF expressions (Backus-Naur Form)–Appel Chap 3.1•More powerful than regular expressions–Matching parens–Nested comments•wait, we could do nested comments with ML-LEX!•CFGs are good for describing the overall syntactic structure of programs.Context-Free Grammars•Context-free grammars consist of:–Set of symbols:•terminals that denotes token types•non-terminals that denotes a set of strings–Start symbol–Rules:•left-hand side: non-terminal•right-hand side: terminals and/or non-terminals•rules explain how to rewrite non-terminals (beginning with start symbol) into terminalssymbol ::= symbol symbol ... symbolContext-Free GrammarsA string is in the language of the CFG if only if it is possible to derive that string using the following non-deterministic procedure:1. begin with the start symbol2. while any non-terminals exist, pick a non-terminal and rewrite it using a rule3. stop when all you have left are terminals (and check you arrived at the string your were hoping to)Parsing is the process of checking that a string is in the CFG for your programming language. It is usually coupled with creating an abstract syntax tree.non-terminals: S, E, Elistterminals: ID, NUM, PRINT, +, :=, (, ), ;rules:S ::= S; SS ::= ID := ES ::= PRINT ( Elist ) E ::= IDE ::= NUME ::= E + EE ::= ( S , Elist ) Elist ::= EElist ::= Elist , Enon-terminals: S, E, Elistterminals: ID, NUM, PRINT, +, :=, (, ), ;rules:1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist ) 4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist ) 8. Elist ::= E9. Elist ::= Elist , EID = NUM ; PRINT ( NUM )Derive me!non-terminals: S, E, Elistterminals: ID, NUM, PRINT, +, :=, (, ), ;rules:1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist ) 4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist ) 8. Elist ::= E9. Elist ::= Elist , ES ID = NUM ; PRINT ( NUM )Derive me!non-terminals: S, E, Elistterminals: ID, NUM, PRINT, +, :=, (, ), ;rules:1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist ) 4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist ) 8. Elist ::= E9. Elist ::= Elist , ESID = E ID = NUM ; PRINT ( NUM )Derive me!non-terminals: S, E, Elistterminals: ID, NUM, PRINT, +, :=, (, ), ;rules:1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist ) 4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist ) 8. Elist ::= E9. Elist ::= Elist , ESID = E ID = NUM ; PRINT ( NUM )Derive me!oops,can’t makeprogressnon-terminals: S, E, Elistterminals: ID, NUM, PRINT, +, :=, (, ), ;rules:1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist ) 4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist ) 8. Elist ::= E9. Elist ::= Elist , ESID = NUM ; PRINT ( NUM )Derive me!non-terminals: S, E, Elistterminals: ID, NUM, PRINT, +, :=, (, ), ;rules:1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist ) 4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist ) 8. Elist ::= E9. Elist ::= Elist , ESS ; SID = NUM ; PRINT ( NUM )Derive me!non-terminals: S, E, Elistterminals: ID, NUM, PRINT, +, :=, (, ), ;rules:1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist ) 4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist ) 8. Elist ::= E9. Elist ::= Elist , ESS ; SID := E ; SID = NUM ; PRINT ( NUM )Derive me!non-terminals: S, E, Elistterminals: ID, NUM, PRINT, +, :=, (, ), ;rules:1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist ) 4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist ) 8. Elist ::= E9. Elist ::= Elist , ESS ; SID = E ; SID = NUM ; SID = NUM ; PRINT ( Elist )ID = NUM ; PRINT ( E )ID = NUM ; PRINT ( NUM )Derive me!non-terminals: S, E, Elistterminals: ID, NUM, PRINT, +, :=, (, ), ;rules:1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist ) 4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist ) 8. Elist ::= E9. Elist ::= Elist , ESS ; SID = E ; SID = NUM ; SID = NUM ; PRINT ( Elist )ID = NUM ; PRINT ( E )ID = NUM ; PRINT ( NUM )Another way toderive thesame stringSS ; SS ; PRINT ( Elist )S ; PRINT ( E )S ; PRINT ( NUM )ID = E ; PRINT ( NUM )ID = NUM ; PRINT ( NUM )left-most derivation right-most derivationParse Trees•Representing derivations as trees–useful in compilers: Parse trees correspond quite closely (but not exactly) with abstract syntax trees we’re trying to generate•difference: abstract syntax vs concrete (parse) syntax•each internal node is labeled with a non-terminal•each leaf note is labeled with a terminal•each use of a rule in a derivation explains how to generate children in the parse tree from the parentsParse Trees•Example:SS SIDE:=NUMNUMEL )(PRINT;SS ; SID = E ; SID = NUM ; SID = NUM ; PRINT ( Elist )ID = NUM ; PRINT ( E )ID = NUM ; PRINT ( NUM )Parse Trees•Example: 2 derivations, but 1 treeSS SIDE:=NUMNUMEL )(PRINT;SS ; SID = E ; SID = NUM ; SID = NUM ; PRINT ( Elist )ID = NUM ; PRINT ( E )ID = NUM ; PRINT ( NUM )SS ; SS ; PRINT ( Elist )S ; PRINT ( E )S ; PRINT ( NUM )ID = E ; PRINT ( NUM )ID = NUM ; PRINT ( NUM )Parse Trees•parse trees have meaning.–order of children, nesting of subtrees is significantSSSIDE:=NUMNUMEL )(PRINT;SSSIDE:=NUMNUMEL )(PRINT;Ambiguous Grammars•a grammar is ambiguous if the same sequence of tokens can give rise to two or more parse treesAmbiguous
View Full Document