Slide 1Slide 2Slide 3Slide 4Slide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Slide 18Slide 19Slide 20Slide 21Introduction to ParsingThe Front EndParser• Checks the stream of words and their parts of speech (produced by the scanner) for grammatical correctness• Determines if the input is syntactically well formed•Guides checking at deeper levels than syntax• Builds an IR representation of the codeThink of this as the mathematics of diagramming sentences SourcecodeScannerIRParserErrors tokensThe Study of ParsingThe process of discovering a derivation for some sentence• Need a mathematical model of syntax — a grammar G• Need an algorithm for testing membership in L(G) • Need to keep in mind that our goal is building parsers, not studying the mathematics of arbitrary languagesRoadmap1Context-free grammars and derivations2Top-down parsingHand-coded recursive descent parsers3Bottom-up parsingGenerated LR(1) parsersSpecifying Syntax with a GrammarContext-free syntax is specified with a context-free grammarSheepNoise SheepNoise baa | baaThis CFG defines the set of noises sheep normally make It is written in a variant of Backus–Naur formFormally, a grammar is a four tuple, G = (S,N,T,P)•S is the start symbol (set of strings in L(G))• N is a set of non-terminal symbols (syntactic variables)• T is a set of terminal symbols (words)• P is a set of productions or rewrite rules (P : N (N T)+ )Deriving SyntaxWe can use the SheepNoise grammar to create sentencesuse the productions as rewriting rulesAnd so on ...A More Useful GrammarTo explore the uses of CFGs,we need a more complex grammar• Such a sequence of rewrites is called a derivation• Process of discovering a derivation is called parsingWe denote this derivation: Expr * id – num * id1 E x p r E x p r O p E x p r 2 n u m b e r 3 i d 4 O p + 5 – 6 * 7 / R u l e S e n t e n t i a l F o r m — E x p r 1 E x p r O p E x p r 2 <i d , x> O p E x p r 5 <i d , x> – E x p r 1 <i d , x> – E x p r O p E x p r 2 <i d , x> – <n u m , 2> O p E x p r 6 <i d , x> – <n u m , 2> * E x p r 3 <i d , x> – <n u m , 2> * <i d , y>Derivations• At each step, we choose a non-terminal to replace• Different choices can lead to different derivationsTwo derivations are of interest• Leftmost derivation — replace leftmost NT at each step• Rightmost derivation — replace rightmost NT at each stepThese are the two systematic derivations(We don’t care about randomly-ordered derivations!)The example on the preceding slide was a leftmost derivation• Of course, there is also a rightmost derivation•Interestingly, it turns out to be differentThe Two Derivations for x – 2 * y In both cases, Expr * id – num * id• The two derivations produce different parse trees•The parse trees imply different evaluation orders! Leftmost derivationRightmost derivationR u l e S e n t e n t i a l F o r m — E x p r 1 E x p r O p E x p r 3 <i d , x> O p E x p r 5 <i d , x> – E x p r 1 <i d , x> – E x p r O p E x p r 2 <i d , x> – <n u m , 2> O p E x p r 6 <i d , x> – <n u m , 2> * E x p r 3 <i d , x> – <n u m , 2> * <i d , y> R u l e S e n t e n t i a l F o r m — E x p r 1 E x p r O p E x p r 3 E x p r O p <i d , y > 6 E x p r * <i d , y > 1 E x p r O p E x p r * <i d , y > 2 E x p r O p <n u m , 2 > * <i d , y > 5 E x p r – <n u m , 2 > * <i d , y > 3 <i d , x > – <n u m , 2 > * <i d , y >Derivations and Parse TreesLeftmost derivationGxEE Op–2EEEyOp*This evaluates as x – ( 2 * y )R u l e S e n t e n t i a l F o r m — E x p r 1 E x p r O p E x p r 3 <i d , x> O p E x p r 5 <i d , x> – E x p r 1 <i d , x> – E x p r O p E x p r 2 <i d , x> – <n u m , 2> O p E x p r 6 <i d , x> – <n u m , 2> * E x p r 3 <i d , x> – <n u m , 2> * <i d , y>Derivations and Parse TreesRightmost derivationx 2GEOp EEE Op Ey–*This evaluates as ( x – 2 ) * yR u l e S e n t e n t i a l F o r m — E x p r 1 E x p r O p E x p r 3 E x p r O p <i d , y > 6 E x p r * <i d , y > 1 E x p r O p E x p r * <i d , y > 2 E x p r O p <n u m , 2 > * <i d , y > 5 E x p r – <n u m , 2 > * <i d , y > 3 <i d , x > – <n u m , 2 > * <i d , y >Derivations and PrecedenceThese two derivations point out a problem with the grammar:It has no notion of precedence, or implied order of evaluationTo add precedence• Create a non-terminal for each level of precedence•Isolate the corresponding part of the grammar• Force the parser to recognize high precedence subexpressions firstFor algebraic expressions •Multiplication and division, first (level one)•Subtraction and addition, next (level two)Derivations and PrecedenceAdding the standard algebraic precedence produces:This grammar is slightly larger• Takes more rewriting to reach some of the terminal symbols• Encodes expected precedence• Produces same parse tree under leftmost & rightmost derivationsLet’s see how it parses x - 2 * yleveloneleveltwo1 G o a l E x p r 2 E x p r E x p r + T e r m 3 | E x p r …
View Full Document