Unformatted text preview:

Outline Ambiguity revisited Ambiguity and Errors Syntax Directed Translation Extensions of CFG for parsing Precedence declarations Error handling Semantic actions Constructing a parse tree CS780 Prasad L7AMBAST 1 CS780 Prasad L7AMBAST 2 Ambiguity Revisited The Dangling Else Example Ambiguity common in programming languages Arithmetic expressions IF THEN ELSE The Dangling Else Problem The expression if E1 then if E2 then E3 else E4 has two parse trees if Consider the grammar E if E then E if E then E else E OTHER This grammar is ambiguous CS780 Prasad L7AMBAST E1 if E2 if E1 E4 if E3 E2 E3 E4 Typically we want the second form 3 CS780 Prasad L7AMBAST 4 1 The Dangling Else A Fix The Dangling Else Revisited else matches the closest unmatched then We can describe this in the grammar for the same language as follows The expression if E1 then if E2 then E3 else E4 E MIF UIF E2 5 Handling Ambiguity Impossible to convert automatically an ambiguous grammar to an unambiguous one CS780 Prasad E3 Not valid because the then expression is not a MIF L7AMBAST 6 Consider the grammar E E E int Ambiguous two parse trees of int int int Ambiguity checking is undecidable Inherently ambiguous context free languages exist E E Instead of rewriting the grammar use the more natural ambiguous grammar along with disambiguating declarations E int Most parser generator tools allow precedence and associativity declarations to disambiguate grammars L7AMBAST E2 E4 E4 if Associativity Declarations No general techniques for dealing with ambiguity CS780 Prasad E3 E1 A valid parse tree for a UIF OTHER UIF if E then E if E then MIF else UIF L7AMBAST if E1 all then are matched some then are unmatched MIF if E then MIF else MIF CS780 Prasad if if E E E E int int E int int E E int Left associativity declaration left 7 CS780 Prasad L7AMBAST 8 2 Precedence Declarations Consider the grammar E E E E E int And the string int int int E E E E int E E E int int E int Error Handling E int Precedence declarations left left CS780 Prasad E int L7AMBAST 9 L7AMBAST 10 Syntax Error Handling Types of errors Lexical An error that produces a wrong token Error handler should E g misspelling of an identifier keyword or operator Report errors accurately and clearly Recover from an error quickly Not slow down compilation of valid code Syntactic A program that does not satisfy the CFG of the language E g expression with unbalanced parentheses missing semicolon Semantic An error that needs context sensitive information to identify E g Operator applied to an incompatible operand Accessing an undeclared variable E g Infinitely recursive call Accessing an array out of bounds Dereferencing a null pointer L7AMBAST Good error handling is not easy to achieve Approaches from simple to complex Panic mode most popular Error productions Automatic local or global correction Logical Errors in the execution model CS780 Prasad CS780 Prasad Not all are supported by all parser generators 11 CS780 Prasad L7AMBAST 12 3 Error Recovery Panic Mode Syntax Error Recovery Error Productions Idea When an error is detected Discard tokens until one with a clear role is found Then continue Such tokens are called synchronizing tokens Typically the statement or expression terminators Consider the erroneous expression 1 2 3 Panic mode recovery Skip ahead to next integer and then continue Bison uses the special terminal error to describe how much input to skip E int E E E error int error CS780 Prasad L7AMBAST Idea specify in the grammar known common mistakes Essentially promotes common errors to alternative syntax Example Write 5 x instead of 5 x Add the production E E E Disadvantage Complicates the grammar 13 CS780 Prasad L7AMBAST Error Recovery Local and Global Correction Syntax Error Recovery Past and Present Idea Find a correct nearby program Past 14 Slow recompilation cycle even once a day Find as many errors in one cycle as possible Try token insertions and deletions Exhaustive search Cf Use of FIRST and FOLLOW sets in recursive descent parsers Present Disadvantages Quick recompilation cycle Users tend to correct one error cycle Complex error recovery not needed Panic mode seems enough Hard to implement Slows down parsing of correct programs Nearby is not necessarily the intended program Not all tools support it CS780 Prasad L7AMBAST 15 CS780 Prasad L7AMBAST 16 4 ASTs So far a parser traces the derivation of a sequence of tokens But the rest of the compiler needs a structural representation of the program Abstract syntax tree AST is like parse tree but ignores some details Abstract Syntax Trees Consider the grammar E int E E E 5 2 3 and the string After lexical analysis we have a list of tokens int5 int2 int3 which during parsing is turned into a parse tree CS780 Prasad L7AMBAST 17 Example of Parse Tree CS780 Prasad int2 Traces the operation of the parser Does capture the nesting structure But too much info E E CS780 Prasad PLUS int5 18 Example of Abstract Syntax Tree E E L7AMBAST E E int3 L7AMBAST PLUS 5 2 3 Also captures the nesting structure But abstracts from the concrete syntax no redundant delimiters more compact and easier to use AST is an important data structure in a compiler Parentheses Single successor nodes 19 CS780 Prasad L7AMBAST 20 5 Semantic Attributes and Actions Semantic Actions An Example Will be used to construct ASTs Consider the grammar Each grammar symbol may have attributes For each symbol X define an attribute X val E int E E E For terminal symbols lexical tokens attributes can be calculated by the lexer Each production may have an action Written as X Y1 Yn action That can refer to or compute symbol attributes CS780 Prasad L7AMBAST 21 CS780 Prasad CS780 Prasad E val int val E val E1 val E2 val E val E1 val L7AMBAST Example 22 E3 val E4 val E5 val Must compute E4 val and E5 val before E3 val That is E3 val depends on E4 val and E5 val Equations The parser must find the order of evaluation for the attributes E val E1 val E2 val E1 val int5 val 5 E2 val E3 val E3 val E4 val E5 val E4 val int2 val 2 E5 val int3 val 3 L7AMBAST E int E1 E2 E1 Semantic actions specify a system of equations String 5 2 3 Tokens int5 int2 int3 Productions We annotate the grammar with actions Semantic Actions Notes cont d E E1 E2 E1 int5 E2 E3 E3 E4 E5 E4 int2 E5 int3 For terminals val is the associated numeric value For non terminals val is the expression s value which is computed from the values of the sub expressions An attribute must be computed after all its successors


View Full Document

Wright CS 780 - Ambiguity and Errors Syntax-Directed Translation

Loading Unlocking...
Login

Join to view Ambiguity and Errors Syntax-Directed Translation and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Ambiguity and Errors Syntax-Directed Translation and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?