Wright CS 780 - Ambiguity and Errors Syntax-Directed Translation

Unformatted text preview:

1CS780(Prasad) L7AMBAST 1Ambiguity and ErrorsSyntax-Directed TranslationCS780(Prasad) L7AMBAST 2Outline• Ambiguity (revisited)• Extensions of CFG for parsingPrecedence declarationsError handlingSemantic actions• Constructing a parse treeCS780(Prasad) L7AMBAST 3Ambiguity Revisited• Ambiguity common in programming languages.Arithmetic expressions IF-THEN-ELSE (The Dangling Else Problem)• Consider the grammarE  if E then E| if E then E else E| OTHERThis grammar is ambiguous.CS780(Prasad) L7AMBAST 4The Dangling Else Example• The expressionif E1then if E2then E3else E4has two parse trees.ifE1ifE2E3E4ifE1ifE2E3E4• Typically we want the second form.2CS780(Prasad) L7AMBAST 5The Dangling Else: A Fix•elsematches the closest unmatched then.• We can describe this in the grammar for the same language as follows:E  MIF /* all then are matched */ | UIF /* some then are unmatched */MIF  if E then MIF else MIF | OTHERUIF  if E then E| if E then MIF else UIF CS780(Prasad) L7AMBAST 6The Dangling Else: Revisited• The expression if E1then if E2then E3else E4ifE1ifE2E3E4ifE1ifE2E3E4• Not valid because the then expression is not a MIF• A valid parse tree (for a UIF)CS780(Prasad) L7AMBAST 7Handling Ambiguity• No general techniques for dealing with ambiguity. Impossible to convert automatically an ambiguous grammar to an unambiguous one.Ambiguity checking is undecidable.Inherently ambiguous context-free languages exist.• Instead of rewriting the grammar, use the more natural (ambiguous) grammar along with disambiguating declarations. Most parser generator tools allow precedence and associativity declarations to disambiguate grammars.CS780(Prasad) L7AMBAST 8Associativity Declarations• Consider the grammar E  E + E | int • Ambiguous: two parse trees of int + int + intEEEEE+int +intintEEEEE+int+intint• Left associativity declaration: %left +3CS780(Prasad) L7AMBAST 9Precedence Declarations• Consider the grammar E  E + E | E * E | int  And the string int + int * intEEEEE+int *intintEEEEE*int+intint• Precedence declarations: %left +%left *CS780(Prasad) L7AMBAST 10Error HandlingCS780(Prasad) L7AMBAST 11Types of errors• Lexical: An error that produces a wrong token E.g., misspelling of an identifier, keyword or operator• Syntactic: A program that does not satisfy the CFG of the language E.g., expression with unbalanced parentheses, missing semicolon• Semantic: An error that needs context sensitive information to identify E.g., Operator applied to an incompatible operand,Accessing an undeclared variable• Logical: Errors in the execution model E.g., Infinitely recursive call, Accessing an array out of bounds, Dereferencing a null pointerCS780(Prasad) L7AMBAST 12Syntax Error Handling• Error handler should Report errors accurately and clearly. Recover from an error quickly. Not slow down compilation of valid code.Good error handling is not easy to achieve.• Approaches (from simple to complex) Panic mode (most popular!) Error productions Automatic local or global correctionNot all are supported by all parser generators.4CS780(Prasad) L7AMBAST 13Error Recovery: Panic Mode• Idea: When an error is detected: Discard tokens until one with a clear role is found. Then continue. Such tokens are called synchronizing tokens.Typically the statement or expression terminators.• Consider the erroneous expression(1 + + 2) + 3• Panic-mode recovery: Skip ahead to next integer and then continue.• Bison: uses the special terminal error to describe how much input to skip.E  int | E + E | ( E ) | error int | ( error )CS780(Prasad) L7AMBAST 14Syntax Error Recovery: Error Productions• Idea: specify, in the grammar, known common mistakes. Essentially promotes common errors to alternative syntax.• Example: Write 5 x instead of 5 * xAdd the production E  … | E E• DisadvantageComplicates the grammarCS780(Prasad) L7AMBAST 15Error Recovery: Local and Global Correction• Idea: Find a correct “nearby” program  Try token insertions and deletions. Exhaustive search.Cf. Use of FIRST and FOLLOW sets in recursive descent parsers.• Disadvantages:  Hard to implement. Slows down parsing of correct programs. “Nearby” is not necessarily “the intended” program. Not all tools support it.CS780(Prasad) L7AMBAST 16Syntax Error Recovery: Past and Present•PastSlow recompilation cycle (even once a day).Find as many errors in one cycle as possible.•PresentQuick recompilation cycle.Users tend to correct one error/cycle.Complex error recovery not needed.Panic-mode seems enough.5CS780(Prasad) L7AMBAST 17Abstract Syntax TreesCS780(Prasad) L7AMBAST 18ASTs• So far a parser traces the derivation of a sequence of tokens. But the rest of the compiler needs a structural representation of the program.• Abstract syntax tree (AST) is like parse tree but ignores some details.• Consider the grammar:E  int | ( E ) | E + E and the string: 5 + (2 + 3)• After lexical analysis, we have a list of tokensint5‘+’ ‘(’ int2‘+’ int3‘)’which, during parsing is turned into a parse tree …CS780(Prasad) L7AMBAST 19Example of Parse TreeEEE(E)+E+int5int2Eint3• Traces the operation of the parser• Does capture the nesting structure• But too much info Parentheses Single-successor nodesCS780(Prasad) L7AMBAST 20Example of Abstract Syntax Tree• Also captures the nesting structure.•But abstracts from the concrete syntax (no redundant delimiters  more compact and easier to use).• AST is an important data structure in a compiler.PLUSPLUS25 36CS780(Prasad) L7AMBAST 21Semantic Attributes and Actions• Will be used to construct ASTs.• Each grammar symbol may have attributes.For terminal symbols (lexical tokens) attributes can be calculated by the lexer.• Each production may have an action.Written as: X  Y1… Yn{ action }That can refer to or compute symbol attributes.CS780(Prasad) L7AMBAST 22Semantic Actions: An Example• Consider the grammarE  int | E + E | ( E )• For each symbol X, define an attribute X.val For terminals, val is the associated numeric value. For non-terminals, val is the expression’s value (which is computed from the values of the sub-expressions).• We annotate the grammar with actions:E  int


View Full Document

Wright CS 780 - Ambiguity and Errors Syntax-Directed Translation

Download Ambiguity and Errors Syntax-Directed Translation
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Ambiguity and Errors Syntax-Directed Translation and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Ambiguity and Errors Syntax-Directed Translation 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?