Unformatted text preview:

Lexical Analysis Finite Automata (Part 2 of 2)Slide 2Slide 3SummarySlide 5Languages and AutomataLimitations of Regular LanguagesThe Functionality of the ParserExampleComparison with Lexical AnalysisThe Role of the ParserProgramming Language StructureNotation for Programming LanguagesObservationContext Free GrammarsExamples of CFGsThe Language of a CFGKey IdeaThe Language of a CFG (Cont.)Slide 20Slide 21Examples:Arithmetic ExampleCool ExampleCool Example (Cont.)NotesMore NotesDerivations and Parse TreesDerivation ExampleDerivation Example (Cont.)Derivation in Detail (1)Derivation in Detail (2)Derivation in Detail (3)Derivation in Detail (4)Derivation in Detail (5)Derivation in Detail (6)Notes on DerivationsQ: Radio (117 / 842) Q: Games (548 / 842) Q: Music (158 / 842) Q: Games (582 / 842) Left-most and Right-most DerivationsRight-most Derivation in Detail (1)Right-most Derivation in Detail (2)Right-most Derivation in Detail (3)Right-most Derivation in Detail (4)Right-most Derivation in Detail (5)Right-most Derivation in Detail (6)Slide 49Summary of DerivationsReviewAmbiguityAmbiguity. ExampleSlide 54Ambiguity (Cont.)Dealing with AmbiguitySlide 57Ambiguity: The Dangling ElseThe Dangling Else: ExampleThe Dangling Else: A FixThe Dangling Else: Example RevisitedSlide 62Precedence and Associativity DeclarationsAssociativity DeclarationsSlide 65Slide 66#1Intro To ParsingIntro To ParsingStep By StepStep By Step#2Self-Test Answers•Are practical parsers and scanners based on deterministic or non-deterministic automata?– Deterministic–Half credit: “they are equivalent” w/o “convert to DFA”•How can regular expressions be used to specify nested constructs?–They cannot. (Scott Book 2.1.2, second sentence)–Nested parentheses (n )n are the canonical example.• How is a two-dimensional transition table used in table-driven scanning? –new_state = table[old_state][new_character]#3Administrivia•Number of students submitting PA-X to a PA-Y area: 5 / 39 = 13%– This is a 4000-level class.–(We could turn off submissions after the due date: this yields a zero-tolerance late policy.) –(We could turn off submissions before the due date: this means students cannot work ahead.) •PA1 average: 43.8 / 50 = 88%– Excellent work, class!–Real time grade-checking demo.#4Cunning Plan•Formal Languages–Regular Languages Revisited•Parser Overview•Context-Free Grammars (CFGs)• Derivations•Ambiguity#5One-Slide Summary•A parser takes a sequence of tokens as input. If the input is valid, it produces a parse tree (or derivation).•Context-free grammars are a notation for specifying formal languages. They contain terminals, non-terminals and productions (aka rewrite rules).#6Languages and Automata•Formal languages are very important in CS– Especially in programming languages•Regular languages– The weakest formal languages widely used– Many applications• We will also study context-free languages–A “stronger” type of formal language#7Limitations of Regular Languages• Intuition: A finite automaton that runs long enough must repeat states– Pigeonhole Principle: imagine 20 states and 300 input characters•A finite automaton can’t remember how often it has visited a particular state– Only enough to store in which state it is in– Cannot count, except up to a finite limit•Language of balanced parentheses is not regular: { (i )i | i ¸ 0} –http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages#8The Functionality of the Parser•Input: sequence of tokens from lexer– e.g., the .cl-lex files you make in PA2•Output: parse tree of the program– Also called an abstract syntax tree•Output: error if the input is not valid– e.g., “parse error on line 3”#9Example•Cool program textif x = y then 1 else 2 fi•Parser input (tokens)IF ID = ID THEN INT ELSE INT FI• Parser output (tree)IF-THEN-ELSE=ID IDINTINT#11The Role of the Parser•Not all sequences of tokens are programs– then x * / + 3 while x ; y z then •The parser must distinguish between valid and invalid sequences of tokens•We need– A language or formalism to describe valid sequences of tokens– A method (an algorithm) for distinguishing valid from invalid sequences of tokens#12Programming Language Structure•Programming languages have recursive structure• Consider the language of arithmetic expressions with integers, +, *, and ( )• An expression is either:– an integer– an expression followed by “+” followed by expression– an expression followed by “*” followed by expression–a ‘(‘ followed by an expression followed by ‘)’• int , int + int , ( int + int) * int are expressions#13Notation for Programming Languages•An alternative notation: E ! int E ! E + E E ! E * E E ! ( E )•We can view these rules as rewrite rules– We start with E and replace occurrences of E with some right-hand side•E ! E * E ! ( E ) * E ! ( E + E ) * E ! ... ! (int + int) * int#14Observation•All arithmetic expressions can be obtained by a sequence of replacements•Any sequence of replacements forms a valid arithmetic expression•This means that we cannot obtain( int ) ) by any sequence of replacements. Why? • This notation is a context-free grammar#15Context Free Grammars•A context-free grammar consists of–A set of non-terminals N• Written in uppercase in these notes–A set of terminals T•Lowercase or punctuation in these notes– A start symbol S (a non-terminal)– A set of productions (rewrite rules)•Assuming E 2 N E !  , or E ! Y1 Y2 ... Yn where Yi µ N [ T#16Examples of CFGsSimple arithmetic expressions: E ! int E ! E + E E ! E * E E ! ( E )– One non-terminal: E– Several terminals: int + * ( )•Called terminals because they are never replaced–By convention the non-terminal for the first production is the start symbol#17The Language of a CFGRead productions as replacement rules: X ! Y1 ... YnMeans X can be replaced by Y1 ... YnX ! Means X can be erased or eaten(replaced with empty string)#18Key IdeaTo construct a valid sequence of terminals:●Begin with a string consisting of the start symbol “S”●Replace any non-terminal X in the string by a right-hand side of some production X ! Y1 … Yn ●Continue replacing until there are only terminals in the string#19The Language of a CFG: !More formally, write X1 … Xi-1


View Full Document

UVA CS 4610 - Intro To Parsing

Download Intro To Parsing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Intro To Parsing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Intro To Parsing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?