Lecture Notes 1 The Three Hour Tour 1The Three Hour Tour Through Automata Theory Read Supplementary Materials: The Three Hour Tour Through Automata Theory Read Supplementary Materials: Review of Mathematical Concepts Read K & S Chapter 1 Do Homework 1. Let's Look at Some Problems int alpha, beta; alpha = 3; beta = (2 + 5) / 10; (1) Lexical analysis: Scan the program and break it up into variable names, numbers, etc. (2) Parsing: Create a tree that corresponds to the sequence of operations that should be executed, e.g., / + 10 2 5 (3) Optimization: Realize that we can skip the first assignment since the value is never used and that we can precompute the arithmetic expression, since it contains only constants. (4) Termination: Decide whether the program is guaranteed to halt. (5) Interpretation: Figure out what (if anything) it does. A Framework for Analyzing Problems We need a single framework in which we can analyze a very diverse set of problems. The framework we will use is Language Recognition A language is a (possibly infinite) set of finite length strings over a finite alphabet. Languages (1) Σ = {0,1,2,3,4,5,6,7,8,9} L = {w ∈ Σ*: w represents an odd integer} = {w ∈ Σ*: the last character of w is 1,3,5,7, or 9} = (0∪1∪2∪3∪4∪5∪6∪7∪8∪9)* (1∪3∪5∪7∪9) (2) Σ = {(,)} L = {w ∈ Σ*: w has matched parentheses} = the set of strings accepted by the grammar: S → ( S ) S → SS S → ε (3) L = {w: w is a sentence in English} Examples: Mary hit the ball. Colorless green ideas sleep furiously. The window needs fixed. (4) L = {w: w is a C program that halts on all inputs}Lecture Notes 1 The Three Hour Tour 2Encoding Output in the Input String (5) Encoding multiplication as a single input string L = {w of the form: <integer>x<integer>=<integer>, where <integer> is any well formed integer, and the third integer is the product of the first two} 12x9=108 12=12 12x8=108 (6) Encoding prime decomposition L = {w of the form: <integer1>/<integer2>,<integer3> …, where integers 2 - n represent the prime decomposition of integer 1. 15/3,5 2/2 More Languages (7) Sorting as a language recognition task: L = {w1 # w2: ∃n ≥1, w1 is of the form int1, int2, … intn, w2 is of the form int1, int2, … intn, and w2 contains the same objects as w1 and w2 is sorted} Examples: 1,5,3,9,6#1,3,5,6,9 ∈ L 1,5,3,9,6#1,2,3,4,5,6,7 ∉ L (8) Database querying as a language recognition task: L = {d # q # a: d is an encoding of a database, q is a string representing a query, and a is the correct result of applying q to d} Example: (name, age, phone), (John, 23, 567-1234) (Mary, 24, 234-9876 )# (select name age=23) # (John) ∈ L The Traditional Problems and their Language Formulations are Equivalent By equivalent we mean: If we have a machine to solve one, we can use it to build a machine to do the other using just the starting machine and other functions that can be built using a machine of equal or lesser power. Consider the multiplication example: L = {w of the form: <integer>x<integer>=<integer>, where <integer> is any well formed integer, and the third integer is the product of the first two} Given a multiplication machine, we can build the language recognition machine: Given the language recognition machine, we can build a multiplication machine:Lecture Notes 1 The Three Hour Tour 3A Framework for Describing Languages Clearly, if we are going to work with languages, each one must have a finite description. Finite Languages: Easy. Just list the elements of the language. L = {June, July, August} Infinite Languages: Need a finite description. Grammars let us use recursion to do this. Grammars 1 (1) The Language of Matched Parentheses S → ( S ) S → SS S → ε (2) The Language of Odd Integers S → 1 S → 3 S → 5 S → 7 S → 9 S → 0 S S → 1 S S → 2 S S → 3 S S → 4 S S → 5 S S → 6 S S → 7 S S → 8 S S → 9 S Grammars 2 S → O S → A O A →A D A → D D → O D → E O → 1 O → 3 O → 5 O → 7 O → 9 E→ 0 E→ 2 E→ 4 E→ 6 E→ 8 Grammars 3 (3) The Language of Simple Arithmetic Expressions S → <exp> <exp> → <number> <exp> → (<exp>) <exp> → - <exp> <exp> → <exp> <op> <exp> <op> → + | - | * | / <number> → <digit> <number> → <digit> <number> <digit > → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9Lecture Notes 1 The Three Hour Tour 4Grammars as Generators and Acceptors Top Down Parsing 4 + 3 Bottom Up Parsing 4 + 3 The Language Hierarchy Recursively Enumerable Languages Recursive Languages Context-Free Languages Regular LanguagesLecture Notes 1 The Three Hour Tour 5Regular Grammars In a regular grammar, all rules must be of the form: <one nonterminal> → <one terminal> or ε or <one nonterminal> → <one terminal><one nonterminal> So, the following rules are okay: S → ε S → a S → aS But these are not: S → ab S → SS aS → b Regular Expressions and Languages Regular expressions are formed from ∅ and the characters in the target alphabet, plus the operations of: • Concatenation: αβ means α followed by β • Or (Set Union): α∪β means α Or (Union) β • Kleene *: α* means 0 or more occurrences of α concatenated together. • At Least 1: α+ means 1 or more occurrences of α concatenated together. • (): used to group the other operators Examples: (1) Odd integers: (0∪1∪2∪3∪4∪5∪6∪7∪8∪9)*(1∪3∪5∪7∪9) (2) Identifiers: (A-Z)+((A-Z) ∪(0-9))* (3) Matched Parentheses Context Free Grammars (1) The Language of Matched Parentheses S → ( S ) S → SS S → ε (2) The Language of Simple Arithmetic Expressions S → <exp> <exp> → <number> <exp> → (<exp>) <exp> → - <exp> <exp> → <exp> <op> <exp> <op> → + | - | * | / <number> → <digit> <number> → <digit> <number> <digit > → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9Lecture Notes 1 The Three Hour Tour 6Not All Languages are Context-Free English: S → NP VP NP → the NP1 | NP1
View Full Document