1Lexical AnalysisRegularExpressionsNondeterministicFinite Automata(NFA)DeterministicFinite Automata(DFA)ImplementationOf DFARegular Expressions (REs) Compact mechanism for defining a language Generally easier to understand than FSMs Example: identifier – letter followed by zero or more letters or digitsletter (letter|digit)* Used as input to scanner generator Define each token, also Define white-space, comments, etc. Things that do not correspond to tokens but must also be recognized and ignored2Regular Expression OperatorsUsed for grouping (as in programming languages)( X ) grouping One or more occurrence of XX +Zero or more occurrences of XX * Kleene closureX or Y (alternatives)X | Y alternationX followed by YX Y concatentationOperands of RE Operators The empty string ε Single characters of the underlying alphabet Shorthands for groups of characters (letter for A-Z or a-z, digit for 0-9, etc.) Legal regular expressions (an operator may be applied to the result of an operator)3Precedence for RE Operators For example:letter letter | digit *letter ( letter | digit ) *highestX ^ YX *, X +middleX * YX YlowestX + YX | YPrecedenceAnalagous ArithmeticOperatorRegular ExpressionOperatorLanguage Defined By a RE Recall, for an automaton the language is the set of strings accepted by the automaton For a RE, the language is the set of strings matchedby the RE{ “”, “a”, “b”, “c”, “aa”, “ab”,“ac”, “ba”, “bb”, “bc”, … }( a | b | c ) *{ “a”, “b”, “c” }a | b | c{ “ab” }ab{ “” }εSet of StringsRegular Expression4Understanding REs Describe these languages:0 ( 0 | 1 )* 0(0 | 1 ) * 0 ( 0 | 1 ) ( 0 | 1 )1* ( 0 (1 *) 0 (1 *) ) *Writing Regular Expressions Translate these into regular expressions Words ending in “ing” (a word consists of lower or upper-case letters) Binary strings with an odd number of 1s5Writing Regular Expressions Floating point numbers with an optional leading sign (+ or -) consisting of at least one digit and an optional decimal point (if there is a decimal point, there must be at least one digit before and one after the decimal point)From RE to a Scanner Theorem: for every regular expression, there is a deterministic finite-state machine that defines the same language (and vice versa) Q: How do we create this machine (automatically)? Idea: start by translating a RE to an NFA6Lexical AnalysisRegularExpressionsNondeterministicFinite Automata(NFA)DeterministicFinite Automata(DFA)ImplementationOf DFARE to NFA(1) For each kind of RE, define an NFA Notation: NFA for RE M For ε For input aMεa7RE to NFA (2) For A B For A | BABεABεεεεRE to NFA (3) For A* A+ ?Aεεεε8Example: RE to NFA Consider the regular expression(1|0)*1 The NFA is101εεεεεεεεεABCDEFGHIJAnother Example(ε | + | - ) digit+ (ε | . digit+ )9Lexical AnalysisRegularExpressionsNondeterministicFinite Automata(NFA)DeterministicFinite Automata(DFA)ImplementationOf DFANFA to DFA: The Trick Simulate the NFA Each state of the DFA = a non-empty subset of states of the NFA Start state = the set of NFA states reachable through ε-moves from NFA start state Add a transition S →a S’ to DFA iff S’ is the set of NFA states reachable from any state in S after seeing the input a, considering ε-moves as well10NFA to DFA: Remark An NFA may be in many states at any time How many different states ? If there are N states, the NFA must be in some subset of those N states How many subsets are there? 2N- 1 = finitely manyNFA -> DFA Example101εεεεεεεεεABCDEFGHIJABCDHIFGHIABCDEJGHIABCD01010111NFA to DFA: the practice NFA to DFA conversion is at the heart of tools such as flex But, DFAs can be huge In practice, flex-like tools trade off speed for space in the choice of NFA and DFA
View Full Document