Unformatted text preview:

1Lexical AnalysisRegularExpressionsNondeterministicFinite Automata(NFA)DeterministicFinite Automata(DFA)ImplementationOf DFARegular Expressions (REs) Compact mechanism for defining a language Generally easier to understand than FSMs Example: identifier – letter followed by zero or more letters or digitsletter (letter|digit)* Used as input to scanner generator Define each token, also Define white-space, comments, etc. Things that do not correspond to tokens but must also be recognized and ignored2Regular Expression OperatorsUsed for grouping (as in programming languages)( X ) grouping One or more occurrence of XX +Zero or more occurrences of XX * Kleene closureX or Y (alternatives)X | Y alternationX followed by YX Y concatentationOperands of RE Operators The empty string ε Single characters of the underlying alphabet Shorthands for groups of characters (letter for A-Z or a-z, digit for 0-9, etc.) Legal regular expressions (an operator may be applied to the result of an operator)3Precedence for RE Operators For example:letter letter | digit *letter ( letter | digit ) *highestX ^ YX *, X +middleX * YX YlowestX + YX | YPrecedenceAnalagous ArithmeticOperatorRegular ExpressionOperatorLanguage Defined By a RE Recall, for an automaton the language is the set of strings accepted by the automaton For a RE, the language is the set of strings matchedby the RE{ “”, “a”, “b”, “c”, “aa”, “ab”,“ac”, “ba”, “bb”, “bc”, … }( a | b | c ) *{ “a”, “b”, “c” }a | b | c{ “ab” }ab{ “” }εSet of StringsRegular Expression4Understanding REs Describe these languages:0 ( 0 | 1 )* 0(0 | 1 ) * 0 ( 0 | 1 ) ( 0 | 1 )1* ( 0 (1 *) 0 (1 *) ) *Writing Regular Expressions Translate these into regular expressions Words ending in “ing” (a word consists of lower or upper-case letters) Binary strings with an odd number of 1s5Writing Regular Expressions Floating point numbers with an optional leading sign (+ or -) consisting of at least one digit and an optional decimal point (if there is a decimal point, there must be at least one digit before and one after the decimal point)From RE to a Scanner Theorem: for every regular expression, there is a deterministic finite-state machine that defines the same language (and vice versa) Q: How do we create this machine (automatically)? Idea: start by translating a RE to an NFA6Lexical AnalysisRegularExpressionsNondeterministicFinite Automata(NFA)DeterministicFinite Automata(DFA)ImplementationOf DFARE to NFA(1) For each kind of RE, define an NFA Notation: NFA for RE M For ε For input aMεa7RE to NFA (2) For A B For A | BABεABεεεεRE to NFA (3) For A* A+ ?Aεεεε8Example: RE to NFA Consider the regular expression(1|0)*1 The NFA is101εεεεεεεεεABCDEFGHIJAnother Example(ε | + | - ) digit+ (ε | . digit+ )9Lexical AnalysisRegularExpressionsNondeterministicFinite Automata(NFA)DeterministicFinite Automata(DFA)ImplementationOf DFANFA to DFA: The Trick Simulate the NFA Each state of the DFA = a non-empty subset of states of the NFA Start state = the set of NFA states reachable through ε-moves from NFA start state Add a transition S →a S’ to DFA iff S’ is the set of NFA states reachable from any state in S after seeing the input a, considering ε-moves as well10NFA to DFA: Remark An NFA may be in many states at any time How many different states ? If there are N states, the NFA must be in some subset of those N states How many subsets are there? 2N- 1 = finitely manyNFA -> DFA Example101εεεεεεεεεABCDEFGHIJABCDHIFGHIABCDEJGHIABCD01010111NFA to DFA: the practice NFA to DFA conversion is at the heart of tools such as flex But, DFAs can be huge In practice, flex-like tools trade off speed for space in the choice of NFA and DFA


View Full Document

U of M CS 5641 - Lexical Analysis

Download Lexical Analysis
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lexical Analysis and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lexical Analysis 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?