Slide 1Slide 2Slide 3Slide 4Slide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Slide 18Slide 19Slide 20Lexical Analysis — Part II:Constructing a Scanner from Regular ExpressionsQuick ReviewPrevious class:The scanner is the first stage in the front endSpecifications can be expressed using regular expressionsBuild tables and code from a DFAScannerScannerGeneratorspecificationssource code parts of speech & wordscode and tablesGoal• We will show how to construct a finite state automaton to recognize any RE•This LectureConvert RE to an nondeterministic finite automaton (NFA) Requires -transitions to combine regular subexpressionsConvert an NFA to a deterministic finite automaton (DFA)Use Subset constructionNext LectureMinimize the number of statesHopcroft state minimization algorithmGenerate the scanner codeAdditional code can be insertedMore Regular Expressions• All strings of 1s and 0s ending in a 1( 0 | 1 )* 1•All strings over lowercase letters where the vowels (a,e,i,o, & u) occur exactly once, in ascending orderCons (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z)Cons* a Cons* e Cons* i Cons* o Cons* u Cons* •All strings of 1s and 0s that do not contain three 0s in a row:More Regular Expressions• All strings of 1s and 0s ending in a 1( 0 | 1 )* 1•All strings over lowercase letters where the vowels (a,e,i,o, & u) occur exactly once, in ascending orderCons (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z)Cons* a Cons* e Cons* i Cons* o Cons* u Cons* •All strings of 1s and 0s that do not contain three 0s in a row:( 1* ( |01 | 001 ) 1* )* ( | 0 | 00 )Non-deterministic Finite AutomataEach RE corresponds to a deterministic finite automaton (DFA)• May be hard to directly construct the right DFAWhat about an RE such as ( a | b )* abb?This is a little different•S1 has two transitions on a This is a non-deterministic finite automaton (NFA)a , bS0 S3S1 S2 a bbNon-deterministic Finite AutomataEach RE corresponds to a deterministic finite automaton (DFA)• May be hard to directly construct the right DFAWhat about an RE such as ( a | b )* abb?This is a little different•S1 has two transitions on a •S0 has a transition on This is a non-deterministic finite automaton (NFA)a , bS0 S1 S4 S2 S3 a bbNondeterministic Finite Automata• An NFA accepts a string x iff a path though the transition graph from s0 to a final state such that the edge labels spell x•Transitions on consume no input•To “run” the NFA, start in s0 and guess the right transition at each stepAlways guess correctlyIf some sequence of correct guesses accepts x then acceptWhy study NFAs?•They are the key to automating the REDFA construction•We can paste together NFAs with -transitionsNFA NFA becomes an NFARelationship between NFAs and DFAsDFA is a special case of an NFA•DFA has no transitions•DFA’s transition function is single-valued• Same rules will workDFA can be simulated with an NFAObviouslyNFA can be simulated with a DFA (less obvious)• Simulate sets of possible states• Possible exponential blowup in the state space• Still, one state per character in the input streamAutomating Scanner ConstructionTo convert a specification into code:1Write down the RE for the input language2Build a big NFA3Build the DFA that simulates the NFA4Systematically shrink the DFA5Turn it into codeScanner generators• Lex and Flex work along these lines• Algorithms are well-known and well-understood• Key issue is interface to parser (define all parts of speech)• You could build one in a weekend!Automating Scanner ConstructionRE NFA (Thompson’s construction)• Build an NFA for each term•Combine them with -transitionsNFA DFA (subset construction)• Build the simulationDFA Minimal DFA• Hopcroft’s algorithm DFA RE (Not part of the scanner construction) • All pairs, all paths problem•Take the union of all paths from s0 to an accepting stateminimal DFARE NFA DFAThe Cycle of ConstructionsRE NFA using Thompson’s ConstructionKey idea•NFA pattern for each symbol & each operator•Join them with transitions in precedence orderS0 S1 aNFA for aS0 S1 aS3 S4 bNFA for abNFA for a | bS0 S1 S2 aS3 S4 bS5 S0 S1 S3 S4 NFA for a*aKen Thompson, CACM, 1968Example of Thompson’s ConstructionLet’s try a ( b | c )* 1. a, b, & c2. b | c3. ( b | c )* S0 S1 aS0 S1 bS0 S1 cS2 S3 bS4 S5 cS1S6 S0 S7 S1 S2 bS3 S4 cS0 S5 Example of Thompson’s Construction (cont'd)4. a ( b | c )* Of course, a human would design something simpler ...S0 S1 ab | cBut, we can automate production of the more complex one ...S0 S1 aS4 S5 bS6 S7 cS3S8 S2 S9 NFA DFA with Subset ConstructionNeed to build a simulation of the NFATwo key functions•Delta(qi , a) is set of states reachable from each state in qi by aReturns a set of states, for each nqi of δi (n, •-closure(si) is set of states reachable from si by transitionsThe algorithm:•Start state derived from n0 of the NFA•Take its -closure q0 = -closure(n0) •Compute Delta(q, ) for each , and take its -closure•Iterate until no more states are addedSounds more complex than it is…NFA DFA with Subset ConstructionThe algorithm:q0 -closure(n0 )Q q0}WorkList q0while ( WorkList ≠ ф ) remove q from WorkList for each t -closure(Delta(q,)) T[q,] t if ( t Q ) then add t to Q and WorkList Let’s think about why this worksThe algorithm halts:1. Q contains no duplicates (test before adding)2. 2Q is finite3. while loop adds to Q, but does not remove from Q (monotone) the loop haltsQ contains all the reachable NFA statesIt tries each character in each q.It builds every possible NFA configuration. Q and T form the DFANFA DFA with Subset ConstructionExample of a fixed-point computation• Monotone construction of some finite set•Halts when it stops adding to the set• Proofs of halting & correctness are similar• These computations arise in many contexts Other fixed-point computations• Canonical construction of sets of LR(1) itemsQuite similar to the subset construction •Classic data-flow analysis (& Gaussian Elimination)Solving sets of simultaneous set equationsWe
View Full Document