DOC PREVIEW
UD CISC 672 - Constructing a Scanner from Regular Expressions

This preview shows page 1-2-19-20 out of 20 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Slide 1Slide 2Slide 3Slide 4Slide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Slide 18Slide 19Slide 20Lexical Analysis — Part II:Constructing a Scanner from Regular ExpressionsQuick ReviewPrevious class:The scanner is the first stage in the front endSpecifications can be expressed using regular expressionsBuild tables and code from a DFAScannerScannerGeneratorspecificationssource code parts of speech & wordscode and tablesGoal• We will show how to construct a finite state automaton to recognize any RE•This LectureConvert RE to an nondeterministic finite automaton (NFA) Requires -transitions to combine regular subexpressionsConvert an NFA to a deterministic finite automaton (DFA)Use Subset constructionNext LectureMinimize the number of statesHopcroft state minimization algorithmGenerate the scanner codeAdditional code can be insertedMore Regular Expressions• All strings of 1s and 0s ending in a 1( 0 | 1 )* 1•All strings over lowercase letters where the vowels (a,e,i,o, & u) occur exactly once, in ascending orderCons  (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z)Cons* a Cons* e Cons* i Cons* o Cons* u Cons* •All strings of 1s and 0s that do not contain three 0s in a row:More Regular Expressions• All strings of 1s and 0s ending in a 1( 0 | 1 )* 1•All strings over lowercase letters where the vowels (a,e,i,o, & u) occur exactly once, in ascending orderCons  (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z)Cons* a Cons* e Cons* i Cons* o Cons* u Cons* •All strings of 1s and 0s that do not contain three 0s in a row:( 1* (  |01 | 001 ) 1* )* (  | 0 | 00 )Non-deterministic Finite AutomataEach RE corresponds to a deterministic finite automaton (DFA)• May be hard to directly construct the right DFAWhat about an RE such as ( a | b )* abb?This is a little different•S1 has two transitions on a This is a non-deterministic finite automaton (NFA)a , bS0 S3S1 S2 a bbNon-deterministic Finite AutomataEach RE corresponds to a deterministic finite automaton (DFA)• May be hard to directly construct the right DFAWhat about an RE such as ( a | b )* abb?This is a little different•S1 has two transitions on a •S0 has a transition on  This is a non-deterministic finite automaton (NFA)a , bS0 S1 S4 S2 S3 a bbNondeterministic Finite Automata• An NFA accepts a string x iff  a path though the transition graph from s0 to a final state such that the edge labels spell x•Transitions on  consume no input•To “run” the NFA, start in s0 and guess the right transition at each stepAlways guess correctlyIf some sequence of correct guesses accepts x then acceptWhy study NFAs?•They are the key to automating the REDFA construction•We can paste together NFAs with -transitionsNFA NFA becomes an NFARelationship between NFAs and DFAsDFA is a special case of an NFA•DFA has no  transitions•DFA’s transition function is single-valued• Same rules will workDFA can be simulated with an NFAObviouslyNFA can be simulated with a DFA (less obvious)• Simulate sets of possible states• Possible exponential blowup in the state space• Still, one state per character in the input streamAutomating Scanner ConstructionTo convert a specification into code:1Write down the RE for the input language2Build a big NFA3Build the DFA that simulates the NFA4Systematically shrink the DFA5Turn it into codeScanner generators• Lex and Flex work along these lines• Algorithms are well-known and well-understood• Key issue is interface to parser (define all parts of speech)• You could build one in a weekend!Automating Scanner ConstructionRE NFA (Thompson’s construction)• Build an NFA for each term•Combine them with -transitionsNFA  DFA (subset construction)• Build the simulationDFA  Minimal DFA• Hopcroft’s algorithm DFA RE (Not part of the scanner construction) • All pairs, all paths problem•Take the union of all paths from s0 to an accepting stateminimal DFARE NFA DFAThe Cycle of ConstructionsRE NFA using Thompson’s ConstructionKey idea•NFA pattern for each symbol & each operator•Join them with  transitions in precedence orderS0 S1 aNFA for aS0 S1 aS3 S4 bNFA for abNFA for a | bS0 S1 S2 aS3 S4 bS5  S0 S1 S3 S4 NFA for a*aKen Thompson, CACM, 1968Example of Thompson’s ConstructionLet’s try a ( b | c )* 1. a, b, & c2. b | c3. ( b | c )* S0 S1 aS0 S1 bS0 S1 cS2 S3 bS4 S5 cS1S6 S0 S7    S1 S2 bS3 S4 cS0 S5 Example of Thompson’s Construction (cont'd)4. a ( b | c )* Of course, a human would design something simpler ...S0 S1 ab | cBut, we can automate production of the more complex one ...S0 S1 aS4 S5 bS6 S7 cS3S8 S2 S9    NFA DFA with Subset ConstructionNeed to build a simulation of the NFATwo key functions•Delta(qi , a) is set of states reachable from each state in qi by aReturns a set of states, for each nqi of δi (n, •-closure(si) is set of states reachable from si by transitionsThe algorithm:•Start state derived from n0 of the NFA•Take its -closure q0 = -closure(n0) •Compute Delta(q, ) for each   , and take its -closure•Iterate until no more states are addedSounds more complex than it is…NFA DFA with Subset ConstructionThe algorithm:q0 -closure(n0 )Q q0}WorkList q0while ( WorkList ≠ ф ) remove q from WorkList for each    t -closure(Delta(q,)) T[q,]  t if ( t  Q ) then add t to Q and WorkList Let’s think about why this worksThe algorithm halts:1. Q contains no duplicates (test before adding)2. 2Q is finite3. while loop adds to Q, but does not remove from Q (monotone) the loop haltsQ contains all the reachable NFA statesIt tries each character in each q.It builds every possible NFA configuration. Q and T form the DFANFA DFA with Subset ConstructionExample of a fixed-point computation• Monotone construction of some finite set•Halts when it stops adding to the set• Proofs of halting & correctness are similar• These computations arise in many contexts Other fixed-point computations• Canonical construction of sets of LR(1) itemsQuite similar to the subset construction •Classic data-flow analysis (& Gaussian Elimination)Solving sets of simultaneous set equationsWe


View Full Document

UD CISC 672 - Constructing a Scanner from Regular Expressions

Documents in this Course
Syllabus

Syllabus

18 pages

Load more
Download Constructing a Scanner from Regular Expressions
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Constructing a Scanner from Regular Expressions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Constructing a Scanner from Regular Expressions 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?