UD CISC 672 - Constructing a Scanner from Regular Expressions - D2149923

Home> Schools> University of Delaware> Computer/Information Sciences (CISC) (CISC) > CISC 672> Constructing a Scanner from Regular Expressions

DOC PREVIEW

UD CISC 672 - Constructing a Scanner from Regular Expressions

School name University of Delaware

Course Cisc 672- Compiler Construction

Pages 24

This preview shows page 1-2-23-24 out of 24 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 24 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Lexical Analysis — Part II: Constructing a Scanner from Regular ExpressionsQuick Review Previous class: → The scanner is the first stage in the front end → Specifications can be expressed using regular expressions → Build tables and code from a DFA Scanner Scanner Generator specifications source code parts of speech & words code and tablesGoal • We will show how to construct a finite state automaton to recognize any RE • This Lecture → Convert RE to an nondeterministic finite automaton (NFA)  Requires ε-transitions to combine regular subexpressions → Convert an NFA to a deterministic finite automaton (DFA)  Use Subset construction Next Lecture → Minimize the number of states  Hopcroft state minimization algorithm → Generate the scanner code  Additional code can be insertedMore Regular Expressions • All strings of 1s and 0s ending in a 1 ( 0 | 1 )* 1 • All strings over lowercase letters where the vowels (a,e,i,o,u) occur exactly once, in ascending order Cons → (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z) Cons* a Cons* e Cons* i Cons* o Cons* u Cons* • All strings of 1s and 0s that do not contain three 0s in a row:More Regular Expressions • All strings of 1s and 0s ending in a 1 ( 0 | 1 )* 1 • All strings over lowercase letters where the vowels (a,e,i,o,u) occur exactly once, in ascending order Cons → (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z) Cons* a Cons* e Cons* i Cons* o Cons* u Cons* • All strings of 1s and 0s that do not contain three 0s in a row: ( 1* ( ε |01 | 001 ) 1* )* ( ε | 0 | 00 )Non-deterministic Finite Automata Each RE corresponds to a deterministic finite automaton (DFA) • May be hard to directly construct the right DFA What about an RE such as ( a | b )* abb? This is a little different • S1 has two transitions on a This is a non-deterministic finite automaton (NFA) a , b S0 S3 S1 S2 a b bNon-deterministic Finite Automata Each RE corresponds to a deterministic finite automaton (DFA) • May be hard to directly construct the right DFA What about an RE such as ( a | b )* abb? This is a little different • S1 has two transitions on a • S0 has a transition on ε This is a non-deterministic finite automaton (NFA) a , b S0 S1 S4 S2 S3 ε#a b bNondeterministic Finite Automata • An NFA accepts a string x iff ∃ a path though the transition graph from s0 to a final state such that the edge labels spell x • Transitions on ε consume no input • To “run” the NFA, start in s0 and guess the right transition at each choice point with multiple possibilities → Always guess correctly → If some sequence of correct guesses accepts x then accept Why study NFAs? • They are the key to automating the RE→DFA construction • We can paste together NFAs with ε-transitions NFA NFA becomes an NFA εRelationship between NFAs and DFAs DFA is a special case of an NFA • DFA has no ε transitions • DFA’s transition function is single-valued • Same rules will work DFA can be simulated with an NFA → Obviously NFA can be simulated with a DFA (less obvious) • Simulate sets of possible states • Possible exponential blowup in the state space • Still, one state per character in the input streamAutomating Scanner Construction To convert a specification into code: 1 Write down the RE for the input language 2 Build a big NFA 3 Build the DFA that simulates the NFA 4 Systematically shrink the DFA 5 Turn it into code Scanner generators • Lex, Flex, and JLex work along these lines • Algorithms are well-known and well-understood • Key issue is interface to parser (define all parts of speech)Automating Scanner Construction RE→ NFA (Thompson’s construction) • Build an NFA for each term • Combine them with ε-transitions NFA → DFA (subset construction) • Build the simulation DFA → Minimal DFA • Hopcroft’s algorithm DFA →RE (Not part of the scanner construction) • All pairs, all paths problem • Take the union of all paths from s0 to an accepting state minimal DFA RE NFA DFA The Cycle of ConstructionsRE →NFA using Thompson’s Construction Key idea • NFA pattern for each symbol & each operator • Join them with ε transitions in precedence order S0 S1 aNFA for a S0 S1 aS3 S4 bNFA for ab εNFA for a | b S0 S1 S2 aS3 S4 bS5 εε εεS0 S1 ε#S3 S4 ε#NFA for a* aε#ε#Ken Thompson, CACM, 1968 S0 S1 bNFA for b Concatenation Alternation ClosureExample of Thompson’s Construction Let’s try a ( b | c )* 1. a, b, & c 2. b | c 3. ( b | c )* S0 S1 aS0 S1 bS0 S1 cS2 S3 bS4 S5 cS1 S6 S0 S7 ε#ε#ε# ε#ε# ε#ε# ε#S1 S2 bS3 S4 cS0 S5 ε#ε#ε#ε#Example of Thompson’s Construction (cont'd) 4. a ( b | c )* Of course, a human would design something simpler ... S0 S1 ab | c But, we can automate production of the more complex one ... S0 S1 aε#S4 S5 bS6 S7 cS3 S8 S2 S9 ε#ε#ε# ε#ε# ε#ε# ε#NFA →DFA with Subset Construction Need to build a simulation of the NFA Two key functions • Delta(qi , a) is set of states reachable from each state in qi by a → Returns a set of states, for each n ∈ qi of δi (n, a) • ε-closure(si) is set of states reachable from si by ε transitions The algorithm: • Start state derived from n0 of the NFA • Take its ε-closure q0 = ε-closure(n0) • Compute Delta(q, α) for each α ∈ Σ, and take its ε-closure • Iterate until no more states are added Sounds more complex than it is…NFA →DFA with Subset Construction The algorithm: q0 ← ε-closure(n0 ) Q ← {q0} WorkList ← {q0} while ( WorkList ≠ ф ) remove q from WorkList for each α ∈ Σ# t← ε-closure(Delta(q,α)) T[q,α] ← t if ( t ∉ Q ) then add t to Q and WorkList Let’s think about why this works The algorithm halts: 1. Q contains no duplicates (test before adding) 2. 2Q is finite 3. while loop adds to Q, but does not remove from Q (monotone) ⇒ the loop halts Q contains all the reachable NFA states It tries each character in each q. It builds every possible NFA configuration. ⇒ Q and T form the DFANFA →DFA with Subset Construction Example of a fixed-point computation • Monotone construction of some finite set • Halts when it stops

View Full Document