DOC PREVIEW
UD CISC 672 - Lexical Analysis- DFA Minimization & Wrap Up

This preview shows page 1-2-3-4-5-6 out of 19 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Slide 1Slide 2Slide 3Slide 4Slide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Slide 18Slide 19Lexical Analysis:DFA Minimization & Wrap UpAutomating Scanner ConstructionPREVIOUSLYRENFA (Thompson’s construction) •Build an NFA for each term•Combine them with -movesNFA DFA (subset construction) • Build the simulationTODAYDFA  Minimal DFA• Hopcroft’s algorithm DFA  RE (not really part of scanner construction)• All pairs, all paths problem•Union together paths from s0 to a final stateminimal DFARE NFA DFAThe Cycle of ConstructionsDFA MinimizationDetails of the algorithm• Group states into maximal size sets, optimistically•Iteratively subdivide those sets, as needed • States that remain grouped together are equivalentInitial partition, P0 , has two sets: {DF} & {D-DF}(DFA =(Q,,,q0,F)) Splitting a set (“partitioning a set by a”)•Assume qi, & qj  s, and  (qi,a) = qx, & (qj,a) = qy •If qx & qy are not in the same set, then s must be splitqi has transition on a, qj does not  a splits s • One state in the final DFA cannot have two transitions on aDFA MinimizationThe algorithmP  { DF, {D-DF}}while ( P is still changing) T  Ø for each set p  PT  T Split(p) P  TSplit(S) for each    if splits S into s1 and s2 then return { s1,s2} return SWhy does this work?• Partition P  2D• Starts with 2 subsets of D {DF} and {D-DF}•While loop takes PiPi+1 by splitting 1 or more sets•Pi+1 is at least one step closer to the partition with |D| sets•Maximum of |D| splitsNote that•Partitions are never combined• Initial partition ensures that final states are intactThis is a fixed-point algorithm!Key Idea: Splitting S around  STRThe algorithm partitions S around Original set SQS has transitions on  to R, Q, & TKey Idea: Splitting S around  TROriginal set SQS1S2Could we split S2 further?Yes, but it does not help asymptoticallyS2 is everything in S - S1DFA MinimizationWhat about a ( b | c )* ?First, the subset construction:q0 q1 aq4 q5 bq6 q7 cq3q8 q2 q9     -closure(Delta(s,*)) NFA states a b c s0 q0 q1, q2, q3, q4, q6, q9 none none s1 q1, q2, q3, q4, q6, q9 none q5, q8, q9, q3, q4, q6 q7, q8, q9, q3, q4, q6 s2 q5, q8, q9, q3, q4, q6 none s2 s3 s3 q7, q8, q9, q3, q4, q6 none s2 s3 s3 s2 s0 s1 cbabbccFinal statesDFA MinimizationThen, apply the minimization algorithmTo produce the minimal DFAs3 s2 s0 s1 cbabbccSplit onCurr ent Partition a b cP0{ s1, s2, s3} {s0} none none nones0 s1 ab | cIn lecture 4, we observed that a human would design a simpler automaton than Thompson’s construction & the subset construction did.Minimizing that DFA produces the one that a human would design! final statesAbbreviated Register SpecificationStart with a regular expressionr0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 | r9minimal DFARE NFA DFAThe Cycle of ConstructionsAbbreviated Register SpecificationThompson’s construction producesr0r1r2r8r9… …s0sf…minimal DFARE NFA DFAThe Cycle of ConstructionsTo make it fit, we’ve eliminated the -transition between “r” and “0...9”.Abbreviated Register SpecificationThe subset construction buildsThis is a DFA, but it has a lot of states …r0sf0s0sf11sf22sf9sf8…98minimal DFARE NFA DFAThe Cycle of ConstructionsAbbreviated Register SpecificationThe DFA minimization algorithm buildsThis looks like what a skilled compiler writer would do!rs0sf0,1,2,3,4,5,6,7,8,9minimal DFARE NFA DFAThe Cycle of ConstructionsLimits of Regular LanguagesAdvantages of Regular Expressions•Simple & powerful notation for specifying patterns• Automatic construction of fast recognizers• Many kinds of syntax can be specified with REsExample — an expression grammarTerm  [a-zA-Z] ([a-zA-z] | [0-9])*Op  + | - |  | /Expr  ( Term Op )* TermOf course, this would generate a DFA …If REs are so useful …Why not use them for everything?Limits of Regular LanguagesNot all languages are regularRL’s  CFL’s  CSL’sYou cannot construct DFA’s to recognize these languages• L = { pkqk } (parenthesis languages)•L = { wcw r | w  *}Neither of these is a regular language (nor an RE)But, this is a little subtle. You can construct DFA’s for• Strings with alternating 0’s and 1’s (  | 1 ) ( 01 )* (  | 0 ) • Strings with an even number of 0’s and 1’s RE’s can count bounded sets and bounded differencesWhat can be so hard?Poor language design can complicate scanning• Reserved words are importantif then then then = else; else else = then (PL/I)•Insignificant blanks (Fortran & Algol68)do 10 i = 1,25do 10 i = 1.25• String constants with special characters (C, C++, Java, …)newline, tab, quote, comment delimiters, …•Finite closures (Fortran 66 & Basic)Limited identifier lengthAdds states to count lengthBuilding Faster Scanners from the DFATable-driven recognizers waste effort• Read (& classify) the next character•Find the next state • Assign to the state variable • Trip through case logic in action() • Branch back to the topWe can do better• Encode state & actions in the code • Do transition tests locally• Generate ugly, spaghetti-like code•Takes (many) fewer operations per input characterchar  next character;state  s0 ;call action(state,char);while (char  eof) state  (state,char); call action(state,char); char  next character;if (state) = final then report acceptance;else report failure;Building Faster Scanners from the DFAA direct-coded recognizer for r Digit Digit*•Many fewer operations per character•Almost no memory operations• Even faster with careful use of fall-through cases goto s0;s0: word  Ø; char  next character; if (char = ‘r’) then goto s1; else goto se;s1: word  word + char; char  next character; if (‘0’ ≤ char ≤ ‘9’) then goto s2; else goto se;s2: word  word + char; char  next character; if (‘0’ ≤ char ≤ ‘9’) then goto s2; else if (char = eof) then report success; else goto se;se: print error message; return failure;Building Faster ScannersHashing keywords versus encoding them directly•Some (well-known) compilers recognize


View Full Document

UD CISC 672 - Lexical Analysis- DFA Minimization & Wrap Up

Documents in this Course
Syllabus

Syllabus

18 pages

Load more
Download Lexical Analysis- DFA Minimization & Wrap Up
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lexical Analysis- DFA Minimization & Wrap Up and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lexical Analysis- DFA Minimization & Wrap Up 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?