15-251 Great Theoretical Ideas in Computer Science
Deterministic Finite Automata
Lecture 20 (October 29, 2009)

A machine so simple that you can understand it in less than one minute

[Figure: state diagram of a four-state automaton with transitions labeled 0 and 1]

The machine accepts a string if the process ends in a double circle

Anatomy of a Deterministic Finite Automaton

[Figure: the same diagram annotated with the states q0, q1, q2, q3, the start state (q0), and the accept states (F)]

The alphabet of a finite automaton is the set where the symbols come from: {0,1}
The language of a finite automaton is the set of strings that it accepts

[Figure: a one-state automaton, q0 with a self-loop on 0,1]
L(M) = all strings of 0s and 1s if q0 is an accept state, and L(M) = ∅ if it is not

The Language of Machine M

[Figure: two-state automaton with states q0 and q1; 0 loops on each state, 1 switches between them]
L(M) = { w | w has an even number of 1s }

Notation

• An alphabet Σ is a finite set (e.g., Σ = {0,1})
• A string over Σ is a finite-length sequence of elements of Σ
• For x a string, |x| is the length of x
• The unique string of length 0 will be denoted by ε and will be called the empty or null string
• A language over Σ is a set of strings over Σ

A finite automaton is a 5-tuple M = (Q, Σ, δ, q0, F)

• Q is the set of states
• Σ is the alphabet
• δ : Q × Σ → Q is the transition function
• q0 ∈ Q is the start state
• F ⊆ Q is the set of accept states

L(M) = the language of machine M = set of all strings machine M accepts

Example: M = (Q, Σ, δ, q0, F) where

• Q = {q0, q1, q2, q3}
• Σ = {0,1}
• δ : Q × Σ → Q is the transition function
• q0 ∈ Q is the start state
• F = {q1, q2} ⊆ Q is the set of accept states

δ     0     1
q0    q0    q1
q1    q2    q2
q2    q3    q2
q3    q0    q2

[Figure: state diagram of M]

Build an automaton that accepts all and only those strings that contain 001

[Figure: state diagram of an automaton for strings containing 001]

Build an automaton that accepts all strings whose length is divisible by 2 but not 3

Build an automaton that accepts exactly the strings that contain 01011 as a substring

How about an automaton that accepts exactly the strings that contain an even number of 01 pairs?

A language is regular if it is recognized by a deterministic finite automaton

L = { w | w contains 001 } is regular
L = { w | w has an even number of 1s } is regular

Union Theorem

Given two languages, L1 and L2, define the union of L1 and L2 as
L1 ∪ L2 = { w | w ∈ L1 or w ∈ L2 }

Theorem: The union of two regular languages is also a regular language

Proof Sketch: Let M1 = (Q1, Σ, δ1, q0^1, F1) be a finite automaton for L1 and M2 = (Q2, Σ, δ2, q0^2, F2) be a finite automaton for L2

We want to construct a finite automaton M = (Q, Σ, δ, q0, F) that recognizes L = L1 ∪ L2

Idea: Run both M1 and M2 at the same time!

Q = pairs of states, one from M1 and one from M2
  = { (q1, q2) | q1 ∈ Q1 and q2 ∈ Q2 }
  = Q1 × Q2

[Figure: a two-state automaton on q0, q1 (1 switches state, 0 loops) and a two-state automaton on p0, p1 (0 switches state, 1 loops)]

Automaton for Union
[Figure: product automaton on the states (q0,p0), (q1,p0), (q0,p1), (q1,p1)]

Automaton for Intersection
[Figure: product automaton on the same four states]

Corollary: Any finite language is regular

The Regular Operations

• Union: A ∪ B = { w | w ∈ A or w ∈ B }
• Intersection: A ∩ B = { w | w ∈ A and w ∈ B }
• Negation: ¬A = { w | w ∉ A }
• Reverse: A^R = { w1 … wk | wk … w1 ∈ A }
• Concatenation: A ⋅ B = { vw | v ∈ A and w ∈ B }
• Star: A* = { w1 … wk | k ≥ 0 and each wi ∈ A }

Regular Languages Are Closed Under The Regular Operations

We have seen part of the proof for Union. The proof for intersection is very similar.
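
The 5-tuple definition and the "run both machines at the same time" idea can be tried out in a few lines of Python. This is a minimal sketch, not part of the lecture: the class `DFA`, the function `product_dfa`, and the machine `even_zeros` (with its choice of accept state) are names and assumptions introduced here for illustration. `even_ones` follows the two-state machine for an even number of 1s.

```python
from itertools import product


class DFA:
    """A DFA stored as the 5-tuple (Q, Sigma, delta, q0, F)."""

    def __init__(self, states, alphabet, delta, start, accept):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.delta = dict(delta)  # maps (state, symbol) -> state
        self.start = start
        self.accept = set(accept)

    def accepts(self, w):
        """Follow one transition per symbol; accept if we end in F."""
        q = self.start
        for c in w:
            q = self.delta[(q, c)]
        return q in self.accept


def product_dfa(m1, m2, mode):
    """Product construction: states are pairs, both machines step together.

    mode="union" accepts if either component accepts,
    mode="intersection" accepts only if both do.
    """
    assert m1.alphabet == m2.alphabet
    states = set(product(m1.states, m2.states))
    delta = {
        ((p, q), c): (m1.delta[(p, c)], m2.delta[(q, c)])
        for (p, q) in states
        for c in m1.alphabet
    }
    if mode == "union":
        accept = {(p, q) for (p, q) in states
                  if p in m1.accept or q in m2.accept}
    else:
        accept = {(p, q) for (p, q) in states
                  if p in m1.accept and q in m2.accept}
    return DFA(states, m1.alphabet, delta, (m1.start, m2.start), accept)


# Even number of 1s: reading 1 switches state, reading 0 stays put.
even_ones = DFA(
    {"q0", "q1"}, {"0", "1"},
    {("q0", "0"): "q0", ("q0", "1"): "q1",
     ("q1", "0"): "q1", ("q1", "1"): "q0"},
    "q0", {"q0"},
)

# Assumed second machine: even number of 0s (accept state chosen here).
even_zeros = DFA(
    {"p0", "p1"}, {"0", "1"},
    {("p0", "0"): "p1", ("p0", "1"): "p0",
     ("p1", "0"): "p0", ("p1", "1"): "p1"},
    "p0", {"p0"},
)

union = product_dfa(even_ones, even_zeros, "union")
inter = product_dfa(even_ones, even_zeros, "intersection")
```

The two product machines share states, transitions, and start state; only the set of accept states differs, which is why the intersection proof is so close to the union proof.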
The proof for negation is easy.

The "Grep" Problem

Input: Text T of length t, string S of length n
Problem: Does string S appear inside text T?

[Figure: the text as a sequence of symbols a1, a2, a3, a4, a5, …, at]

Naïve method
Cost: Roughly nt comparisons

Automata Solution

• Build a machine M that accepts any string with S as a consecutive substring
• Feed the text to M
• Cost: t comparisons + time to build M

As luck would have it, the Knuth, Morris, Pratt algorithm builds M quickly

Real-life Uses of DFAs

• Grep
• Coke Machines
• Thermostats (fridge)
• Elevators
• Train Track Switches
• Lexical Analyzers for Parsers

Are all languages regular?

Consider the language L = { a^n b^n | n > 0 }
i.e., a bunch of a's followed by an equal number of b's

No finite automaton accepts this language
Can you prove this?

a^n b^n is not regular. No machine has enough states to keep track of the number of a's it might encounter

That is a fairly weak argument
Consider the following example…

L = strings where the # of occurrences of the pattern ab is equal to the number of occurrences of the pattern ba

Can't be regular. No machine has enough states to keep track of the number of occurrences of ab

[Figure: state diagram of a machine M with transitions labeled a and b]
M accepts only the strings with an equal number of ab's and ba's!
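
The ab/ba example can be checked by brute force. The sketch below is an assumption, not the machine drawn in the lecture: it uses five hypothetical states that remember only the first and the last symbol read, relying on the fact that the counts of ab and ba are equal exactly when the string is empty or starts and ends with the same symbol. The names `DELTA`, `ACCEPT`, `m_accepts`, and `in_language` are introduced here.

```python
from itertools import product

# Hypothetical state names: "s" is the start state; a two-letter state
# "xy" means "first symbol read was x, last symbol read was y".
DELTA = {
    ("s", "a"): "aa", ("s", "b"): "bb",
    ("aa", "a"): "aa", ("aa", "b"): "ab",
    ("ab", "a"): "aa", ("ab", "b"): "ab",
    ("bb", "b"): "bb", ("bb", "a"): "ba",
    ("ba", "b"): "bb", ("ba", "a"): "ba",
}
# Accept when nothing was read, or when first and last symbols agree.
ACCEPT = {"s", "aa", "bb"}


def m_accepts(w):
    """Run the five-state machine on w."""
    q = "s"
    for c in w:
        q = DELTA[(q, c)]
    return q in ACCEPT


def occurrences(w, pat):
    """Count occurrences of pat in w by checking every position."""
    return sum(1 for i in range(len(w) - len(pat) + 1)
               if w[i:i + len(pat)] == pat)


def in_language(w):
    """Direct definition: as many ab's as ba's."""
    return occurrences(w, "ab") == occurrences(w, "ba")


# Compare the machine with the definition on every string up to length 10.
mismatches = [
    "".join(t)
    for n in range(11)
    for t in product("ab", repeat=n)
    if m_accepts("".join(t)) != in_language("".join(t))
]
print(mismatches)  # prints []
```

The machine never counts anything, which is why "not enough states to keep track" is too weak to serve as a proof on its own.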
Let me show you a professional-strength proof that a^n b^n is not regular…

This is the kind of proof we expect from you…

Pigeonhole principle: Given n boxes and m > n objects, at least one box must contain more than one object

Letterbox principle: If the average number of letters per box is x, then some box will have at least x letters (similarly, some box has at most x)

Theorem: L = { a^n b^n | n > 0 } is not regular

Proof (by contradiction):

• Assume that L is regular
• Then there exists a machine M with k states that accepts L
• For each 0 ≤ i ≤ k, let S_i be the state M is in after reading a^i
• These are k+1 prefixes but only k states, so by the pigeonhole principle ∃ i,j ≤ k such that S_i = S_j, but i ≠ j (name them so that i ≥ 1, which is possible because i ≠ j)
• M will do the same thing on a^i b^i and a^j b^i
• But a valid M must reject a^j b^i and accept a^i b^i

How to prove a language is not regular (most of the time)…

• Assume it is regular, hence is accepted by a DFA M with n states.
• Show that there are two strings s and s' which both reach the same state in M (usually by the pigeonhole principle)
• Then show there is some string t such that string st is in the language, but s't is not. However, M accepts either both or neither.

What are s, s', t? That's where the work is…

Deterministic Finite Automata
• Definition
• Testing if they accept a string
• Building automata

Regular Languages
• Definition
• Closed Under Union, Intersection, Negation
• Using Pigeonhole Principle to show language ain't regular

Here's What You Need to
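
The pigeonhole proof is constructive: given any DFA that claims to accept { a^n b^n | n > 0 }, it tells us where to find a string the DFA gets wrong. The sketch below follows that recipe under stated assumptions. The function names and the four-state candidate machine (which really accepts "one or more a's followed by one or more b's") are hypothetical and introduced only for illustration.

```python
def run(delta, start, w):
    """Return the state reached from start after reading w."""
    q = start
    for c in w:
        q = delta[(q, c)]
    return q


def in_L(w):
    """Membership in L = { a^n b^n | n > 0 }."""
    n = len(w) // 2
    return n > 0 and w == "a" * n + "b" * n


def find_counterexample(delta, start, accept, k):
    """For a complete DFA with at most k states over {a, b}, return a
    string on which the DFA's verdict differs from membership in L."""
    seen = {}  # state -> exponent i such that a^i reaches that state
    for j in range(k + 1):
        q = run(delta, start, "a" * j)
        if q in seen:
            i = seen[q]          # a^i and a^j reach the same state, i < j
            t = "b" * j          # a^j t is in L, a^i t is not
            for w in ("a" * j + t, "a" * i + t):
                if (run(delta, start, w) in accept) != in_L(w):
                    return w
        seen[q] = j
    return None  # not reached when the DFA has at most k states


# Hypothetical candidate: state 0 start, 1 reading a's, 2 reading b's, 3 dead.
candidate = {
    (0, "a"): 1, (0, "b"): 3,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 3, (2, "b"): 2,
    (3, "a"): 3, (3, "b"): 3,
}
witness = find_counterexample(candidate, 0, {2}, 4)
print(witness)  # prints abb
```

Here a and aa reach the same state, so the candidate gives the same verdict on aabb and abb, and it is wrong on abb. Any other candidate with finitely many states fails in the same way, with s = a^j, s' = a^i, and t = b^j.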