U.C. Berkeley — CS172: Automata, Computability and Complexity Handout 1Professor Luca Trevisan 9/10/2009Notes on State MinimizationThese notes present a technique to prove a lower bound on the number of states of any DFAthat recognizes a given language. The technique can also be used to prove that a language is notregular. (By showing that for every k one needs at least k states to recognize the language.) Wealso present an efficient algorithm to convert a given DFA into a DFA for the same language andwith a minimum number of states.1 Distinguishable and Indistinguishable StatesIt will be helpful to keep in mind the following two languages over the alphabet Σ = {0, 1} asexamples: the language EQ = {0n1n|n ≥ 1} of strings containing a sequence of zeroes followed byan equally long sequence of ones, and the language A = (0 ∪ 1)∗· 1 · (0 ∪ 1) of strings containing a1 in the second-to-last position.We start with the following basic notion.Definition 1 (Distinguishable Strings) Let L be a language over an alphabet Σ. We say thattwo strings x and y are distinguishable with respect to L if there is a string z such that xz ∈ Land yz 6∈ L, or vice versa.For example the strings x = 0 and y = 00 are distinguishable with respect to EQ, because ifwe take z = 1 we have xz = 01 ∈ EQ and yz = 001 6∈ EQ. Also, the strings x = 00 and y=01 aredistinguishable with respect to A as can be seen by taking z = 0.On the other hand, the strings x = 0110 and y = 10 are not distinguishable with respect to EQbecause for every z we have xz 6∈ EQ and yz 6∈ EQ.Exercise 2 Find two strings that are not distinguishable with respect to A.The intuition behind Definition 1 is captured by the following simple fact.Lemma 3 Let L be a language, M be a DFA that decides L, and x and y be distinguishable stringswith respect to L. Then the state reached by M on input x is different from the state reached by Mon input y.Proof: Suppose by contradiction that M reaches the same state q on input x and on input y. Letz be the string such that xz ∈ L and yz 6∈ L (or vice versa). Let us call q0the state reached by Mon input xz. Note that q0is the state reached by M starting from q and given the string z. Butalso, on input yz, M must reach the same state q0, because M reaches state q given y, and thengoes from q to q0given z. This means that M either accepts both xz and yz, or it rejects both. Ineither case, M is incorrect and we reach a contradiction. Consider now the following generalization of the notion of distinguishability.Definition 4 (Distinguishable Set of Strings) Let L be a language. A set of strings {x1, . . . , xk}is distinguishable if for every two distinct strings xi, xjwe have that xiis distinguishable from xj.1For example one can verify that {0, 00, 000} are distinguishable with respect to EQ and that{00, 01, 10, 11} are distinguishable with respect to A.We now prove the main result of this section.Lemma 5 Let L be a language, and suppose there is a set of k distinguishable strings with respectto L. Then every DFA for L has at least k states.Proof: If L is not regular, then there is no DFA for L, much less a DFA with less than k states.If L is regular, let M be a DFA for L, let x1, . . . , xkbe the distinguishable strings, and let qibe thestate reached by M after reading xi. For every i 6= j, we have that xiand xjare distinguishable,and so qi6= qjbecause of Lemma 3. So we have k different states q1, . . . , qkin M, and so M has atleast k states. Using Lemma 5 and the fact that the strings {00, 01, 10, 11} are distinguishable with respect toA we conclude that every DFA for A has at least 4 states.For every k ≥ 1, consider the set {0, 00, . . . , 0k} of strings made of k or fewer zeroes. It is easyto see that this is a set of distinguishable strings with respect to EQ. This means that there cannotbe a DFA for EQ, because, if there were one, it would have to have at least k states for every k,which is clearly impossible.2 State MinimizationLet L be a language over an alphabet Σ. We have seen in the previous section the definition ofdistinguishable strings with respect to L. We say that two strings x and y are indistinguishable,and we write it x ≈Ly if they are not distinguishable. That is, x ≈Ly means that, for every stringz, the string xz belongs to L if and only if the string yz does. By definition, x ≈Ly if and only ify ≈Lx, and we always have x ≈Lx. It is also easy to see that if x ≈Ly and y ≈Lw then we musthave x ≈Lw. In other words:Fact 6 The relation ≈Lis an equivalence relation over the strings in Σ∗.As you may remember from such classes as Math55 or CS70, when you define an equivalencerelation over a set you also define a way to partition the set into a collection of subsets, calledequivalence class. An equivalence class in Σ∗with respect to ≈Lis a set of strings that are allindistinguishable from one another, and that are all distinguishable from all the others not in theset. We denote by [x] the equivalence class that contains the string x.A fancy way of stating Lemma 5 is to say that every DFA for L must have at least as many statesas the number of equivalence class in Σ∗with respect to ≈L. Perhaps surprisingly, the converse isalso true: there is always a DFA that has precisely as many states as the number of equivalenceclasses.Theorem 7 (Myhill-Nerode) Let L be a language over Σ. If Σ∗has infinitely many equivalenceclasses with respect to ≈L, then L is not regular. Otherwise, L can be decided by a DFA whosenumber of states is equal to the number of equivalence classes in Σ∗with respect to ≈L.Proof: If there are infinitely many equivalence classes, then it follows from Lemma 5 that no DFAcan decide L, and so L is not regular.2Suppose then that there is a finite number of equivalence class. We define an automaton thathas a state for each equivalence class. The start state is the class [] and every state of the form[x] for x ∈ L is a final state.It remains to describe the transition function. From a state [x], reading the character a, theautomaton moves to state [xa]. We need to make sure that this definition makes sense. If x ≈Lx0,then the state [x] and the state [x0] are the same, so we need to verify that the state [xa] and thestate [x0a] are also the same. That is, we need to verify that, for every string z, xaz ∈ L if andonly if x0az ∈ L; this is clearly true because x and x0are indistinguishable …
View Full Document