Unformatted text preview:

U.C. Berkeley — CS172: Automata, Computability and Complexity Handout 1Professor Luca Trevisan 9/10/2009Notes on State MinimizationThese notes present a technique to prove a lower bound on the number of states of any DFAthat recognizes a given language. The technique can also be used to prove that a language is notregular. (By showing that for every k one needs at least k states to recognize the language.) Wealso present an efficient algorithm to convert a given DFA into a DFA for the same language andwith a minimum number of states.1 Distinguishable and Indistinguishable StatesIt will be helpful to keep in mind the following two languages over the alphabet Σ = {0, 1} asexamples: the language EQ = {0n1n|n ≥ 1} of strings containing a sequence of zeroes followed byan equally long sequence of ones, and the language A = (0 ∪ 1)∗· 1 · (0 ∪ 1) of strings containing a1 in the second-to-last position.We start with the following basic notion.Definition 1 (Distinguishable Strings) Let L be a language over an alphabet Σ. We say thattwo strings x and y are distinguishable with respect to L if there is a string z such that xz ∈ Land yz 6∈ L, or vice versa.For example the strings x = 0 and y = 00 are distinguishable with respect to EQ, because ifwe take z = 1 we have xz = 01 ∈ EQ and yz = 001 6∈ EQ. Also, the strings x = 00 and y=01 aredistinguishable with respect to A as can be seen by taking z = 0.On the other hand, the strings x = 0110 and y = 10 are not distinguishable with respect to EQbecause for every z we have xz 6∈ EQ and yz 6∈ EQ.Exercise 2 Find two strings that are not distinguishable with respect to A.The intuition behind Definition 1 is captured by the following simple fact.Lemma 3 Let L be a language, M be a DFA that decides L, and x and y be distinguishable stringswith respect to L. Then the state reached by M on input x is different from the state reached by Mon input y.Proof: Suppose by contradiction that M reaches the same state q on input x and on input y. Letz be the string such that xz ∈ L and yz 6∈ L (or vice versa). Let us call q0the state reached by Mon input xz. Note that q0is the state reached by M starting from q and given the string z. Butalso, on input yz, M must reach the same state q0, because M reaches state q given y, and thengoes from q to q0given z. This means that M either accepts both xz and yz, or it rejects both. Ineither case, M is incorrect and we reach a contradiction. Consider now the following generalization of the notion of distinguishability.Definition 4 (Distinguishable Set of Strings) Let L be a language. A set of strings {x1, . . . , xk}is distinguishable if for every two distinct strings xi, xjwe have that xiis distinguishable from xj.1For example one can verify that {0, 00, 000} are distinguishable with respect to EQ and that{00, 01, 10, 11} are distinguishable with respect to A.We now prove the main result of this section.Lemma 5 Let L be a language, and suppose there is a set of k distinguishable strings with respectto L. Then every DFA for L has at least k states.Proof: If L is not regular, then there is no DFA for L, much less a DFA with less than k states.If L is regular, let M be a DFA for L, let x1, . . . , xkbe the distinguishable strings, and let qibe thestate reached by M after reading xi. For every i 6= j, we have that xiand xjare distinguishable,and so qi6= qjbecause of Lemma 3. So we have k different states q1, . . . , qkin M, and so M has atleast k states. Using Lemma 5 and the fact that the strings {00, 01, 10, 11} are distinguishable with respect toA we conclude that every DFA for A has at least 4 states.For every k ≥ 1, consider the set {0, 00, . . . , 0k} of strings made of k or fewer zeroes. It is easyto see that this is a set of distinguishable strings with respect to EQ. This means that there cannotbe a DFA for EQ, because, if there were one, it would have to have at least k states for every k,which is clearly impossible.2 State MinimizationLet L be a language over an alphabet Σ. We have seen in the previous section the definition ofdistinguishable strings with respect to L. We say that two strings x and y are indistinguishable,and we write it x ≈Ly if they are not distinguishable. That is, x ≈Ly means that, for every stringz, the string xz belongs to L if and only if the string yz does. By definition, x ≈Ly if and only ify ≈Lx, and we always have x ≈Lx. It is also easy to see that if x ≈Ly and y ≈Lw then we musthave x ≈Lw. In other words:Fact 6 The relation ≈Lis an equivalence relation over the strings in Σ∗.As you may remember from such classes as Math55 or CS70, when you define an equivalencerelation over a set you also define a way to partition the set into a collection of subsets, calledequivalence class. An equivalence class in Σ∗with respect to ≈Lis a set of strings that are allindistinguishable from one another, and that are all distinguishable from all the others not in theset. We denote by [x] the equivalence class that contains the string x.A fancy way of stating Lemma 5 is to say that every DFA for L must have at least as many statesas the number of equivalence class in Σ∗with respect to ≈L. Perhaps surprisingly, the converse isalso true: there is always a DFA that has precisely as many states as the number of equivalenceclasses.Theorem 7 (Myhill-Nerode) Let L be a language over Σ. If Σ∗has infinitely many equivalenceclasses with respect to ≈L, then L is not regular. Otherwise, L can be decided by a DFA whosenumber of states is equal to the number of equivalence classes in Σ∗with respect to ≈L.Proof: If there are infinitely many equivalence classes, then it follows from Lemma 5 that no DFAcan decide L, and so L is not regular.2Suppose then that there is a finite number of equivalence class. We define an automaton thathas a state for each equivalence class. The start state is the class [] and every state of the form[x] for x ∈ L is a final state.It remains to describe the transition function. From a state [x], reading the character a, theautomaton moves to state [xa]. We need to make sure that this definition makes sense. If x ≈Lx0,then the state [x] and the state [x0] are the same, so we need to verify that the state [xa] and thestate [x0a] are also the same. That is, we need to verify that, for every string z, xaz ∈ L if andonly if x0az ∈ L; this is clearly true because x and x0are indistinguishable …


View Full Document

Berkeley COMPSCI 172 - Notes on State Minimization

Download Notes on State Minimization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Notes on State Minimization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Notes on State Minimization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?