Inference of Reversible Languages DA N A A NG L U I N Yale Untverstty, New Haven, Connecttcut Abstract. A famdy of efficient algorithms for referring certain subclasses of the regular languages from fmtte posttwe samples is presented These subclasses are the k-reversible languages, for k = 0, 1, 2, .. .. For each k there is an algorithm for finding the smallest k-reversible language containing any fimte posluve sample. It ts shown how to use this algorithm to do correct identification m the ILmlt of the k- reversible languages from posmve data A reversible language is one that Is k-reverstble for some k __ 0. An efficient algonthrn is presented for mfernng reversible languages from posmve and negative examples, and it is shown that it leads to correct identification m the hmlt of the class of reversible languages. Numerous examples are gtven to dlustrate the algorithms and their behawor Categories and Subject Descriptors F 1 1 [Computation by Abstract Devices] Models of ComputaUon-- automata, F 4 3 [Mathematical Logic and Formal Languages] Formal Languages--classes defined by grammars or automata; 1 2 6 [Artificial Intelligence] Learnmg--mductlon, 1.5 1 [Pattern Recognition] Models--structural General Terms' Algorithms, Theory Addmonal Key Words and Phrases Inductive inference, grammatical mference, reversible languages 1. In trodu c tion Th is p ap er c o n c e r ns the p ro b le m o f in d u ct iv el y in f e r r i n g g e n er al rule s from e x a mpl e s . Peop l e seem to agre e that this is a n i mp o rt a nt pr ob le m area, bu t ther e is m uc h less ag r eem e nt a b ou t h o w to study it. One o f the p r i m a r y g oals o f this p a p e r is to i n d ic a t e that a re l a t i v e l y th e o r e t ic a l a p p ro ach , using tools f r o m c om p le xi t y t h eor y a nd f o rm a l la n gu ag e theo r y , a pp e ar s to be b o th feasible an d fruitful. W h e n peop l e c o mm u ni ca t e c omp l ex p r o ce du re s to ot h er pe o p l e , t he y of t e n se e m to re l y o n a s o mew h at s k e t chy d es c ri p ti o n o f the gen e ra l ideas, t o ge t h er wit h e xa mpl es to elucid a t e p a rti c ula r details. I f we c o u ld f red s o u nd, uniform, an d co nv en ie nt m e th od s th at wo u ld al l o w c om pu t er p r og r am s to general i z e a pp r o p ri at e ly fro m exampl e s , these wo u ld p ro ba bl y incre a se th e us a bi l it y o f c omp u ter s by experts an d no n exp e rts both. This p a p er is pa rt o f a gen e r al s t u d y o f wh a t m ak es a class o f rules ef f icientl y a nd reli a bl y i n f er a b le fr o m e x a m pl e s , with the g o a l of e v ent u all y fi n d in g such m et h od s. In this pa p e r we s t u d y in f er a b l li t y m a p a rti c ula r a b s tr a c t d om a in , t h at o f inferr i n g fo r mal l a n gua g es from finite sampl e s. I ma g in e that we are g i v e n a finite set o f strings from som e f or ma l l a n g u ag e , a n d p o s si b l y a no th er finite set o f strings fr o m the c o m p l e me nt o f the lan g u a ge , an d we are req u i re d to m a ke a guess o f wh a t the This research was supported by the National Science Foundation under Grant MCS 80-02447 Author's address Department of Computer Science, Yale Umverslty, P O Box 2158, Yale Station, New Haven, CT 06520 Permission to copy without fee all or part of this material ~s granted provided that the coptes are not made or distributed for direct commercial advantage, the ACM copyright notice and the Utle of the publlcaUon and its date appear, and notice Is given that copying is by permission of the Assoclauon for Computing Machinery To copy otherwise, or to republish, requires a fee and/or specific permission © 1982 ACM 0004-5411/82/0700-0741 $00 75 Journal of the Association for Computing Machinery, Vol 29, No 3, July 1982, pp 741-76574 2 DANA ANGLUIN unknown language is. For example, given the set of strings (00111, 01, 000011, 001111}, we might guess that the underlying language is the set of all strings consisting of any number of O's followed by any number of l's. It may be objected that there are infinitely many regular languages that contain this sample, and we have no way of knowing whether such a guess is "right." One of the fundamental problems for this area of study is to find appropriate, natural, and theoretically sound criteria for the "goodness" of generalizations such as this one. The concept of identification in the limit, formulated by Gold [18], has been of basic importance m theoretical studies of inductive inference. This concept relies on looking at the limiting behavior of the inferring process as it is given more and more examples of a particular rule. If, for any sequence that eventually contains each example of a given rule, the inferring process produces a sequence of guesses that eventually converges to one that is a correct description of the underlying rule, then the process is said to identify this rule m the limit. Much has been learned about the classes of formal languages and partial recursive functions that can be correctly identified in the limit by various kinds of effective inference …
View Full Document