Penn COGS 502 - Language Learning from Stochastic Input


Language Learning from Stochastic Input

Shyam Kapur
Institute for Research in Cognitive Science
University of Pennsylvania
3401 Walnut Street, Rm 412C
Philadelphia, PA 19104, USA

Gianfranco Bilardi
Dipartimento di Elettronica ed Informatica
Università di Padova
Via Gradenigo 6/A
35131 Padova, Italy

Abstract

Language learning from positive data in the Gold model of inductive inference is investigated in a setting where the data can be modeled as a stochastic process. Specifically, the input strings are assumed to form a sequence of identically distributed, independent random variables, where the distribution depends on the language being presented. A scheme is developed which can be tuned to learn, with probability one, any family of recursive languages, given a recursive enumeration of total indices for the languages in the family and a procedure to compute a lower bound to the probability of occurrence of a given string in a given language. Variations of the scheme work under other assumptions, e.g., if the probabilities of the strings form a monotone sequence with respect to a given enumeration. The learning algorithm is rather simple and appears psychologically plausible. A more sophisticated version of the learner is also developed, based on a probabilistic version of the notion of tell-tale subset. This version yields, as a special case, Angluin's learner for the families of languages that are learnable from all texts (and not just from a set of texts of probability one).

1 Introduction

In the Gold paradigm for inductive inference [Gol67], the learner is presented with a text of a language (all its strings, in any order, with possible repetitions) that belongs to a specified family of languages. This model is motivated by the well-established hypothesis that the child learns her native language from positive evidence alone. (For a discussion, see [Ber85].) The learner is said to learn a language if, on any text for it, the learner's guess converges to that language, i.e., from some point onwards, the guess coincides with the language being presented. The learner is said to learn the family if it learns each language in the family. Angluin [Ang80] characterized the families learnable in the Gold paradigm.

The requirement of convergence on every text of each language turns out to be too stringent. Gold [Gol67] suggested that by imposing probabilistic assumptions on texts for a language, and requesting convergence only with probability one, the class of identifiable families could be enriched. Stochastic input could provide some form of indirect negative evidence of the type that has often been suggested for natural language acquisition [Pin84, Cla90].
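The stochastic-text setting assumed in the abstract can be made concrete with a small sketch. The Python fragment below is purely illustrative and is not taken from the paper: it draws an i.i.d. stochastic text from an invented distribution over the toy language {a^n b^n : n >= 1}, with Pr[a^n b^n] = 2^(-n), and records a learner's successive guesses. Identification in the limit with probability one then requires that, on a set of such texts of probability one, the guess sequence be eventually constant and correct. All function names and numerical choices here are assumptions made for this example.

    import random

    def sample_string(rng):
        """Draw one string from a toy distribution over L = {a^n b^n : n >= 1},
        with Pr[a^n b^n] = 2^(-n). Language and distribution are illustrative only."""
        n = 1
        while rng.random() < 0.5:   # geometric choice of n
            n += 1
        return "a" * n + "b" * n

    def stochastic_text(seed=0):
        """An infinite text: i.i.d. draws from a fixed distribution whose support is L."""
        rng = random.Random(seed)
        while True:
            yield sample_string(rng)

    def run_learner(learner, text, steps):
        """Feed the first `steps` strings of the text to the learner and record its guesses.
        Identification in the limit means the guess sequence is eventually constant,
        and the limiting guess is (an index for) the language being presented."""
        sample, guesses = [], []
        for _, s in zip(range(steps), text):
            sample.append(s)
            guesses.append(learner(sample))
        return guesses

Note that nothing in this driver depends on the distribution being known, computable, or even finitely representable; only its support matters for the correctness criterion.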
Angluin [Ang88] studied the case of a stochastic input where the distribution of each language is essentially known to the learner, in the form of a procedure that allows the learner to compute it. It is shown there that families not learnable from all texts become learnable with probability 1. Furthermore, the distribution itself is learned, not just the supporting language. (Angluin also showed that her work subsumes the previous work, in particular [Hor69] and [vdMW78].)

In this work, we study the learning problem in the case of stochastic input, under relatively mild assumptions on the input distribution (e.g., a lower bound to the distribution is computable, or the distribution is monotone with respect to a canonical enumeration of the strings). The target is identification in the limit, with probability one, of the language being presented, not of the distribution according to which it is presented. Indeed, the distribution need neither be computable nor even be representable in any finite form.

We develop a learning algorithm that is computationally simple and psychologically plausible. (For some applications to natural language acquisition, see [Kap92].) No complicated functions are computed and no distributions are estimated. Learning proceeds as if each language were in isolation, and, at any stage, only the guessed language and any parameters associated with it play any role. In contrast, many learning algorithms proposed in the literature continue evaluating various functions of languages different from the one being presented, even after they have converged.

In Section 2, we define the basic framework of our model. In Section 3, we define a recognition problem, which is the problem of recognizing whether the language being presented is the same as a given language of the family. We show a systematic way to obtain a learner from a recognizer. Since a recognizer is simpler to define and to analyze than a learner, this approach is of independent interest.

In Section 4, we specify a particular recognizer and analyze its behavior. Our recognizer can be tuned to work correctly whenever a lower bound is computable for the probability that a given string occurs in the input when a given language is being presented. We also discuss variations of the scheme which work in other situations. Our recognizer makes no assumptions on—and takes no advantage of—the structure of the family. Indeed, under the proper probabilistic assumptions, it will work for any family, but without such assumptions it could fail even on a family that is learnable from all texts. This is not surprising if one considers that, as shown by Angluin [Ang80], learning a family from all texts is equivalent to the ability to recursively enumerate a so-called tell-tale subset for each language in the family. At the same time, a tell-tale subset enumerator is not in general computable from a description of the family (even if there is one) [KB91, Kap91]. Motivated by the preceding considerations, in Section 5, we show how the recognition algorithm can be generalized so that learning from all texts arises as a special case within this setting. We conclude with the hope that our development could provide useful hints for understanding the role played by indirect negative evidence in the learning process.
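Since Sections 3 and 4 are only summarized above, the following fragment is offered solely to illustrate the general shape of such a computation; it is a sketch under assumptions made here, not the paper's actual construction. The recognizer for a single candidate language rejects if the sample contains a string outside the language, or if some string of the language whose computable lower-bound probability is still comparatively large has failed to appear (a form of indirect negative evidence). Recognizers are then combined into a learner by the familiar least-accepting-index rule. The threshold c/t, the constant c, and the helper names are all invented for this sketch; how such parameters must actually be tuned so that the guesses converge with probability one is precisely what the paper establishes.

    def recognizer_accepts(sample, member, low_prob, strings, t):
        """One possible shape of a recognizer for a single candidate language L
        (illustrative only; not necessarily the paper's construction).

        member(w)   -- decides membership of w in L (computable, since indices are total)
        low_prob(w) -- computable lower bound on the probability of w under L's distribution
        strings(i)  -- the i-th string in a fixed recursive enumeration of all strings
        t           -- current stage, taken here to be the number of strings seen so far
        """
        # (i) Consistency: every observed string must belong to L.
        if any(not member(w) for w in sample):
            return False
        # (ii) Indirect negative evidence: a string of L with probability at least c/t
        # is likely to have occurred within t i.i.d. draws (the miss probability is at
        # most (1 - c/t)^t, roughly e^(-c)); if it is still missing, reject for now.
        c = 8.0   # illustrative constant; the real tuning is analyzed in the paper
        seen = set(sample)
        for i in range(t):
            w = strings(i)
            if member(w) and low_prob(w) >= c / max(t, 1) and w not in seen:
                return False
        return True

    def make_learner(candidates, strings):
        """Obtain a learner from recognizers (cf. Section 3): guess the least-index
        candidate whose recognizer currently accepts the sample; None if none does yet.
        `candidates` is a list of (member, low_prob) pairs for L_1, L_2, ...
        """
        def learner(sample):
            t = len(sample)
            for idx, (member, low_prob) in enumerate(candidates):
                if recognizer_accepts(sample, member, low_prob, strings, t):
                    return idx
            return None
        return learner

A learner built this way can be driven directly by the stochastic_text generator and run_learner driver sketched earlier.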
2 Model

Let Σ be a finite alphabet and Σ* be the set of all finite strings over Σ.