A Bayesian view of language evolution by iterated learning

Thomas L. Griffiths ([email protected])
Department of Cognitive and Linguistic Sciences, Brown University, Providence, RI 02912

Michael L. Kalish ([email protected])
Institute of Cognitive Science, University of Louisiana at Lafayette, Lafayette, LA 70504

Abstract

Models of language evolution have demonstrated how aspects of human language, such as compositionality, can arise in populations of interacting agents. This paper analyzes how languages change as the result of a particular form of interaction: agents learning from one another. We show that, when the learners are rational Bayesian agents, this process of iterated learning converges to the prior distribution over languages assumed by those learners. The rate of convergence is set by the amount of information conveyed by the data seen by each generation; the less informative the data, the faster the process converges to the prior.

Human languages form a subset of all logically possible communication schemes, with universal properties shared by all languages (Comrie, 1981; Greenberg, 1963; Hawkins, 1988). A traditional explanation for these linguistic universals is that they are the consequence of constraints on the set of learnable languages imposed by an innate, language-specific, genetic endowment (e.g., Chomsky, 1965). Recent research has explored an alternative explanation: that universals emerge from evolutionary processes produced by the transmission of languages across generations (e.g., Kirby, 2001; Nowak, Plotkin, & Jansen, 2000). Languages change as each generation learns from that which preceded it. This process of iterated learning implicitly selects for languages that are more learnable.
This suggests a tantalizing hypothesis: that iterated learning might be sufficient to explain the emergence of linguistic universals (Briscoe, 2002).

Kirby (2001) introduced a framework for exploring this hypothesis, called the iterated learning model (ILM). In the ILM, each generation consists of one or more learners. Each learner sees some data, forms a hypothesis about the process that produced that data, and then produces the data which will be supplied to the next generation of learners, as shown in Figure 1 (a). The languages that succeed in being transmitted across generations are those that pass through the "information bottleneck" imposed by iterated learning. If particular properties of languages make it easier to pass through that bottleneck, then many generations of iterated learning might allow those properties to become universal.

The ILM can be used to explore how different assumptions about language learning influence language evolution. A variety of learning algorithms have been examined using the ILM, including a heuristic grammar inducer (Kirby, 2001), associative networks (Smith, Kirby, & Brighton, 2003), and minimum description length (Brighton, 2002). Iterated learning with these algorithms produces languages that possess one of the most compelling properties of human languages: compositionality. In a compositional language, the meaning of an utterance is a function of the meaning of its parts. The intuitive explanation for these results is that the regular structure of compositional languages means that they can be learned from less data, and are thus more likely to pass through the information bottleneck.

These instances of compositionality emerging from iterated learning raise an important question: what languages will survive many generations of iterated learning?
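The transmission loop of the ILM can be sketched schematically. The sketch below is illustrative only: the `learn` and `produce` functions are toy placeholders of ours (frequency estimation over coin flips), not the grammar inducers or networks used in the published models.

```python
import random

def iterated_learning(initial_data, learn, produce, n_generations):
    """Generic ILM loop: each generation induces a hypothesis from the
    previous generation's data, then produces data for the next learner."""
    data = initial_data
    history = []
    for _ in range(n_generations):
        hypothesis = learn(data)    # form a hypothesis from observed data
        data = produce(hypothesis)  # generate data for the next generation
        history.append(hypothesis)
    return history

# Toy stand-ins: a "language" is just a coin bias, learned by frequency
# estimation from a small sample (the information bottleneck).
def learn(data):
    return sum(data) / len(data)

def produce(bias, m=5):
    return [1 if random.random() < bias else 0 for _ in range(m)]

chain = iterated_learning([1, 0, 1, 1, 0], learn, produce, n_generations=20)
```

Even in this stripped-down form, the structure of the question is visible: the sequence of hypotheses drifts under the bottleneck, and what it drifts toward depends on the learning algorithm.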
While the circumstances under which compositionality will emerge from iterated learning with specific learning algorithms have been investigated (Brighton, 2002; Smith et al., 2003), there are no general results for arbitrary properties of languages or broad classes of learning algorithms. In this paper, we analyze iterated learning for the case where the learners are rational Bayesian agents. A variety of learning algorithms can be formulated in terms of Bayesian inference, and Bayesian methods underlie many approaches in computational linguistics (Manning & Schütze, 1999). The assumption that the learners are Bayesian agents makes it possible to derive analytic results indicating which languages will be favored by iterated learning. In particular, we prove the surprising result that the probability distribution over languages resulting from iterated Bayesian learning converges to the prior probability distribution assumed by the learners. This implies that the asymptotic probability that a language is used does not depend at all upon the properties of the language, being determined entirely by the assumptions of the learner.

[Figure 1: (a) Iterated learning. (b) Dependencies among variables in iterated Bayesian learning.]

Iterated Bayesian learning

Following most of the work applying iterated learning to language evolution, we will assume that our learners are faced with a function learning task: given a set of m inputs, x = {x1, . . . , xm}, and m corresponding outputs, y = {y1, . . . , ym}, the learner has to estimate the probability distribution over y for each x. In a language learning setting, x is usually taken to be a set of "meanings" or events in the world, and y is taken to be the set of utterances associated with those events.
We will useX and Y to denote the set of values that x and y cantake on.Iterated learning begins with some initial data,(x0, y0), pr esented to the first learner, who then gener-ates outputs y1in response to some new inputs x1. Thesecond learner sees (x1, y1), and generates y2in responseto x2. This process continues for each successive gener-ation, with learner n + 1 seeing (xn, yn) and generatingyn+1in response to xn+1. The r esult of this processdepends upon the algorithm used by the learners.We will assume that our learners are Bayesian agents,supplied with a finite discrete1hypothesis space H anda prior probability distribution p(h) for each hypothe-sis h ∈ H. In a function learning task, each hypothesish corresponds to a conditional probability distributionp(y|x, h), specifing the distribution over all sets of out-puts for any set of inputs. In the learning step of the pro-cess illustrated in Figure 1 (a), learner n+1 sees (xn, yn),and computes a posterior distribution over hn+1usingBayes’ rulep(hn+1|xn, yn) =p(yn|xn,

