6.863J Natural Language Processing
Lecture 22: Language Learning, Part 2
Robert C. Berwick

The Menu Bar
• Administrivia: project-p?
• Can we beat the Gold standard?
• Review of the framework
• Various stochastic extensions
• Modern learning theory & sample size
• Gold results still hold!
• Learning by setting parameters: the triggering learning algorithm

The problem
• From finite data, induce an infinite set
• How is this possible, given limited time & computation?
• Children are not told grammar rules
• Answer: put constraints on the class of possible grammars (or languages)

To review: the Gold framework
• Components:
• Target language L_{g_t} or L_t (with target grammar g_t), drawn from a hypothesis family H
• Data (input) sequences D and texts t; t_n is the first n sentences of t
• Learning algorithm (mapping) A; its output hypothesis after input t_n is A(t_n)
• Distance metric d over hypotheses h
• Definition of learnability: d(g_t, h_n) → 0 as n → ∞

Framework for learning
1. Target language: L_t ∈ L is a target language drawn from a class L of possible target languages.
2. Example sentences: s_i ∈ L_t are drawn from the target language and presented to the learner.
3. Hypothesis languages: h ∈ H is drawn from a class of possible hypothesis languages that child learners construct on the basis of exposure to the example sentences in the environment.
4. Learning algorithm: A is a computable procedure by which languages from H are selected, given the examples.

Some details
• Languages/grammars are defined over an alphabet Σ; sentences are strings in Σ*
• Example sentences:
• Independent of order
• Or: assume they are drawn from a probability distribution µ (the relative frequency of various kinds of sentences), e.g., shorter sentences are heard more often
• If µ is concentrated on L_t, the presentation consists of positive examples only; otherwise
• examples come from both L_t and Σ* − L_t (negative examples), i.e., from all of Σ* ("informant presentation")

Learning algorithms & texts
• A is a mapping from the set of all finite data streams to hypotheses in H
• A finite data stream of k examples: (s_1, s_2, …, s_k)
• The set of all data streams of length k: D_k = {(s_1, s_2, …, s_k) | s_i ∈ Σ*} = (Σ*)^k
• The set of all finite data sequences: D = ∪_{k>0} D_k (enumerable), so A : D → H
• We can allow A to flip coins if need be
• If learning by enumeration: the sequence of hypotheses after each sentence is h_1, h_2, …; the hypothesis after n sentences is h_n

Identification in the limit: definitions
• A text t of language L is an infinite sequence of sentences of L, with each sentence of L occurring at least once ("fair presentation")
• The text t_n is the first n sentences of t
• Learnability: language L is learnable by algorithm A if, for each text t of L, there exists a number m such that for all n > m, A(t_n) = L
• More formally, fix a distance metric d, a target grammar g_t, and a text t for the target language. Learning algorithm A identifies (learns) g_t in the limit if d(A(t_k), g_t) → 0 as k → ∞
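To make the definitions concrete, here is a minimal sketch (not from the lecture) of identification in the limit for a toy hypothesis family: the finite languages over the alphabet {a}. The learner A maps each finite data sequence to the hypothesis "exactly the strings seen so far", which identifies every finite target language in the limit. The names (finite_language_learner, run_text) are illustrative assumptions.

```python
def finite_language_learner(data):
    """A : D -> H. Conjecture exactly the set of sentences seen so far."""
    return frozenset(data)

def run_text(learner, text, target):
    """Feed a text (fair presentation) to the learner one sentence at a
    time, printing the distance d(h_n, L_t) after each example."""
    seen = []
    for n, s in enumerate(text, start=1):
        seen.append(s)
        h = learner(seen)
        d = 0 if h == target else 1   # d = 0 iff the hypothesis is exactly L_t
        print(f"n={n}  h_n={sorted(h)}  d={d}")

target = frozenset({"a", "aa", "aaa"})       # L_t = {a, aa, aaa}
text = ["a", "aa", "a", "aaa", "aa", "a"]    # first six sentences of a fair text
run_text(finite_language_learner, text, target)
# Once every sentence of L_t has appeared (here after n = 4), d stays 0
# forever: the learner has converged and never changes its hypothesis again.
```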
Convergence in the limit
• d(g_t, h_n) → 0 as n → ∞
• This quantity is called the generalization error
• The generalization error goes to 0 as the number of examples goes to infinity
• In a statistical setting, this error is a random variable and converges to 0 only in a probabilistic sense (Valiant's PAC learning)

ε-learnability & the "locking sequence" (data set)
• Locking sequence: if a (finite) sequence l_ε gets the learner within ε of the target, then the learner stays there
[Figure: a ball of radius ε around the target language L; after the locking sequence l_ε, the learner's conjecture stays inside the ball.]

Locking sequence theorem
• Theorem 1 (Blum & Blum, 1975; ε version): if A identifies a target grammar g in the limit, then for every ε > 0 there exists a locking sequence l_ε ∈ D such that (i) l_ε ⊆ L_g, (ii) d(A(l_ε), g) < ε, and (iii) d(A(l_ε·σ), g) < ε for all σ ∈ D with σ ⊆ L_g
• Proof by contradiction: suppose there is no such l_ε

Proof…
• If there is no such l_ε, then for every candidate sequence l there is some σ_l such that d(A(l·σ_l), g) ≥ ε
• Use this to construct a text q on which A will not identify the target L_g
• The evil parent: every time the learner's guesses get within ε of the target, we tack on a piece σ_l that pushes the conjecture outside that ε-ball; so the conjectures on q have error greater than ε infinitely often

The adversarial parent…
• Remember: d(A(l·σ_l), g) ≥ ε
• It is easy to be evil: start from any text r = s_1, s_2, …, s_n, … for L_g
• Let q_1 = s_1. If d(A(q_i), g) < ε, then pick a σ_{q_i} and tack it onto the text sequence: q_{i+1} = q_i · σ_{q_i} · s_{i+1}
• Otherwise d is already too large (≥ ε), so we can leave q_{i+1} as q_i followed by s_{i+1}: q_{i+1} = q_i · s_{i+1}

Pinocchio sequence…
[Figure: the evil-parent ("Pinocchio") sequence repeatedly drags the learner's conjecture back outside the ε-ball around L.]

Gold's theorem
• Suppose A is able to identify the family L (which contains every finite language over Σ plus an infinite language L_inf). Then in particular it must identify the infinite language L_inf
• By the theorem, a locking sequence σ_inf for L_inf exists
• Construct the finite language L_{σ_inf} consisting of just the sentences in σ_inf; σ_inf is then also a locking sequence for L_{σ_inf}, a different language from L_inf
• On a text for L_{σ_inf} that begins with σ_inf, A stays locked onto L_inf, so A can't identify L_{σ_inf}: a contradiction

Example of identification (learning) in the limit: whether a TM halts or not

Time:   1   2   3   4   5  …  m   m+1 …
Output: NO  NO  NO  NO  NO …  NO  YES YES YES …

Definition of "learns": there is some point m after which (i) algorithm A outputs the correct answer, and (ii) no longer changes its answer.

The following A will work: given any Turing machine M_j, at each time i, run the machine for i steps. If after i steps M_j has not halted, output 0 (i.e., "NO"); otherwise output 1 (i.e., "YES").
• Suppose the TM halts, say within m steps: the outputs are NO NO … NO YES YES YES …, correct and stable from time m on
• Suppose the TM does not halt: the outputs are NO NO NO NO NO … forever, correct from the very start
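Here is a minimal runnable sketch of that learner, assuming we model a "Turing machine" as a Python generator that yields once per simulated step and finishes iff the machine halts. The helper names (halts_within, limit_learner, and the two toy machines) are illustrative, not the lecture's.

```python
from itertools import count, islice

def halts_within(machine, steps):
    """Run a fresh copy of the machine for at most `steps` steps;
    return True iff it halts within that budget."""
    g = machine()
    for _ in range(steps):
        try:
            next(g)
        except StopIteration:
            return True          # halted within the budget
    return False

def limit_learner(machine):
    """At time i, run the machine for i steps and conjecture NO/YES.
    The conjectures converge: NO forever if the machine never halts,
    and NO ... NO YES YES YES ... if it halts."""
    for i in count(1):
        yield "YES" if halts_within(machine, i) else "NO"

def halts_quickly():             # a machine that halts after a few steps
    for _ in range(3):
        yield

def loops_forever():             # a machine that never halts
    while True:
        yield

print(list(islice(limit_learner(halts_quickly), 6)))
# ['NO', 'NO', 'NO', 'YES', 'YES', 'YES']
print(list(islice(limit_learner(loops_forever), 6)))
# ['NO', 'NO', 'NO', 'NO', 'NO', 'NO']
```

Note that no single output ever certifies non-halting; correctness holds only in the limit, which is exactly why this is identification rather than decision.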
Exact learning seems too stringent
• Why should we have to speak perfect French forever?
• Can't we say "MacDonald's" once in a while?
• Or what about this: you say potato; I say pohtahto; you say potato; I say pohtahto; …

Summary of learnability given Gold
• With positive-only evidence, no interesting families of languages are learnable
• Even if given (sentence, meaning) pairs
• Even with a stochastic grammar (mommy is talking via some distribution µ)
• BUT if the learner knew what the distribution was, they could learn in this case; however, this is almost like knowing the language anyway

If a parent were to provide true negative evidence of the type specified by Gold, interactions would look like the Osbournes:
Child: me want more.
Father: ungrammatical.
Child: want