6.863J Natural Language Processing
Lecture 22: Language Learning, Part 2
Robert C. Berwick

The Menu Bar
• Administrivia:
  • Project-p?
• Can we beat the Gold standard?
• Review of the framework
• Various stochastic extensions
• Modern learning theory & sample size
• Gold results still hold!
• Learning by setting parameters: the triggering learning algorithm

The problem
• From finite data, induce an infinite set
• How is this possible, given limited time & computation?
• Children are not told grammar rules
• Answer: put constraints on the class of possible grammars (or languages)

To review: the Gold framework
• Components:
  • Target language L_gt (also written L_t), with target grammar g_t, drawn from a hypothesis family H
  • Data (input) sequences D, and texts t with finite initial segments t_n
  • Learning algorithm (mapping) A; its output hypothesis after input t_n is A(t_n)
  • Distance metric d over hypotheses h
• Definition of learnability: d(g_t, h_n) → 0 as n → ∞

Framework for learning
1. Target language: L_t ∈ L is a target language drawn from a class L of possible target languages.
2. Example sentences: s_i ∈ L_t are drawn from the target language & presented to the learner.
3. Hypothesis languages: h ∈ H is drawn from a class of possible hypothesis languages that child learners construct on the basis of exposure to the example sentences in the environment.
4. Learning algorithm: A is a computable procedure by which languages from H are selected given the examples.

Some details
• Languages/grammars are sets of strings over an alphabet Σ (subsets of Σ*)
• Example sentences:
  • Independent of order
  • Or: assume they are drawn from a probability distribution µ (the relative frequency of various kinds of sentences), e.g., shorter sentences are heard more often
• If µ is a distribution over L_t (so only sentences of L_t ever appear), the presentation consists of positive examples only; otherwise examples come from both L_t and Σ* − L_t (negative examples), i.e., from all of Σ* ("informant presentation")

Learning algorithms & texts
• A is a mapping from the set of all finite data streams to hypotheses in H
• A finite data stream of k examples is (s_1, s_2, …, s_k)
• The set of all data streams of length k is D_k = {(s_1, s_2, …, s_k) | s_i ∈ Σ*} = (Σ*)^k
• The set of all finite data sequences is D = ∪_{k>0} D_k (enumerable), so A : D → H
• We can allow A to flip coins if need be
• If learning is by enumeration: the sequence of hypotheses after each sentence is h_1, h_2, …, and the hypothesis after n sentences is h_n (a minimal sketch of such a learner follows below)
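For concreteness, here is a minimal sketch of learning by enumeration, assuming the hypothesis family is given as a finite, pre-ordered list of membership predicates; the representation and all names are illustrative, not part of the lecture. The learner conjectures the earliest language in the enumeration that is consistent with every sentence seen so far, producing the hypothesis sequence h_1, h_2, ….

```python
# Sketch of an enumeration learner (illustrative representation:
# each hypothesis language is a membership predicate over sentences,
# and the enumeration order is fixed in advance).

def enumeration_learner(hypotheses, text):
    """Yield the learner's hypothesis index h_1, h_2, ... after each sentence.

    hypotheses: list of predicates; hypotheses[j](s) is True iff s is in L_j
    text: iterable of example sentences s_1, s_2, ...
    """
    seen = []
    for s in text:
        seen.append(s)
        # Conjecture the first language in the enumeration that is
        # consistent with all positive examples observed so far.
        for j, in_L in enumerate(hypotheses):
            if all(in_L(x) for x in seen):
                yield j
                break
        else:
            yield None  # no hypothesis in the family covers the data

# Toy family: L_0 = {a}, L_1 = {a, b}, L_2 = all strings over {a, b}
hypotheses = [
    lambda s: s == "a",
    lambda s: s in {"a", "b"},
    lambda s: set(s) <= {"a", "b"},
]
print(list(enumeration_learner(hypotheses, ["a", "b", "a", "b"])))
# -> [0, 1, 1, 1]: the conjecture stabilizes once the data rule out L_0
```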
Identification in the limit: definitions
• A text t of language L is an infinite sequence of sentences of L in which every sentence of L occurs at least once ("fair presentation")
• The text t_n is the first n sentences of t
• Learnability: language L is learnable by algorithm A if, for each text t of L, there exists a number m such that for all n > m, A(t_n) = L
• More formally: fix a distance metric d, a target grammar g_t, and a text t for the target language. Learning algorithm A identifies (learns) g_t in the limit if d(A(t_k), g_t) → 0 as k → ∞

Convergence in the limit
• d(g_t, h_n) → 0 as n → ∞
• This quantity is called the generalization error
• The generalization error goes to 0 as the number of examples goes to infinity
• In a statistical setting, this error is a random variable & converges to 0 only in a probabilistic sense (Valiant's PAC learning)

ε-learnability & the "locking sequence" (locking data set)
• Locking sequence: if a (finite) sequence l_ε gets within ε of the target, then it stays there
• (Figure: a ball of radius ε around the target language L, with l_ε landing and remaining inside it)

Locking sequence theorem
• Theorem 1 (Blum & Blum 1975, ε version): if A identifies a target grammar g in the limit, then for every ε > 0 there exists a locking sequence l_ε ∈ D such that (i) l_ε ⊆ L_g, (ii) d(A(l_ε), g) < ε, and (iii) d(A(l_ε·σ), g) < ε for all σ ∈ D with σ ⊆ L_g
• Proof is by contradiction: suppose no such l_ε exists

Proof…
• If there is no such l_ε, then for every l there is some σ_l such that d(A(l·σ_l), g) ≥ ε
• Use this to construct a text q on which A will not identify the target L_g
• The evil daddy: every time the guesses get within ε of the target, tack on a piece of σ_l that pushes the conjecture outside that ε-ball, so the conjectures on q stay farther than ε from the target infinitely often

The adversarial parent…
• Remember: d(A(l·σ_l), g) ≥ ε
• It is easy to be evil: take an enumeration r = s_1, s_2, …, s_n, … of L_g
• Let q_1 = s_1. If d(A(q_i), g) < ε, pick a σ_{q_i} and tack it onto the text sequence: q_{i+1} = q_i · σ_{q_i} · s_{i+1}. Otherwise d is already too large (≥ ε), so leave q_{i+1} as q_i followed by s_{i+1}: q_{i+1} = q_i · s_{i+1}

Pinocchio sequence…
• (Figure: the evil-daddy sequence repeatedly escapes the ε-ball around the target language L)

Gold's theorem
• Suppose A is able to identify the family L. Then it must identify the infinite language L_inf
• By the theorem, a locking sequence σ_inf exists
• Construct a finite language L_{σ_inf} from this locking sequence; σ_inf is then a locking sequence for L_{σ_inf}, a different language from L_inf
• So A cannot identify L_{σ_inf}: a contradiction

Example of identification (learning) in the limit: whether a TM halts or not
• Definition of "learns": there is some point m after which (i) algorithm A outputs the correct answer, and (ii) A no longer changes its answer
• The following A will work: given any Turing machine M_j, at each time i run the machine for i steps; if after i steps M_j has not halted, output 0 (i.e., "NO"), otherwise output 1 (i.e., "YES")
• If the TM halts, the guesses look like:
  step:   1  2  3  4  5  …  m  m+1 …
  guess:  NO NO NO NO NO … NO YES YES …
• If the TM does not halt:
  step:   1  2  3  4  5  …
  guess:  NO NO NO NO NO …
• (A runnable sketch of this limit learner appears at the end of these notes)

Exact learning seems too stringent
• Why should we have to speak perfect French forever?
• Can't we say "MacDonald's" once in a while?
• Or what about this: you say potato; I say pohtahto; you say potato; I say pohtahto; …

Summary of learnability given Gold
• With positive-only evidence, no interesting families of languages are learnable
• Even if given (sentence, meaning) pairs
• Even with a stochastic grammar (mommy is talking via some distribution µ)
• BUT if the learner knew what the distribution was, they could learn in this case; however, that is almost like knowing the language already

If a parent were to provide true negative evidence of the type specified by Gold, interactions would look like the Osbournes:
Child: me want more.
Father: ungrammatical.
Child: want
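Returning to the halting example a few slides back, here is a minimal sketch of that limit learner. The machine is modeled as a Python generator that is advanced one step per next() call and halts by returning; this modeling choice and all names are assumptions for illustration, not part of the lecture. At stage i the learner runs a fresh copy of the machine for i steps and answers YES iff it halted within that budget, so the answer sequence eventually stabilizes on the correct value.

```python
# Sketch of the limit learner for "does this machine halt?".
# Modeling assumption: a "machine" is a Python generator; each next()
# is one step, and the generator returning (StopIteration) means it halted.

def halts_within(make_machine, steps):
    """Run a fresh copy of the machine for at most `steps` steps;
    return True iff it halts within that budget."""
    machine = make_machine()
    for _ in range(steps):
        try:
            next(machine)
        except StopIteration:
            return True
    return False

def limit_learner(make_machine, stages):
    """Guess sequence: at stage i, output "YES" iff the machine
    halts within i steps, else "NO"."""
    return ["YES" if halts_within(make_machine, i) else "NO"
            for i in range(1, stages + 1)]

def halts_after_3():          # a machine that halts after a few steps
    for _ in range(3):
        yield

def loops_forever():          # a machine that never halts
    while True:
        yield

print(limit_learner(halts_after_3, 6))   # NO NO NO YES YES YES
print(limit_learner(loops_forever, 6))   # NO NO NO NO NO NO
```

If the machine halts, the guesses switch to YES once the step budget reaches its halting time and never change again; if it never halts, the guesses are correct (NO) from the start, even though the learner can never announce that it has converged.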

