6.863J Natural Language Processing
Lecture 22: Language Learning, Part 2
Robert C. Berwick

The Menu Bar
• Administrivia: project-p?
• Can we beat the Gold standard?
• Review of the framework
• Various stochastic extensions
• Modern learning theory & sample size
• Gold results still hold!
• Learning by setting parameters: the triggering learning algorithm

The problem
• From finite data, induce an infinite set
• How is this possible, given limited time & computation?
• Children are not told grammar rules
• Answer: put constraints on the class of possible grammars (or languages)

To review: the Gold framework
• Components:
• Target language L_{g_t} or L_t (with target grammar g_t), drawn from a hypothesis family H
• Data (input) sequences D and texts t; t_n is the first n sentences of t
• Learning algorithm (mapping) A; its output hypothesis after input t_n is A(t_n)
• Distance metric d over hypotheses h
• Definition of learnability: d(g_t, h_n) → 0 as n → ∞

Framework for learning
1. Target language: L_t ∈ L is a target language drawn from a class L of possible target languages.
2. Example sentences: s_i ∈ L_t are drawn from the target language and presented to the learner.
3. Hypothesis languages: h ∈ H is drawn from a class of possible hypothesis languages that child learners construct on the basis of exposure to the example sentences in the environment.
4. Learning algorithm: A is a computable procedure by which languages from H are selected, given the examples.

Some details
• Languages/grammars are defined over an alphabet Σ; sentences are strings in Σ*
• Example sentences:
• Independent of order
• Or: assume they are drawn from a probability distribution µ (the relative frequency of various kinds of sentences), e.g., shorter sentences are heard more often
• If µ is concentrated on L_t, the presentation consists of positive examples only; otherwise
• examples come from both L_t and Σ* − L_t (negative examples), i.e., from all of Σ* ("informant presentation")

Learning algorithms & texts
• A is a mapping from the set of all finite data streams to hypotheses in H
• A finite data stream of k examples: (s_1, s_2, …, s_k)
• The set of all data streams of length k: D_k = {(s_1, s_2, …, s_k) | s_i ∈ Σ*} = (Σ*)^k
• The set of all finite data sequences: D = ∪_{k>0} D_k (enumerable), so A : D → H
• We can allow A to flip coins if need be
• If learning by enumeration: the sequence of hypotheses after each sentence is h_1, h_2, …; the hypothesis after n sentences is h_n

Identification in the limit: definitions
• A text t of language L is an infinite sequence of sentences of L, with each sentence of L occurring at least once ("fair presentation")
• The text t_n is the first n sentences of t
• Learnability: language L is learnable by algorithm A if, for each text t of L, there exists a number m such that for all n > m, A(t_n) = L
• More formally, fix a distance metric d, a target grammar g_t, and a text t for the target language. Learning algorithm A identifies (learns) g_t in the limit if d(A(t_k), g_t) → 0 as k → ∞
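To make the definitions concrete, here is a minimal sketch (not from the lecture) of identification in the limit for a toy hypothesis family: the finite languages over the alphabet {a}. The learner A maps each finite data sequence to the hypothesis "exactly the strings seen so far", which identifies every finite target language in the limit. The names (finite_language_learner, run_text) are illustrative assumptions.

```python
def finite_language_learner(data):
    """A : D -> H. Conjecture exactly the set of sentences seen so far."""
    return frozenset(data)

def run_text(learner, text, target):
    """Feed a text (fair presentation) to the learner one sentence at a
    time, printing the distance d(h_n, L_t) after each example."""
    seen = []
    for n, s in enumerate(text, start=1):
        seen.append(s)
        h = learner(seen)
        d = 0 if h == target else 1   # d = 0 iff the hypothesis is exactly L_t
        print(f"n={n}  h_n={sorted(h)}  d={d}")

target = frozenset({"a", "aa", "aaa"})       # L_t = {a, aa, aaa}
text = ["a", "aa", "a", "aaa", "aa", "a"]    # first six sentences of a fair text
run_text(finite_language_learner, text, target)
# Once every sentence of L_t has appeared (here after n = 4), d stays 0
# forever: the learner has converged and never changes its hypothesis again.
```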
Convergence in the limit
• d(g_t, h_n) → 0 as n → ∞
• This quantity is called the generalization error
• The generalization error goes to 0 as the number of examples goes to infinity
• In a statistical setting, this error is a random variable and converges to 0 only in a probabilistic sense (Valiant's PAC learning)

ε-learnability & the "locking sequence" (data set)
• Locking sequence: if a (finite) sequence l_ε gets the learner within ε of the target, then the learner stays there
[Figure: a ball of radius ε around the target language L; after the locking sequence l_ε, the learner's conjecture stays inside the ball.]

Locking sequence theorem
• Theorem 1 (Blum & Blum, 1975; ε version): if A identifies a target grammar g in the limit, then for every ε > 0 there exists a locking sequence l_ε ∈ D such that (i) l_ε ⊆ L_g, (ii) d(A(l_ε), g) < ε, and (iii) d(A(l_ε·σ), g) < ε for all σ ∈ D with σ ⊆ L_g
• Proof by contradiction: suppose there is no such l_ε

Proof…
• If there is no such l_ε, then for every candidate sequence l there is some σ_l such that d(A(l·σ_l), g) ≥ ε
• Use this to construct a text q on which A will not identify the target L_g
• The evil parent: every time the learner's guesses get within ε of the target, we tack on a piece σ_l that pushes the conjecture outside that ε-ball; so the conjectures on q have error greater than ε infinitely often

The adversarial parent…
• Remember: d(A(l·σ_l), g) ≥ ε
• It is easy to be evil: start from any text r = s_1, s_2, …, s_n, … for L_g
• Let q_1 = s_1. If d(A(q_i), g) < ε, then pick a σ_{q_i} and tack it onto the text sequence: q_{i+1} = q_i · σ_{q_i} · s_{i+1}
• Otherwise d is already too large (≥ ε), so we can leave q_{i+1} as q_i followed by s_{i+1}: q_{i+1} = q_i · s_{i+1}

Pinocchio sequence…
[Figure: the evil-parent ("Pinocchio") sequence repeatedly drags the learner's conjecture back outside the ε-ball around L.]

Gold's theorem
• Suppose A is able to identify the family L (which contains every finite language over Σ plus an infinite language L_inf). Then in particular it must identify the infinite language L_inf
• By the theorem, a locking sequence σ_inf for L_inf exists
• Construct the finite language L_{σ_inf} consisting of just the sentences in σ_inf; σ_inf is then also a locking sequence for L_{σ_inf}, a different language from L_inf
• On a text for L_{σ_inf} that begins with σ_inf, A stays locked onto L_inf, so A can't identify L_{σ_inf}: a contradiction

Example of identification (learning) in the limit: whether a TM halts or not

Time:   1   2   3   4   5  …  m   m+1 …
Output: NO  NO  NO  NO  NO …  NO  YES YES YES …

Definition of "learns": there is some point m after which (i) algorithm A outputs the correct answer, and (ii) no longer changes its answer.

The following A will work: given any Turing machine M_j, at each time i, run the machine for i steps. If after i steps M_j has not halted, output 0 (i.e., "NO"); otherwise output 1 (i.e., "YES").
• Suppose the TM halts, say within m steps: the outputs are NO NO … NO YES YES YES …, correct and stable from time m on
• Suppose the TM does not halt: the outputs are NO NO NO NO NO … forever, correct from the very start
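Here is a minimal runnable sketch of that learner, assuming we model a "Turing machine" as a Python generator that yields once per simulated step and finishes iff the machine halts. The helper names (halts_within, limit_learner, and the two toy machines) are illustrative, not the lecture's.

```python
from itertools import count, islice

def halts_within(machine, steps):
    """Run a fresh copy of the machine for at most `steps` steps;
    return True iff it halts within that budget."""
    g = machine()
    for _ in range(steps):
        try:
            next(g)
        except StopIteration:
            return True          # halted within the budget
    return False

def limit_learner(machine):
    """At time i, run the machine for i steps and conjecture NO/YES.
    The conjectures converge: NO forever if the machine never halts,
    and NO ... NO YES YES YES ... if it halts."""
    for i in count(1):
        yield "YES" if halts_within(machine, i) else "NO"

def halts_quickly():             # a machine that halts after a few steps
    for _ in range(3):
        yield

def loops_forever():             # a machine that never halts
    while True:
        yield

print(list(islice(limit_learner(halts_quickly), 6)))
# ['NO', 'NO', 'NO', 'YES', 'YES', 'YES']
print(list(islice(limit_learner(loops_forever), 6)))
# ['NO', 'NO', 'NO', 'NO', 'NO', 'NO']
```

Note that no single output ever certifies non-halting; correctness holds only in the limit, which is exactly why this is identification rather than decision.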
Exact learning seems too stringent
• Why should we have to speak perfect French forever?
• Can't we say "MacDonald's" once in a while?
• Or what about this: you say potato; I say pohtahto; you say potato; I say pohtahto; …

Summary of learnability given Gold
• With positive-only evidence, no interesting families of languages are learnable
• Even if given (sentence, meaning) pairs
• Even with a stochastic grammar (mommy is talking via some distribution µ)
• BUT if the learner knew what the distribution was, they could learn in this case; however, this is almost like knowing the language anyway

If a parent were to provide true negative evidence of the type specified by Gold, interactions would look like the Osbournes:
Child: me want more.
Father: ungrammatical.
Child: want