MIT 6 863J - The Red Pill or the Blue Pill

6.863J Natural Language Processing
6.863J/9.611J SP04, Lecture 6: The Red Pill or the Blue Pill, Episode 1: part-of-speech tagging
Instructor: Robert C. Berwick ([email protected])

The Menu Bar
• Administrivia:
  • Schedule alert: Lab 1b due today
  • Lab 2a released today; Lab 2b this Weds
• Agenda, Red vs. Blue:
  • N-grams as models of language
  • Part-of-speech 'tagging' via statistical models
  • Ch. 6 & 8 in Jurafsky

The Great Divide in NLP: the red pill or the blue pill?
• "Knowledge Engineering" approach: rules built by hand with knowledge of language; "text understanding"
• "Trainable Statistical" approach: rules inferred from lots of data ("corpora"); "information retrieval"

Two ways
• Probabilistic model: constrain morpheme sequences using the probability of one character appearing before/after another, e.g. prob(-ing | stop) vs. prob(-ly | stop)
• Generative model: concatenate morphemes, then fix up the joints
  • stop + -ing = stopping, fly + -s = flies
  • Use a cascade of transducers to handle all the fixups

The big picture II
• In general, two approaches to NLP:
• Knowledge Engineering approach
  • Grammars constructed by hand
  • Domain patterns discovered by a human expert via introspection and inspection of a 'corpus'
  • Laborious tuning
• Automatically Trainable Systems
  • Use statistical methods when possible
  • Learn rules from annotated (or otherwise processed) corpora

Preview of tagging
• What is tagging?
• Input: word sequence: "Police police police"
• Output: classification (binning) of the words: Noun Verb Noun, or [Help!]

Preview of tagging & pills: red-pill and blue-pill methods
• Method 1: statistical (n-gram)
• Method 2: more symbolic (but still includes some probabilistic training + fixup): 'example-based' learning

What is part-of-speech tagging, and why?
• Input: the lead paint is unsafe
• Output: the/Det lead/N paint/N is/V unsafe/Adj
• Or: BOS the lyric beauties of Schubert 's Trout Quintet : its elemental rhythms and infectious melodies : make it a source of pure pleasure for almost all music listeners ./

Tagging for this…
The/DT lyric/JJ beauties/NNS of/IN Schubert/NNP 's/POS Trout/NNP Quintet/NNP --/: its/PRP$ elemental/JJ rhythms/NNS and/CC infectious/JJ melodies/NNS --/: make/VBP it/PRP a/DT source/NN of/IN pure/JJ pleasure/NN for/IN almost/RB all/DT music/NN listeners/NNS ./.

Tagging words
• Well defined
• Easy, but not too easy (not AI-complete)
• Data available for machine-learning methods
• Evaluation methods straightforward

Why should we care?
• The first statistical NLP task
• Been done to death by different methods
• Easy to evaluate (how many tags are correct?)
• Canonical finite-state task
• Can be done well with methods that look only at local context
• Though we should "really" do it by parsing!

Why should we care? (continued)
• The "simplest" case of recovering an underlying form from a surface form via statistical means
• We are modeling p(word sequence, tag sequence)
• The tags are hidden, but we see the words
• Is tag sequence T likely given these words?

Tagging as n-grams
• Most likely word? Most likely tag t given a word w? That would be P(tag | word): not quite what we want
• Think of the task of predicting the next word
• Woody Allen: "I have a gub"
• But in general: predict the Nth tag from the preceding n-1 words (tags), aka an N-gram

Summary of n-grams
• n-grams define a probability model over sequences
• We have seen examples with sequences of words, but one can also look at characters
• n-grams deal with sparse data by using the Markov assumption

Markov models: the 'pure' statistical model…
• 0th-order Markov model: P(wi)
• 1st-order Markov model: P(wi | wi-1)
• 2nd-order Markov model: P(wi | wi-1, wi-2) …
• Where do these probability estimates come from? Counts:
  P(wi | wi-1) = count(wi-1 wi) / count(wi-1)
  (the so-called maximum-likelihood estimate, MLE)

N-grams
• But… how many distinct probabilities, i.e. parameter values, will be needed?
• Compare: the total number of word tokens in our training data
• vs. the total number of unique words; the number of word types is our vocabulary size

n-gram parameter sizes are large!
• Let V be the vocabulary and |V| its size, say 3,000 distinct types
• P(Wi = x): how many different values for Wi? |V| = 3 x 10^3
• P(Wi = x | Wj = y): number of distinct pairs = 3 x 10^3 x 3 x 10^3 = 9 x 10^6
• P(Wi = x | Wk = z, Wj = y): number of distinct triples = 27 x 10^9

Choosing n
Suppose we have a vocabulary of |V| = 20,000 words.

  n             number of bins
  2 (bigrams)   400,000,000
  3 (trigrams)  8,000,000,000,000
  4 (4-grams)   1.6 x 10^17

How far into the past should we go?
• "long distance ___": next word? "call"?
• p(wn | w…): consider the special case above
• The approximation says
  count(long distance call) / count(long distance) ≈ count(distance call) / count(distance)
• If the context is one word back, that's a bigram; an even better approximation looks two words back: "long distance ___"
• Not always right: "long distance runner" vs. "long distance call"
• The further back you go, the more the context can help: "collect long distance ___"

Parameter size vs. corpus size
• Corpus: "said the joker to the thief"; |V| = 5
• What's the maximum number of parameters? All pairs: |V|^2 = 25 bigrams
• What's observed? We see only 5 bigrams, one per adjacent token pair
• The corpus had better be large with respect to the number of parameters

Reliability vs. discrimination
• "large green ___": tree? mountain? frog? car?
• "swallowed the large green ___": pill? broccoli?

Reliability vs. discrimination
• Larger n: more information about the context of the specific instance (greater discrimination)
• Smaller n: more instances in the training data, better statistical estimates (more reliability)

Statistical estimators
• Example corpus: five Jane Austen novels
  • N = 617,091 word tokens
  • V = 14,585 unique words
• Task: predict the next word of the trigram "inferior to ___" from test data, Persuasion: "[In person, she was] inferior to both [sisters.]"

Shakespeare in love… the unkindest cut of all
• Shakespeare: 884,647 words, or tokens (Kučera, 1992)
• 29,066 types (including proper nouns)
• So the number of possible bigrams is 29,066^2 > 844 million, against a training set of only about 1 million words
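The "number of bins" column in the Choosing-n slide is just |V|^n. A quick arithmetic check of the slide's figures (nothing assumed beyond the slide's |V| = 20,000):

```python
# Number of distinct n-gram bins (parameters) for a vocabulary of size V is V**n,
# matching the slide's table for V = 20,000.
V = 20_000
for n in (2, 3, 4):
    print(f"{n}-grams: {V ** n:,} bins")
# 2-grams: 400,000,000; 3-grams: 8,000,000,000,000; 4-grams: 1.6 x 10^17
```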
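The MLE counts from the Markov-model slide can be computed directly on the deck's own toy corpus, "said the joker to the thief". A minimal sketch (the function name `bigram_mle` is ours, not from the lecture):

```python
# Maximum-likelihood bigram estimates: P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1}).
from collections import Counter

def bigram_mle(tokens):
    """Map each observed (prev, word) pair to its MLE conditional probability."""
    contexts = Counter(tokens[:-1])            # every token that precedes something
    bigrams = Counter(zip(tokens, tokens[1:])) # adjacent token pairs
    return {(prev, w): c / contexts[prev] for (prev, w), c in bigrams.items()}

corpus = "said the joker to the thief".split()
probs = bigram_mle(corpus)
# "the" occurs twice as a context, once before "joker" and once before "thief":
print(probs[("the", "joker")])   # 0.5
print(probs[("the", "thief")])   # 0.5
# Only 5 distinct bigrams are observed, out of |V|**2 = 25 possible ones.
print(len(probs))                # 5
```

This makes the parameter-size-vs.-corpus-size point concrete: 20 of the 25 bigram parameters get probability zero from this corpus, which is exactly the sparse-data problem the Markov assumption (and, later, smoothing) is meant to tame.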
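The deck frames tagging as modeling p(word sequence, tag sequence) with the tags hidden. A minimal sketch of Viterbi decoding under a bigram hidden Markov model, applied to the "Police police police" example; every probability below is invented for illustration (the lecture gives none), chosen only so that the decoder prefers a Noun-Verb-Noun reading:

```python
# Viterbi decoding for a bigram HMM: find argmax over tag sequences of
# p(tags, words) = start(t1) * e(w1|t1) * prod_i trans(t_{i-1}->t_i) * e(w_i|t_i).
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for `words` under the given HMM."""
    # Each entry: tag -> (log-probability of the best path ending in that tag, the path)
    best = {t: (math.log(start_p[t] * emit_p[t][words[0]]), [t]) for t in tags}
    for w in words[1:]:
        nxt = {}
        for t in tags:
            score, path = max(
                (best[p][0] + math.log(trans_p[p][t] * emit_p[t][w]), best[p][1] + [t])
                for p in tags)
            nxt[t] = (score, path)
        best = nxt
    return max(best.values())[1]

# Hypothetical parameters: verbs rarely follow verbs, and "police" can be either tag.
tags = ["Noun", "Verb"]
start = {"Noun": 0.8, "Verb": 0.2}
trans = {"Noun": {"Noun": 0.4, "Verb": 0.6},
         "Verb": {"Noun": 0.9, "Verb": 0.1}}
emit = {"Noun": {"police": 1.0}, "Verb": {"police": 1.0}}

print(viterbi(["police", "police", "police"], tags, start, trans, emit))
# ['Noun', 'Verb', 'Noun']
```

In a real tagger the start, transition, and emission tables would themselves be MLE counts from a tagged corpus such as the Penn Treebank, exactly as in the bigram estimates above; the decoder is unchanged.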

