MIT 6.863J - The Red Pill or the Blue Pill

6.863J Natural Language Processing
Lecture 7: The Red Pill or the Blue Pill, Episode 2: Part-of-Speech Tagging
Instructor: Robert C.
6.863J/9.611J SP04 Lecture 7

The Menu Bar
• Administrivia:
  • Schedule alert: Lab 1b due today
  • Lab 2b released this Wednesday (later today)
• Agenda: Red vs. Blue:
  • Part-of-speech "tagging" via statistical models
  • Part-of-speech tagging via rules
  • Ch. 6 & 8 in Jurafsky

The Great Divide in NLP: the red pill or the blue pill?
• "Knowledge Engineering" approach: rules built by hand with knowledge of language ("text understanding")
• "Trainable Statistical" approach: rules inferred from lots of data, i.e. corpora ("information retrieval")

Two approaches
1. Statistical model
2. Deterministic baseline tagger composed with a cascade of fix-up transducers
These two approaches are the guts of Lab 2 (there are lots of other methods: decision trees, …).

The problem
• In unseen data, we wish to find the part-of-speech tags
• The set of part-of-speech tags is decided by experts

Noishy Chunnel Muddle (statistical)
• Noisy channel: X → Y
• Real language X, "yucky" language Y: we want to recover X from Y
• For tagging: part-of-speech tags (X) → insert words → text (Y)

A picture: the statistical, noisy channel view
• Speech: a language model P(y) over text y and an acoustic model P(x | y) for speech x. Candidate texts for one utterance: "Wreck a nice beach?" "Reckon eyes peach?" "Recognize speech?"
• Tagging: a bigram tag model P(T) over tags y and a word model P(w | T) for words x

Formulation, in general
$$\widehat{\text{Label}} = \arg\max_{\text{Label}} \Pr(\text{Label} \mid \text{Data})$$

General probabilistic decision problem
• E.g. 1: data = bunch of text
  • label = language
  • label = topic
  • label = author
• E.g. 2 (sequential prediction):
  • label = translation or summary of entire text
  • label = part of speech of current word
  • label = identity of current word (ASR) or character (OCR)

Language models: statistical view
• Application to speech recognition (and parsing, generally)
• x = input (speech/words)
• y = output (text/tags)
• We want to find max P(y | x). Problem: we don't know the tags; that is what we want to find!
• Solution: we have an estimate of P(y) [the language model] and of P(x | y) [the probability of some sounds/words given text/tags: an acoustic model or tag model]

Finding inner form given outside
From Bayes' law:
$$\arg\max_y P(y \mid x) = \arg\max_y \frac{P(x \mid y)\,P(y)}{P(x)} = \arg\max_y P(x \mid y)\,P(y)$$
that is, acoustic model × language model. P(x) is fixed for a given input, so dividing by it does not change which y achieves the maximum.
In the tagging case we have a word model instead, so we find the most probable tags given the words from:
$$\arg\max_T P(T \mid w) = \arg\max_T P(w \mid T)\,P(T)$$

HMM for POS tagging
• In a Hidden Markov Model, we hypothesize an underlying finite-state machine (not directly observable, hence "hidden") that changes state with each input element
• For us, the hidden states are the tags, and the input elements are the words

Hidden Markov Model tagging for POS
• Prob(tag sequence): based on n-grams; train on marked-up, tagged text
• Prob(W | T): unigram probability, based on tagged text
• Prob(T, w): computed with the Viterbi trellis, a max over all possible tag-sequence paths, scoring each path with its transition probabilities and the "emission" probability of each word/tag combination
• The tag sequence itself is unseen

Cartoon form review
[Figure: tag-sequence bigrams P(T) composed (*) with the unigram word model p(W | T) gives p(T, w), which scores candidate tag sequences on their joint probability with the observed words "the cool directed autos"; we should pick the best path.]
• Tag bigram FSA (states Start, Det, Adj, Noun, Verb, Prep, Stop): Start → Det 0.8; Det → Noun 0.7, Det → Adj 0.3; Adj → Adj 0.4, Adj → Noun 0.5, Adj → Stop (ε) 0.1; Noun → Stop (ε) 0.2
• Word model p(W | T): Det:the 0.4, Det:a 0.6; Adj:cool 0.003, Adj:directed 0.0005, Adj:cortege 0.000001, …; Noun:Bill 0.002, Noun:autos 0.001, …, Noun:cortege 0.000001

HMM construction
• A hidden state-transition model governs the observed word sequences
• Transitions are probabilistic
• Estimate transition probabilities from an annotated corpus; state s = tag state
• Each tag depends only on the previous tag, P(s_j | s_{j-1}), and each word only on its own tag, P(w_j | s_j) (hence the Markovian assumption)
• At runtime, find the maximum-likelihood path through the network using dynamic programming (the Viterbi algorithm)

Cartoon form review (with a word transducer)
• P(T) * p(W | T) * p(w | W) transducer = p(T, w): the transducer scores candidate tag sequences on their joint probability with the observed words; we should pick the best path
• Same example sentence, tag FSA, and word probabilities as above

P(T): tag bigram picture
p(tag seq) for BOS Det Adj Adj Noun EOS = 0.8 × 0.3 × 0.4 × 0.5 × 0.2 = 0.0096

Unigram replacement model P(word | tag)
• Noun:Bill 0.002, Noun:autos 0.001, …, Noun:cortege 0.000001
• Adj:cool 0.003, Adj:directed 0.0005, Adj:cortege 0.000001, …
• Det:the 0.4, Det:a 0.6
• The probabilities for each tag sum to 1

Compose with the actual word sequence
p(word seq, tag seq) = p(tag seq) × p(word seq | tag seq)
Composition puts a combined transition and word probability on each arc (e.g., Det:the 0.32 = 0.8 × 0.4):
• BOS → Det: Det:a 0.48, Det:the 0.32
• Det → Adj: Adj:cool 0.0009, Adj:directed 0.00015, Adj:cortege 0.000003
• Adj → Adj: Adj:cool 0.0012, Adj:directed 0.00020, Adj:cortege 0.000004
• Adj → Noun: N:cortege, N:autos 0.00002
• Noun → EOS: # 0.2
Path for "the cool directed autos": 0.32 (Det:the) × 0.0009 (Adj:cool) × 0.0002 (Adj:directed) × 0.00002 (N:autos) × 0.2 (#) ≈ 2.3 × 10^-13 total path probability, done!

Unroll the FSA: all paths together form a "trellis"
[Figure: trellis from BOS to Stop with one column of Det/Adj/Noun states per word of "the cool directed autos"; arcs include Det:the 0.32, Adj:cool 0.0009, Noun:cool 0.007, Adj:directed …, Noun:autos …, and ε 0.2 into Stop.]
• The best path: BOS Det Adj Adj Noun EOS = 0.32 × 0.0009 × …
• Adj:cool 0.0009 vs. Noun:cool 0.007: WHY?

Cross-product construction forms the trellis
• All paths here are 5 words, so all paths in the trellis must have 5 words on the output side
[Figure: the tag FSA crossed with the word-sequence FSA (states 0–4, with ε arcs); trellis states are pairs such as (0,0), (1,1), (2,1), (3,1), …, (4,4).]

Finding the best path from start to stop
• Use dynamic programming: what is the best path from Start to each node?
• Work from left to right
• Each node stores its best path from Start (as a probability plus one backpointer)
• A special acyclic case of Dijkstra's shortest-path algorithm
• Faster if some arcs/states are absent

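The composition formula p(word seq, tag seq) = p(tag seq) × p(word seq | tag seq) can be checked for one candidate path at a time. A minimal sketch, assuming toy tables read off the lecture's tag FSA and unigram slide; the names `TRANS`, `EMIT`, `tag_seq_prob` and `joint_prob` are hypothetical. `tag_seq_prob` reproduces the P(T) slide's product, and `joint_prob` multiplies in one word probability per tag.

```python
import math
from typing import Dict, List, Tuple

# Toy tag-bigram arcs read off the slides (BOS/EOS = start/stop); unlisted arcs = 0.
TRANS: Dict[Tuple[str, str], float] = {
    ("BOS", "Det"): 0.8,
    ("Det", "Adj"): 0.3,
    ("Det", "Noun"): 0.7,
    ("Adj", "Adj"): 0.4,
    ("Adj", "Noun"): 0.5,
    ("Adj", "EOS"): 0.1,
    ("Noun", "EOS"): 0.2,
}

# Unigram replacement model P(word | tag) from the slides.
EMIT: Dict[str, Dict[str, float]] = {
    "Det": {"the": 0.4, "a": 0.6},
    "Adj": {"cool": 0.003, "directed": 0.0005, "cortege": 0.000001},
    "Noun": {"Bill": 0.002, "autos": 0.001, "cortege": 0.000001},
}


def tag_seq_prob(tags: List[str]) -> float:
    """p(tag seq): product of tag-bigram arc probabilities from BOS through EOS."""
    path = ["BOS"] + tags + ["EOS"]
    return math.prod(TRANS.get((a, b), 0.0) for a, b in zip(path, path[1:]))


def joint_prob(words: List[str], tags: List[str]) -> float:
    """p(word seq, tag seq) = p(tag seq) * product over positions of P(word | tag)."""
    if len(words) != len(tags):
        raise ValueError("need exactly one tag per word")
    emissions = math.prod(EMIT.get(t, {}).get(w, 0.0) for w, t in zip(words, tags))
    return tag_seq_prob(tags) * emissions


if __name__ == "__main__":
    tags = ["Det", "Adj", "Adj", "Noun"]
    print(tag_seq_prob(tags))  # 0.8 * 0.3 * 0.4 * 0.5 * 0.2
    print(joint_prob("the cool directed autos".split(), tags))
```

For BOS Det Adj Adj Noun EOS, `tag_seq_prob` gives 0.0096, and the joint probability with "the cool directed autos" is 0.0096 × (0.4 × 0.003 × 0.0005 × 0.001) = 5.76 × 10^-12. Scoring one path at a time like this is exactly what the trellis avoids doing for every path separately.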

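The HMM construction step, estimating transition and word probabilities from an annotated corpus, amounts to counting and normalizing (relative-frequency, i.e. maximum-likelihood, estimates). The sketch below assumes a tiny hypothetical tagged corpus and hypothetical names (`CORPUS`, `estimate`). It applies no smoothing, so any word or tag bigram not seen in training gets probability 0, which a real tagger would have to handle.

```python
from collections import Counter
from typing import Dict, List, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, tag) pairs

# Hypothetical toy training corpus, for illustration only.
CORPUS: List[TaggedSentence] = [
    [("the", "Det"), ("cool", "Adj"), ("autos", "Noun")],
    [("a", "Det"), ("cortege", "Noun")],
    [("the", "Det"), ("directed", "Adj"), ("cool", "Adj"), ("autos", "Noun")],
]


def estimate(
    corpus: List[TaggedSentence],
) -> Tuple[Dict[Tuple[str, str], float], Dict[Tuple[str, str], float]]:
    """Relative-frequency estimates of P(tag_j | tag_{j-1}) and P(word | tag)."""
    bigrams: Counter = Counter()     # counts of (previous tag, tag), incl. BOS/EOS
    history: Counter = Counter()     # how often each tag (or BOS) is followed by anything
    pairs: Counter = Counter()       # counts of (tag, word)
    tag_totals: Counter = Counter()  # how often each tag occurs
    for sentence in corpus:
        tags = ["BOS"] + [tag for _, tag in sentence] + ["EOS"]
        for prev, cur in zip(tags, tags[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1
        for word, tag in sentence:
            pairs[(tag, word)] += 1
            tag_totals[tag] += 1
    trans = {(p, c): n / history[p] for (p, c), n in bigrams.items()}
    emit = {(t, w): n / tag_totals[t] for (t, w), n in pairs.items()}
    return trans, emit


if __name__ == "__main__":
    trans, emit = estimate(CORPUS)
    print(trans[("Det", "Adj")], emit[("Det", "the")])
```

With this corpus, Det → Adj comes out as 2/3 (two of the three Det tokens are followed by an Adj) and P(the | Det) as 2/3. For each preceding tag the transition estimates sum to 1, and for each tag the word estimates sum to 1, matching the "sums to 1" property of the unigram replacement model.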