CSE 842: Natural Language Processing
Lecture 4: Part-of-Speech Tagging and HMM
(CSE 842, Spring 2011, MSU; 1/24/2011)

Slide 2: Reminder
Homework 1 will be posted under ~cse842/Homework/hw1 later today.
– Due: Feb. 7
– Written part due at the beginning of class
– Programming part due at 11:59pm on the due date via "handin"

Slide 3: Part-of-Speech Tagging
• What is POS tagging?
• Associate with each word a lexical tag
– 45 classes from the Penn Treebank
– 87 classes from the Brown Corpus
– 146 classes from the C7 tagset (CLAWS system)

Slide 4: Why is POS Tagging Useful?
• It is a first step of many practical tasks.
• Speech synthesis
– How to pronounce "lead"?
– INsult vs. inSULT
– OBject vs. obJECT
– OVERflow vs. overFLOW
– DIScount vs. disCOUNT
– CONtent vs. conTENT
• Parsing
– Need to know whether a word is an N or a V before you can parse
• Information extraction
– Finding names, relations, etc.

Slide 5: POS Tagging
We have:
• A set of tags (a tagset)
• A dictionary with all possible tags for each word
• Input text
POS tagging automatically assigns a sequence of tags to the input text, e.g.:
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

Slide 6: Tagging Methods
• Rule-based tagging
• Transformation-based tagging
• Statistical tagging based on Hidden Markov Models (HMMs)

Slide 7: Statistical Tagging
• Using an HMM to do POS tagging is a special case of Bayesian inference.
• It is also related to the "noisy channel" model that is the basis for automatic speech recognition (ASR), optical character recognition (OCR), and machine translation (MT).

Slide 8: HMM Tagger
• HMM tagger: choose the most probable sequence of tags for each sentence.
– Given the sentence W = w1, w2, ..., wn
– Compute the most probable sequence of tags T = t1, t2, ..., tn, i.e. the sequence that maximizes
  argmax P(t1, t2, ..., tn | w1, w2, ..., wn)
• Apply Bayes' rule to rewrite it as:
  P(t1, ..., tn | w1, ..., wn) = P(w1, ..., wn | t1, ..., tn) * P(t1, ..., tn) / P(w1, ..., wn)

Slide 9: HMM Tagger (cont.)
• Since the denominator is the same for every tag sequence, it becomes finding t1, ..., tn that maximizes:
  P(w1, ..., wn | t1, ..., tn) * P(t1, ..., tn)
• Approximation 1 (conditional independence assumption):
  P(w1, ..., wn | t1, ..., tn) ≈ ∏_{i=1..n} P(wi | ti)
• Approximation 2 (bigram):
  P(t1, ..., tn) ≈ ∏_{i=1..n} P(ti | ti-1)
• Now it becomes finding t1, ..., tn that maximizes:
  ∏_{i=1..n} P(ti | ti-1) * P(wi | ti)

Slide 10: An Example
Suppose wi = "race": is it a verb (VB) or a noun (NN)? Assume that some other mechanism has already produced the best tagging for the preceding words.
1) Secretariat/NNP is/VBZ expected/VBN to/TO race/?? tomorrow
2) People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/?? for outer space
With the bigram model, the problem simplifies to tagging to/TO race/??? and the/DT race/???:
  ti = argmax_j P(tj | ti-1) * P(wi | tj)
so we compare
  P(VB | TO) * P(race | VB)
  P(NN | TO) * P(race | NN)
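A minimal sketch (not from the lecture) of the scoring rule on slides 9-10: score a candidate tag sequence as the product of bigram transition and emission probabilities. The trans and emit dictionaries are hypothetical placeholders; in a real tagger they would be estimated from a tagged corpus, as the following slides show.

# Minimal sketch, not from the lecture: score a candidate tag sequence
# under the bigram HMM approximation of slide 9,
#   prod_i P(t_i | t_{i-1}) * P(w_i | t_i).
# trans and emit are hypothetical lookup tables keyed by
# (previous_tag, tag) and (tag, word) respectively.

def score(words, tags, trans, emit, start="<s>"):
    p = 1.0
    prev = start
    for w, t in zip(words, tags):
        p *= trans.get((prev, t), 0.0) * emit.get((t, w), 0.0)
        prev = t
    return p

# The "race" comparison on slide 10 is this rule applied to one word:
# with previous tag TO, compare
#   trans[("TO", "VB")] * emit[("VB", "race")]   # P(VB|TO) * P(race|VB)
#   trans[("TO", "NN")] * emit[("NN", "race")]   # P(NN|TO) * P(race|NN)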
Slide 11: Where is the Data?
Look at the Brown and Switchboard corpora:
  P(NN | TO) = 0.021
  P(VB | TO) = 0.34
If we are expecting a verb, how likely would it be "race"?
  P(race | NN) = 0.00041
  P(race | VB) = 0.00003
Finally:
  P(NN | TO) * P(race | NN) = 0.000007
  P(VB | TO) * P(race | VB) = 0.00001
The verb reading wins, so "race" after "to" is tagged VB.

Slide 12: POS Tagging as Sequence Classification
• We are given a sentence (an "observation" or "sequence of observations"):
– Secretariat is expected to race tomorrow
• What is the best sequence of tags that corresponds to this sequence of observations?
• Probabilistic view:
– Consider all possible sequences of tags.
– Out of this universe of sequences, choose the tag sequence that is most probable given the observation sequence of n words w1 ... wn.

Slide 13: An Example
Given:
• Word sequence w1, ..., wN
• POS tags t1, ..., tM, with ti ∈ {V, N, P, ART}
Find: the most likely sequence of POS tags T1, ..., TN for the word sequence, i.e. the one that maximizes P(w1, ..., wN | t1, ..., tM).
Example: "flies like a flower"

Slide 14: Example: Bigrams of Tags from a Corpus

Cat   # at i   Pair      # at i,i+1   Bigram          Estimate
∅     300      ∅, ART    213          Prob(ART | ∅)   0.71
∅     300      ∅, N       87          Prob(N | ∅)     0.29
ART   558      ART, N    558          Prob(N | ART)   1
N     833      N, V      358          Prob(V | N)     0.43
N     833      N, N      108          Prob(N | N)     0.13
N     833      N, P      366          Prob(P | N)     0.44
V     300      V, N       75          Prob(N | V)     0.35
V     300      V, ART    194          Prob(ART | V)   0.65
P     307      P, ART    226          Prob(ART | P)   0.74
P     307      P, N       81          Prob(N | P)     0.26

Slide 15: A Markov Chain
[Figure: a Markov chain over the tag states ∅, ART, N, V, P, with arcs labeled by the transition probabilities above: ∅→ART 0.71, ∅→N 0.29, ART→N 1, N→V 0.43, N→N 0.13, N→P 0.44, V→N 0.35, V→ART 0.65, P→ART 0.74, P→N 0.26.]

Slide 16: Sparse Data
What about bigrams that are not seen in the data? In practice we should use smoothing (to be introduced next week). In this example, for illustration purposes, we use a simplification: any bigram not listed in the table is assumed to have a transition probability of 0.0001.

Slide 17: Word Counts

Word      N     V     ART   P     Total
flies     21    23    0     0     44
fruit     49    5     1     0     55
like      10    30    0     21    61
a         1     0     201   0     202
the       1     0     300   2     303
flower    53    15    0     0     68
flowers   42    16    0     0     58
bird      64    1     0     0     65
others    592   210   56    284   1142
Total     833   300   558   307   1998

P(the | ART) = ?

Slide 18: The Emission Probability
Some examples:
  P(the | ART) = 300/558 = 0.54
  P(flies | N) = 0.025
  P(flies | V) = 0.076
  P(like | V) = 0.1
  P(like | P) = 0.068
  P(like | N) = 0.012
  P(a | ART) = 0.360
  P(a | N) = 0.001
  P(flower | N) = 0.063
  P(flower | V) = 0.05
  P(bird | N) = 0.076

Slide 19: Viterbi Algorithm
Given:
• Word sequence w1, ..., wN
• Set of POS tags t1, ..., tM
• Emission probabilities P(wi | tj)
• Transition probabilities P(ti | tj)
Find: the most likely sequence of POS tags T1, ..., TN for the word sequence.

Slide 20: Viterbi Algorithm
Initialization step:
  For i = 1 to M do
    VSCORE(i, 1) = P(w1 | ti) * P(ti | ∅)
    BACKPTR(i, 1) = 0
Iteration step:
  For j = 2 to N
    For i = 1 to M
      VSCORE(i, j) = Max_{k=1..M} [ VSCORE(k, j-1) * P(ti | tk) ] * P(wj | ti)
      BACKPTR(i, j) = the index k that gave the max above
Sequence identification step:
  C(N) = the i that maximizes VSCORE(i, N)
  For j = N-1 down to 1 do
    C(j) = BACKPTR(C(j+1), j+1)

Slides 21-23: Viterbi Algorithm Example
For "flies like a flower", starting from the NULL (∅) state:
Iteration 1 (word "flies"):
  VSCORE(flies/N) = 0.00725
  VSCORE(flies/V) = 7.6 × 10^-6
  VSCORE(flies/ART) = 0
  VSCORE(flies/P) = 0
Iteration 2 (word "like"):
  VSCORE(like/V) = 0.00031
  VSCORE(like/P) = 0.00022
  VSCORE(like/N) = 1.13 × 10^-5
  VSCORE(like/ART) = 0
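To tie the procedure on slide 20 to the worked example above, here is a minimal runnable sketch (not from the lecture, which only gives pseudocode; Python is an assumption). It uses the transition table from slide 14, the 0.0001 default from slide 16 for unseen transitions, and the emission probabilities from slide 18; vscore and backptr correspond to VSCORE and BACKPTR on slide 20.

# Minimal sketch, not from the lecture: the Viterbi procedure of slide 20
# applied to "flies like a flower", with the transition probabilities of
# slide 14, the 0.0001 default of slide 16 for unseen transitions, and the
# emission probabilities of slide 18.

TRANS = {  # P(tag | previous tag); "0" is the empty/start state
    ("0", "ART"): 0.71, ("0", "N"): 0.29,
    ("ART", "N"): 1.0,
    ("N", "V"): 0.43, ("N", "N"): 0.13, ("N", "P"): 0.44,
    ("V", "N"): 0.35, ("V", "ART"): 0.65,
    ("P", "ART"): 0.74, ("P", "N"): 0.26,
}

EMIT = {  # P(word | tag), from slides 17-18
    ("N", "flies"): 0.025, ("V", "flies"): 0.076,
    ("V", "like"): 0.1, ("P", "like"): 0.068, ("N", "like"): 0.012,
    ("ART", "a"): 0.36, ("N", "a"): 0.001,
    ("N", "flower"): 0.063, ("V", "flower"): 0.05,
    ("ART", "the"): 0.54, ("N", "bird"): 0.076,
}

TAGS = ["V", "N", "P", "ART"]

def viterbi(words, tags=TAGS, trans=TRANS, emit=EMIT, unseen=0.0001):
    """Return the most likely tag sequence and the VSCORE columns."""
    def tr(prev, tag):  # transition probability with the slide-16 default
        return trans.get((prev, tag), unseen)

    # Initialization step: VSCORE(i, 1) = P(w1 | ti) * P(ti | 0)
    vscore = [{t: emit.get((t, words[0]), 0.0) * tr("0", t) for t in tags}]
    backptr = [{t: None for t in tags}]

    # Iteration step
    for j in range(1, len(words)):
        col, back = {}, {}
        for t in tags:
            prev_best = max(tags, key=lambda k: vscore[j - 1][k] * tr(k, t))
            col[t] = (vscore[j - 1][prev_best] * tr(prev_best, t)
                      * emit.get((t, words[j]), 0.0))
            back[t] = prev_best
        vscore.append(col)
        backptr.append(back)

    # Sequence identification step: follow back pointers from the best end state
    best = max(tags, key=lambda t: vscore[-1][t])
    path = [best]
    for j in range(len(words) - 1, 0, -1):
        path.append(backptr[j][path[-1]])
    return list(reversed(path)), vscore

path, vscore = viterbi("flies like a flower".split())
print(path)        # expected: ['N', 'V', 'ART', 'N']
print(vscore[0])   # flies: N = 0.00725, V = 7.6e-6, ART = P = 0
print(vscore[1])   # like:  V ~ 0.00031, P ~ 0.00022, N ~ 1.13e-5, ART = 0

Under these assumptions the script should reproduce the lattice values on slides 21-23 and tag the sentence as flies/N like/V a/ART flower/N.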

