Natural Language Processing
Lecture 8, 9/24/2013
Jim Martin
Speech and Language Processing, Jurafsky and Martin, 9/25/13

Today
Review; word classes; part-of-speech tagging; HMMs: the basic HMM model and decoding with Viterbi.

Word Classes: Parts of Speech
There are 8 (ish) traditional parts of speech: noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc. They are also called parts of speech, lexical categories, word classes, morphological classes, or lexical tags. There is a lot of debate within linguistics about the number, nature, and universality of these; we'll completely ignore this debate.

POS Tagging
The process of assigning a part of speech or other lexical class marker to each word in a collection:
WORD: the koala put the keys on the table
TAG:  DET N V DET N P DET N

Penn TreeBank POS Tagset

POS Tagging
Words often have more than one part of speech. Take "back":
The back door: JJ
On my back: NN
Win the voters back: RB
Promised to back the bill: VB
The POS tagging problem is to determine the POS tag for a particular instance of a word in context, usually a sentence.

POS Tagging
Note that this is distinct from the task of identifying which sense of a word is being used given a particular part of speech. That's called word sense disambiguation; we'll get to that later. Compare "backed the car into a pole" with "backed the wrong candidate."

How Hard is POS Tagging? Measuring Ambiguity

Two Methods for POS Tagging
1. Rule-based tagging (ENGTWOL; Section 5.4)
2. Stochastic
   1. Probabilistic sequence models: HMM (Hidden Markov Model) tagging, MEMMs (Maximum Entropy Markov Models)

POS Tagging as Sequence Classification
We are given a
sentence, that is, an observation or sequence of observations:
Secretariat is expected to race tomorrow
What is the best sequence of tags that corresponds to this sequence of observations?
Probabilistic view: consider all possible sequences of tags. Out of this universe of sequences, choose the tag sequence that is most probable given the observation sequence of n words w1…wn.

Getting to HMMs
We want, out of all sequences of n tags t1…tn, the single tag sequence such that P(t1…tn | w1…wn) is highest:
$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)$$
The hat means "our estimate of the best one." $\operatorname*{argmax}_x f(x)$ means "the x such that f(x) is maximized."

Getting to HMMs
This equation should give us the best tag sequence. But how do we make it operational? How do we compute this value? The intuition of Bayesian inference is to use Bayes' rule to transform this equation into a set of probabilities that are easier to compute and still give the right answer.

Using Bayes Rule
Know this.

Likelihood and Prior

Two Kinds of Probabilities
Tag transition probabilities p(ti | ti-1). Determiners are likely to precede adjectives and nouns:
That/DT flight/NN
The/DT yellow/JJ hat/NN
So we expect P(NN | DT) and P(JJ | DT) to be high. Compute P(NN | DT) by counting in a labeled corpus.

Two Kinds of Probabilities
Word likelihood probabilities p(wi | ti). VBZ (3sg present verb) is likely to be "is". Compute P(is | VBZ) by counting in a labeled corpus.

Example: The Verb "race"
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR
People/NNS continue/VB to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
How do we pick the right tag?
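Written out, the Bayes-rule step turns the argmax above into a product of the two kinds of probabilities. This is a sketch of the standard bigram-HMM derivation: the denominator $P(w_1^n)$ is dropped because it is the same for every candidate tag sequence, and the last two lines rest on two simplifying assumptions (each word depends only on its own tag; each tag depends only on the previous tag), with $t_0$ taken here as a sentence-start tag.

```latex
\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)
            = \operatorname*{argmax}_{t_1^n} \frac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)}
            = \operatorname*{argmax}_{t_1^n} \underbrace{P(w_1^n \mid t_1^n)}_{\text{likelihood}}\; \underbrace{P(t_1^n)}_{\text{prior}}

P(w_1^n \mid t_1^n) \approx \prod_{i=1}^{n} P(w_i \mid t_i),
\qquad
P(t_1^n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})

\hat{t}_1^n \approx \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```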
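"Counting in a labeled corpus" can be sketched as maximum-likelihood estimation: P(ti | ti-1) = C(ti-1, ti) / C(ti-1) and P(wi | ti) = C(ti, wi) / C(ti). This minimal sketch trains on only the two tagged sentences from the "race" slide. The function name `train_bigram_tagger_counts` and the `<s>` start pseudo-tag are hypothetical choices for illustration, and counts from a two-sentence corpus are nothing like the corpus-derived values used on the slides.

```python
from collections import Counter

# The two tagged sentences from the "race" slide, as (word, tag) pairs.
CORPUS = [
    [("Secretariat", "NNP"), ("is", "VBZ"), ("expected", "VBN"), ("to", "TO"),
     ("race", "VB"), ("tomorrow", "NR")],
    [("People", "NNS"), ("continue", "VB"), ("to", "TO"), ("inquire", "VB"),
     ("the", "DT"), ("reason", "NN"), ("for", "IN"), ("the", "DT"),
     ("race", "NN"), ("for", "IN"), ("outer", "JJ"), ("space", "NN")],
]


def train_bigram_tagger_counts(corpus, start="<s>"):
    """Return MLE estimators for P(t_i | t_{i-1}) and P(w_i | t_i).

    corpus: list of sentences, each a list of (word, tag) pairs.
    start:  hypothetical pseudo-tag placed before every sentence.
    """
    tag_counts = Counter()       # C(t), including the start pseudo-tag
    bigram_counts = Counter()    # C(t_{i-1}, t_i)
    emission_counts = Counter()  # C(t_i, w_i)
    for sentence in corpus:
        tag_counts[start] += 1
        prev = start
        for word, tag in sentence:
            bigram_counts[(prev, tag)] += 1
            emission_counts[(tag, word)] += 1
            tag_counts[tag] += 1
            prev = tag

    def transition(prev, tag):
        # P(tag | prev) = C(prev, tag) / C(prev); unseen tags get 0.0
        return bigram_counts[(prev, tag)] / tag_counts[prev] if tag_counts[prev] else 0.0

    def emission(tag, word):
        # P(word | tag) = C(tag, word) / C(tag); unseen tags get 0.0
        return emission_counts[(tag, word)] / tag_counts[tag] if tag_counts[tag] else 0.0

    return transition, emission


transition, emission = train_bigram_tagger_counts(CORPUS)
print(transition("TO", "VB"))  # 1.0: both TO tokens are followed by VB
print(emission("VB", "race"))  # 1/3: race, continue, inquire are the three VB tokens
```

Unseen events get probability zero here; a real tagger would smooth these estimates.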
Disambiguating "race"

Example
P(NN | TO) = .00047
P(VB | TO) = .83
P(race | NN) = .00057
P(race | VB) = .00012
P(NR | VB) = .0027
P(NR | NN) = .0012
P(VB | TO) P(NR | VB) P(race | VB) = .00000027
P(NN | TO) P(NR | NN) P(race | NN) = .00000000032
So we correctly choose the verb tag for "race".

Break
Quiz readings: Chapters 1 to 6
Chapter 2
Chapter 3: skip 3.4.1, 3.10, 3.12
Chapter 4: skip 4.7, 4.8-4.11
Chapter 5: skip 5.5.4, 5.6, 5.8-5.10
Chapter 6: skip 6.6-6.9

Hidden Markov Models
What we've just described is called a Hidden Markov Model (HMM). This is a kind of generative model: there is a hidden underlying generator of observable events. The hidden generator can be modeled as a network of states and transitions. We want to infer the underlying state sequence given the observed event sequence.

Hidden Markov Models
States: $Q = q_1, q_2, \ldots, q_N$
Observations: $O = o_1, o_2, \ldots, o_N$; each observation is a symbol from a vocabulary $V = \{v_1, v_2, \ldots, v_V\}$
Transition probabilities: transition probability matrix $A = \{a_{ij}\}$, where $a_{ij} = P(q_t = j \mid q_{t-1} = i)$, $1 \le i, j \le N$
Observation likelihoods: output probability matrix $B = \{b_i(k)\}$, where $b_i(k) = P(X_t = o_k \mid q_t = i)$
Special initial probability vector $\pi$: $\pi_i = P(q_1 = i)$, $1 \le i \le N$

HMMs for Ice Cream
You are a climatologist in the year 2799, studying global warming. You can't find any records of the weather in Baltimore for the summer of 2007, but you find Jason Eisner's diary, which lists how many ice creams Jason ate every day that summer. Your job: figure out how hot it was each day.

Eisner Task
Given: ice cream observation sequence 1, 2, 3, 2, 2, 2, 3
Produce: hidden weather sequence H, C, H, H, H, C, C
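The formal definition above maps directly onto a data structure. This sketch assumes the ice-cream setup (hidden states H and C, observations of 1 to 3 ice creams), stores A, B, and π as nested dictionaries, and computes the joint probability of an observation sequence with a hidden state sequence. The parameter values are hypothetical, chosen only so each row is a valid distribution; `check_hmm` and `joint_probability` are illustrative names.

```python
# Hypothetical ice-cream HMM parameters (illustrative only; not from the lecture figure).
STATES = ("H", "C")                 # hidden states: hot, cold
PI = {"H": 0.8, "C": 0.2}           # pi_i = P(q_1 = i)
A = {                               # a_ij = P(q_t = j | q_{t-1} = i)
    "H": {"H": 0.7, "C": 0.3},
    "C": {"H": 0.4, "C": 0.6},
}
B = {                               # b_i(k) = P(o_t = k | q_t = i), k = ice creams eaten
    "H": {1: 0.2, 2: 0.4, 3: 0.4},
    "C": {1: 0.5, 2: 0.4, 3: 0.1},
}


def check_hmm(pi, a, b, tol=1e-9):
    """Check that pi and every row of A and B sums to 1."""
    rows = [pi, *a.values(), *b.values()]
    return all(abs(sum(row.values()) - 1.0) < tol for row in rows)


def joint_probability(states, observations, pi=PI, a=A, b=B):
    """P(O, Q) = pi_{q1} b_{q1}(o1) * prod over t >= 2 of a_{q(t-1) q(t)} b_{q(t)}(o_t)."""
    if not states or len(states) != len(observations):
        raise ValueError("need equal-length, non-empty state and observation sequences")
    p = pi[states[0]] * b[states[0]][observations[0]]
    for t in range(1, len(states)):
        p *= a[states[t - 1]][states[t]] * b[states[t]][observations[t]]
    return p


# The Eisner task pairing from the slide: 1 2 3 2 2 2 3 with H C H H H C C.
print(check_hmm(PI, A, B))
print(joint_probability(list("HCHHHCC"), [1, 2, 3, 2, 2, 2, 3]))
```

The joint probability scores one candidate weather sequence; picking the best one means comparing these scores across candidates.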
HMM for Ice Cream

Ice Cream HMM
Let's just do 1 3 1 as the sequence. How many underlying state (hot/cold) sequences are there? There are 2^3 = 8:
HHH, HHC, HCH, HCC, CCC, CCH, CHC, CHH
How do you pick the right one?
Argmax P(sequence | 1 3 1) …
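One way to answer "how do you pick the right one?" is brute force: score all eight sequences and take the argmax. Because P(1 3 1) is the same for every candidate, maximizing P(sequence | 1 3 1) is equivalent to maximizing the joint P(sequence, 1 3 1). This sketch does that with `itertools.product`, then decodes again with the Viterbi algorithm from today's outline, which keeps only the best partial path into each state instead of enumerating all N^T sequences. The HMM parameters and the function names `brute_force_decode` and `viterbi` are hypothetical and illustrative, not the values from the lecture's ice-cream figure.

```python
from itertools import product

# Hypothetical ice-cream HMM parameters (illustrative only; not from the lecture figure).
STATES = ("H", "C")
PI = {"H": 0.8, "C": 0.2}
A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
B = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}


def joint(states, obs):
    """Joint probability P(states, obs) under the HMM above."""
    p = PI[states[0]] * B[states[0]][obs[0]]
    for prev, cur, o in zip(states, states[1:], obs[1:]):
        p *= A[prev][cur] * B[cur][o]
    return p


def brute_force_decode(obs):
    """Score all N**T state sequences and return the most probable one."""
    best = max(product(STATES, repeat=len(obs)), key=lambda seq: joint(seq, obs))
    return "".join(best), joint(best, obs)


def viterbi(obs):
    """Dynamic-programming decode in O(N^2 T) time instead of O(N^T)."""
    v = [{s: PI[s] * B[s][obs[0]] for s in STATES}]  # best path prob ending in s
    backptr = [{}]                                   # best predecessor of s at each step
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: v[-1][r] * A[r][s])
            col[s] = v[-1][prev] * A[prev][s] * B[s][o]
            ptr[s] = prev
        v.append(col)
        backptr.append(ptr)
    last = max(STATES, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(backptr[1:]):                # follow back-pointers to t = 1
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), v[-1][last]


obs = [1, 3, 1]
print(brute_force_decode(obs))
print(viterbi(obs))
```

Under these hypothetical parameters both functions pick HHC for 1 3 1; different parameters can change the answer. The brute-force version is only practical for short sequences, which is why Viterbi is used for decoding.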

CU-Boulder CSCI 5832 - Lecture Notes