CSE 842: Natural Language Processing
Lecture 5: HMM (2)

Markov Model

A Markov Model (Markov chain) is:
• similar to a finite-state automaton, with probabilities of transitioning from one state to another
[Figure: a five-state chain S1–S5 with transition arcs labelled 0.5, 0.5, 0.3, 0.7, 0.1, 0.9, 0.8, 0.2, and 1.0]
• transitions from state to state occur at discrete time intervals
• the model can be in only one state at any given time

Elements of a Markov Model
• Time: t = {1, 2, 3, …, T}. At each time t, the occupied state emits/outputs an event that can be observed.
• N states Q = {1, 2, 3, …, N}.
• N events E = {e1, e2, e3, …, eN}; one event corresponds to one state.
• Initial probabilities: πj = P[q1 = j], 1 ≤ j ≤ N
• Transition probabilities (first-order): aij = P[qt = j | qt-1 = i], 1 ≤ i, j ≤ N
• Constraints: $\sum_{j=1}^{N} a_{ij} = 1$ for 1 ≤ i ≤ N, and $\sum_{j=1}^{N} \pi_j = 1$

Markov Model for Dow Jones
[Figure: Markov chain for the Dow Jones; the transition diagram was lost in extraction]
• What is the probability of 5 consecutive up days?
• The sequence is up-up-up-up-up, i.e., the state sequence is 1-1-1-1-1:
  P(1,1,1,1,1) = π1 a11 a11 a11 a11 = 0.5 × (0.6)^4 = 0.0648

Markov Model for Winter Weather in EL (East Lansing)
• S1 = event1 = snow; S2 = event2 = clouds; S3 = event3 = sun
• π1 = 0.5, π2 = 0.4, π3 = 0.1
• A = {aij} =
    [ 0.70  0.25  0.05 ]
    [ 0.40  0.50  0.10 ]
    [ 0.20  0.70  0.10 ]
• What is the probability of {snow, snow, snow, clouds, sun, clouds, snow}?
  Obs. = {sn, sn, sn, c, sun, c, sn}
  S    = {S1, S1, S1, S2, S3, S2, S1}
  time = {1, 2, 3, 4, 5, 6, 7} (days)
  P = P[S1] P[S1|S1] P[S1|S1] P[S2|S1] P[S3|S2] P[S2|S3] P[S1|S2]
    = 0.5 · 0.7 · 0.7 · 0.25 · 0.1 · 0.7 · 0.4
    = 0.001715
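This computation is easy to reproduce. Below is a minimal Python sketch using the weather model's parameters from the slides; the 0-based state indices (0 = snow, 1 = clouds, 2 = sun) and the names are mine, not part of the lecture:

```python
# Probability of a state sequence in a first-order Markov chain:
# P(q_1, ..., q_T) = pi_{q1} * a_{q1 q2} * ... * a_{q_{T-1} q_T}

pi = [0.5, 0.4, 0.1]            # initial probabilities (snow, clouds, sun)
A = [[0.70, 0.25, 0.05],        # A[i][j] = P[q_t = j | q_{t-1} = i]
     [0.40, 0.50, 0.10],
     [0.20, 0.70, 0.10]]

def sequence_prob(states):
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

# {snow, snow, snow, clouds, sun, clouds, snow}
print(sequence_prob([0, 0, 0, 1, 2, 1, 0]))   # ~0.001715
```

The same function reproduces the Dow Jones result: five consecutive "up" states with π1 = 0.5 and a11 = 0.6 give 0.5 × 0.6^4 = 0.0648.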
Summary of Markov Model
• Conditional independence is assumed when computing the probability of a sequence of events (e.g., for a first-order model).
• Each state is associated with only one event (output).
• Given a list of observations, we can determine the exact state sequence ⇒ the state sequence is not hidden.
• Computing the probability of a given observation sequence is straightforward.
• Given multiple Markov models and an observation sequence, it is easy to determine which model is most likely to have generated the data.

Hidden Markov Model
• For Markov models, the output symbols are the same as the states (a one-to-one correspondence):
  – if we see the market go up, we know we are in state "up"
• But in part-of-speech tagging (and other problems):
  – the output symbols are words, while the states are part-of-speech tags
  – the correspondence is many-to-many, not one-to-one
  – we do not know which state we are in, given the observation
• A Hidden Markov Model is an extension of a Markov model in which the output symbols are not the same as the states.

Back to POS Example
[Figure: a trellis of hidden states {N, V, P, A} repeated over four time steps, each step emitting one observed word; example sentence: "flies like a flower"]
• Each state can emit more than one event, based on emission probabilities.
• Given a sequence of observations, we cannot determine the state sequence exactly; the exact state sequence is "hidden." We can, however, compute the probability of different state sequences.
• The model is doubly stochastic: there are probabilities both for emitting events and for transitioning between states.

Elements of a Hidden Markov Model
• Time t = {1, 2, 3, …, T}
• N states Q = {1, 2, 3, …, N}
• M events/observations E = {e1, e2, e3, …, eM}
• Initial probabilities: πj = P[q1 = j], 1 ≤ j ≤ N
• Transition probabilities: aij = P[qt = j | qt-1 = i], 1 ≤ i, j ≤ N
• Emission probabilities: bj(k) = P[ot = ek | qt = j], 1 ≤ k ≤ M
• Constraints: $\sum_{j=1}^{N} a_{ij} = 1$ (1 ≤ i ≤ N), $\sum_{k=1}^{M} b_i(k) = 1$ (1 ≤ i ≤ N), and $\sum_{j=1}^{N} \pi_j = 1$
• Entire model: Φ = (A, B, π), where A = {aij} and B = {bi(k)}
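These elements translate almost directly into code. Here is a minimal sketch of a container for Φ = (A, B, π) that enforces the three normalization constraints; the class name and the representation (lists of lists, 0-based indices) are illustrative choices, not part of the lecture:

```python
# A minimal container for an HMM  Phi = (A, B, pi), with the slide's constraints.

class HMM:
    def __init__(self, pi, A, B):
        self.pi = pi  # pi[j]   : initial probability   P[q_1 = j]
        self.A = A    # A[i][j] : transition probability P[q_t = j | q_{t-1} = i]
        self.B = B    # B[j][k] : emission probability   b_j(k) = P[o_t = e_k | q_t = j]
        # Normalization constraints: each distribution must sum to 1.
        assert abs(sum(pi) - 1.0) < 1e-9
        assert all(abs(sum(row) - 1.0) < 1e-9 for row in A)
        assert all(abs(sum(row) - 1.0) < 1e-9 for row in B)
```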
HMM for Dow Jones
• Suppose the states are used to represent market sentiment (1: positive, 2: negative, 3: neutral).
[Figure: HMM for the Dow Jones; the diagram was lost in extraction]

HMM for Weather and Atmospheric Pressure
[Figure: HMM with hidden atmospheric-pressure states H, M, and L; initial probabilities πH = 0.4, πM = 0.2, πL = 0.4; each state emits three weather symbols (the icons were lost in extraction) with probabilities 0.1/0.2/0.7, 0.3/0.4/0.3, and 0.6/0.3/0.1 as ordered in the source; transition arcs labelled 0.3, 0.4, 0.6, 0.2, 0.1, 0.1, 0.5, 0.4, and 0.4]

The Three Basic Problems for HMMs
• L. R. Rabiner. 1989. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286. (Most relevant to our discussion: pp. 257–266.)
• Problem 1 (Evaluation): Given the observation sequence O = (o1 o2 … oT) and an HMM Φ = (A, B, π), how do we efficiently compute P(O|Φ), the probability of the observation sequence given the model?
• Problem 2 (Decoding): Given the observation sequence O = (o1 o2 … oT) and an HMM Φ = (A, B, π), how do we choose a corresponding state sequence Q = (q1 q2 … qT) that is optimal in some sense (i.e., that best explains the observations)?
• Problem 3 (Learning): How do we adjust the model parameters Φ = (A, B, π) to maximize P(O|Φ)?

The Evaluation Problem
• Given an observation sequence O and an HMM Φ, compute P(O|Φ).
• Why is this hard? We must sum over all possible state sequences:
  $P(O \mid \Phi) = \sum_{\text{all } Q} P(O, Q \mid \Phi) = \sum_{\text{all } Q} P(Q \mid \Phi)\, P(O \mid Q, \Phi)$
• We cannot do an explicit sum over all paths because it is intractable: there are O(N^T) of them.

With an observation sequence O = (o1 o2 … oT), a state sequence Q = (q1 q2 … qT), and a model Φ, the probability of O given Q and Φ is
  $P(O \mid Q, \Phi) = \prod_{t=1}^{T} P(o_t \mid q_t, \Phi)$,
assuming conditional independence between observations. This expands to
  $P(O \mid Q, \Phi) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$.
The probability of the state sequence Q can be written
  $P(Q \mid \Phi) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$.
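A minimal Python sketch of this brute-force computation, assembled directly from the formulas above (the function and variable names are mine, not from the lecture). It enumerates all N^T state sequences, so it is usable only for toy problems; this exponential cost is exactly what the efficient algorithms in Rabiner's tutorial avoid:

```python
# Brute-force evaluation: P(O|Phi) = sum over all Q of P(Q|Phi) * P(O|Q,Phi).
# Enumerates all N^T state sequences -- tractable only for tiny T.
from itertools import product

def evaluate_brute_force(obs, pi, A, B):
    N, T = len(pi), len(obs)
    total = 0.0
    for Q in product(range(N), repeat=T):           # all N^T state sequences
        p = pi[Q[0]] * B[Q[0]][obs[0]]              # pi_{q1} * b_{q1}(o1)
        for t in range(1, T):
            p *= A[Q[t-1]][Q[t]] * B[Q[t]][obs[t]]  # a_{q_{t-1} q_t} * b_{q_t}(o_t)
        total += p
    return total
```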