CSE 842: Natural Language Processing
Lecture 5: HMM (2)

Markov Model

A Markov Model (Markov chain) is:
• similar to a finite-state automaton, with probabilities of transitioning from one state to another
[Figure: a five-state chain S1–S5 with transition arcs labelled 0.5, 0.5, 0.3, 0.7, 0.1, 0.9, 0.8, 0.2, and 1.0]
• transitions from state to state occur at discrete time intervals
• the model can be in only one state at any given time

Elements of a Markov Model
• Time: t = {1, 2, 3, …, T}. At each time t, the occupied state emits/outputs an event that can be observed.
• N states Q = {1, 2, 3, …, N}.
• N events E = {e1, e2, e3, …, eN}; one event corresponds to one state.
• Initial probabilities: πj = P[q1 = j], 1 ≤ j ≤ N
• Transition probabilities (first-order): aij = P[qt = j | qt-1 = i], 1 ≤ i, j ≤ N
• Constraints: $\sum_{j=1}^{N} a_{ij} = 1$ for 1 ≤ i ≤ N, and $\sum_{j=1}^{N} \pi_j = 1$

Markov Model for Dow Jones
[Figure: Markov chain for the Dow Jones; the transition diagram was lost in extraction]
• What is the probability of 5 consecutive up days?
• The sequence is up-up-up-up-up, i.e., the state sequence is 1-1-1-1-1:
  P(1,1,1,1,1) = π1 a11 a11 a11 a11 = 0.5 × (0.6)^4 = 0.0648

Markov Model for Winter Weather in EL (East Lansing)
• S1 = event1 = snow; S2 = event2 = clouds; S3 = event3 = sun
• π1 = 0.5, π2 = 0.4, π3 = 0.1
• A = {aij} =
    [ 0.70  0.25  0.05 ]
    [ 0.40  0.50  0.10 ]
    [ 0.20  0.70  0.10 ]
• What is the probability of {snow, snow, snow, clouds, sun, clouds, snow}?
  Obs. = {sn, sn, sn, c, sun, c, sn}
  S    = {S1, S1, S1, S2, S3, S2, S1}
  time = {1, 2, 3, 4, 5, 6, 7} (days)
  P = P[S1] P[S1|S1] P[S1|S1] P[S2|S1] P[S3|S2] P[S2|S3] P[S1|S2]
    = 0.5 · 0.7 · 0.7 · 0.25 · 0.1 · 0.7 · 0.4
    = 0.001715
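This computation is easy to reproduce. Below is a minimal Python sketch using the weather model's parameters from the slides; the 0-based state indices (0 = snow, 1 = clouds, 2 = sun) and the names are mine, not part of the lecture:

```python
# Probability of a state sequence in a first-order Markov chain:
# P(q_1, ..., q_T) = pi_{q1} * a_{q1 q2} * ... * a_{q_{T-1} q_T}

pi = [0.5, 0.4, 0.1]            # initial probabilities (snow, clouds, sun)
A = [[0.70, 0.25, 0.05],        # A[i][j] = P[q_t = j | q_{t-1} = i]
     [0.40, 0.50, 0.10],
     [0.20, 0.70, 0.10]]

def sequence_prob(states):
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

# {snow, snow, snow, clouds, sun, clouds, snow}
print(sequence_prob([0, 0, 0, 1, 2, 1, 0]))   # ~0.001715
```

The same function reproduces the Dow Jones result: five consecutive "up" states with π1 = 0.5 and a11 = 0.6 give 0.5 × 0.6^4 = 0.0648.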
Summary of Markov Model
• Conditional independence is assumed when computing the probability of a sequence of events (e.g., for a first-order model).
• Each state is associated with only one event (output).
• Given a list of observations, we can determine the exact state sequence ⇒ the state sequence is not hidden.
• Computing the probability of a given observation sequence is straightforward.
• Given multiple Markov models and an observation sequence, it is easy to determine which model is most likely to have generated the data.

Hidden Markov Model
• For Markov models, the output symbols are the same as the states (a one-to-one correspondence):
  – if we see the market go up, we know we are in state "up"
• But in part-of-speech tagging (and other problems):
  – the output symbols are words, while the states are part-of-speech tags
  – the correspondence is many-to-many, not one-to-one
  – we do not know which state we are in, given the observation
• A Hidden Markov Model is an extension of a Markov model in which the output symbols are not the same as the states.

Back to POS Example
[Figure: a trellis of hidden states {N, V, P, A} repeated over four time steps, each step emitting one observed word; example sentence: "flies like a flower"]
• Each state can emit more than one event, based on emission probabilities.
• Given a sequence of observations, we cannot determine the state sequence exactly; the exact state sequence is "hidden." We can, however, compute the probability of different state sequences.
• The model is doubly stochastic: there are probabilities both for emitting events and for transitioning between states.

Elements of a Hidden Markov Model
• Time t = {1, 2, 3, …, T}
• N states Q = {1, 2, 3, …, N}
• M events/observations E = {e1, e2, e3, …, eM}
• Initial probabilities: πj = P[q1 = j], 1 ≤ j ≤ N
• Transition probabilities: aij = P[qt = j | qt-1 = i], 1 ≤ i, j ≤ N
• Emission probabilities: bj(k) = P[ot = ek | qt = j], 1 ≤ k ≤ M
• Constraints: $\sum_{j=1}^{N} a_{ij} = 1$ (1 ≤ i ≤ N), $\sum_{k=1}^{M} b_i(k) = 1$ (1 ≤ i ≤ N), and $\sum_{j=1}^{N} \pi_j = 1$
• Entire model: Φ = (A, B, π), where A = {aij} and B = {bi(k)}
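These elements translate almost directly into code. Here is a minimal sketch of a container for Φ = (A, B, π) that enforces the three normalization constraints; the class name and the representation (lists of lists, 0-based indices) are illustrative choices, not part of the lecture:

```python
# A minimal container for an HMM  Phi = (A, B, pi), with the slide's constraints.

class HMM:
    def __init__(self, pi, A, B):
        self.pi = pi  # pi[j]   : initial probability   P[q_1 = j]
        self.A = A    # A[i][j] : transition probability P[q_t = j | q_{t-1} = i]
        self.B = B    # B[j][k] : emission probability   b_j(k) = P[o_t = e_k | q_t = j]
        # Normalization constraints: each distribution must sum to 1.
        assert abs(sum(pi) - 1.0) < 1e-9
        assert all(abs(sum(row) - 1.0) < 1e-9 for row in A)
        assert all(abs(sum(row) - 1.0) < 1e-9 for row in B)
```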
HMM for Dow Jones
• Suppose the states are used to represent market sentiment (1: positive, 2: negative, 3: neutral).
[Figure: HMM for the Dow Jones; the diagram was lost in extraction]

HMM for Weather and Atmospheric Pressure
[Figure: HMM with hidden atmospheric-pressure states H, M, and L; initial probabilities πH = 0.4, πM = 0.2, πL = 0.4; each state emits three weather symbols (the icons were lost in extraction) with probabilities 0.1/0.2/0.7, 0.3/0.4/0.3, and 0.6/0.3/0.1 as ordered in the source; transition arcs labelled 0.3, 0.4, 0.6, 0.2, 0.1, 0.1, 0.5, 0.4, and 0.4]

The Three Basic Problems for HMMs
• L. R. Rabiner. 1989. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286. (Most relevant to our discussion: pp. 257–266.)
• Problem 1 (Evaluation): Given the observation sequence O = (o1 o2 … oT) and an HMM Φ = (A, B, π), how do we efficiently compute P(O|Φ), the probability of the observation sequence given the model?
• Problem 2 (Decoding): Given the observation sequence O = (o1 o2 … oT) and an HMM Φ = (A, B, π), how do we choose a corresponding state sequence Q = (q1 q2 … qT) that is optimal in some sense (i.e., that best explains the observations)?
• Problem 3 (Learning): How do we adjust the model parameters Φ = (A, B, π) to maximize P(O|Φ)?

The Evaluation Problem
• Given an observation sequence O and an HMM Φ, compute P(O|Φ).
• Why is this hard? We must sum over all possible state sequences:
  $P(O \mid \Phi) = \sum_{\text{all } Q} P(O, Q \mid \Phi) = \sum_{\text{all } Q} P(Q \mid \Phi)\, P(O \mid Q, \Phi)$
• We cannot do an explicit sum over all paths because it is intractable: there are O(N^T) of them.

With an observation sequence O = (o1 o2 … oT), a state sequence Q = (q1 q2 … qT), and a model Φ, the probability of O given Q and Φ is
  $P(O \mid Q, \Phi) = \prod_{t=1}^{T} P(o_t \mid q_t, \Phi)$,
assuming conditional independence between observations. This expands to
  $P(O \mid Q, \Phi) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$.
The probability of the state sequence Q can be written
  $P(Q \mid \Phi) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$.
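A minimal Python sketch of this brute-force computation, assembled directly from the formulas above (the function and variable names are mine, not from the lecture). It enumerates all N^T state sequences, so it is usable only for toy problems; this exponential cost is exactly what the efficient algorithms in Rabiner's tutorial avoid:

```python
# Brute-force evaluation: P(O|Phi) = sum over all Q of P(Q|Phi) * P(O|Q,Phi).
# Enumerates all N^T state sequences -- tractable only for tiny T.
from itertools import product

def evaluate_brute_force(obs, pi, A, B):
    N, T = len(pi), len(obs)
    total = 0.0
    for Q in product(range(N), repeat=T):           # all N^T state sequences
        p = pi[Q[0]] * B[Q[0]][obs[0]]              # pi_{q1} * b_{q1}(o1)
        for t in range(1, T):
            p *= A[Q[t-1]][Q[t]] * B[Q[t]][obs[t]]  # a_{q_{t-1} q_t} * b_{q_t}(o_t)
        total += p
    return total
```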