UT Dallas CS 6375 - hmm-train

Slide 1: Machine Learning, CS 6375, Spring 2015
HMM Training
Instructor: Yang Liu
Slides modified from Vincent Ng and Dan Jurafsky

Slide 2: Hidden Markov Models
An HMM (N, Σ, π, A, B) consists of the following elements:
- N is a positive integer specifying the number of states in the model.
- Σ is a set of output symbols (note: we are discussing the discrete case for now).
(The slide also shows a state/observation trellis diagram.)

Slide 3: Hidden Markov Models
The other parameters Θ:
- π_j is the probability of choosing state j as the initial state.
- a_jk is the probability of transitioning from state j to state k.
- b_j(o) is the probability of emitting symbol o from state j.
- In total, Θ has N + N^2 + N|Σ| parameters.
Note that we have the following constraints:
  Σ_{k=1}^{N} a_jk = 1 for all j
  Σ_{j=1}^{N} π_j = 1
  Σ_{o∈Σ} b_j(o) = 1 for all j

Slide 4: Hidden Markov Models
An HMM specifies a probability for each possible (x, y) pair, where x is a sequence of symbols drawn from Σ and y is a sequence of states drawn from the integers 1, ..., N. The sequences x and y are restricted to have the same length.
For example, say we have an HMM with N = 2, Σ = {a, b}, and some choice of the parameters. Take x = <a, a, b, b> and y = <1, 2, 2, 1>.
Then in this case,
  P(x, y | Θ) = π_1 a_12 a_22 a_21 a_13 b_1(a) b_2(a) b_2(b) b_1(b)
(a_13 is the transition into a dummy final state; see the note after the supervised training example.)

Slide 5: Three Problems in Hidden Markov Models
- Question 1: Evaluation. What is the probability of the observation sequence O_1 O_2 ... O_T, i.e., P(O_1 O_2 ... O_T | λ)?
- Question 2: Most Probable Path. Given O_1 O_2 ... O_T, what is the most probable path that I took?
- Question 3: Learning HMMs. Given O_1 O_2 ... O_T, what is the maximum-likelihood HMM that could have produced this string of observations?

Slide 6: Starting Out with Observable Markov Models
Here we observe the state sequences directly. How do we train? By relative frequency:
  A = {a_ij}:   a_ij = C(i → j) / Σ_{q∈Q} C(i → q)

Slide 7: Training Hidden Markov Models
Say we have an HMM with N = 2 and output alphabet K = {e, f, g, h}.
The supervised HMM training problem: given paired sequences {(e/1 g/2), (e/1 h/2), (f/1 h/2), (f/1 g/2)}, how do we choose parameter values for π, a_ij, and b_i(o)?
The unsupervised HMM training problem: given output sequences {(e g), (e h), (f h), (f g)}, how do we choose parameter values for π, a_ij, and b_i(o)?
How? Maximum likelihood estimation! What is the log-likelihood function L(Θ)?

Slide 8: MLE for HMMs
We have two sets X and Y, and a joint distribution P(x, y | Θ). In an HMM:
- each x ∈ X is an output sequence o_1, ..., o_T
- each y ∈ Y is a state sequence q_1, ..., q_T
If we have fully observed data, i.e., (x_i, y_i) pairs, then
  L(Θ) = Σ_i log P(x_i, y_i | Θ)
If we have partially observed data, i.e., only the x_i examples, then
  L(Θ) = Σ_i log P(x_i | Θ) = Σ_i log Σ_{y∈Y} P(x_i, y | Θ)

Slide 9: HMM Training: Caveat
The network structure of an HMM is always created by hand; no algorithm for joint induction of optimal structure and probabilities has been able to beat simple hand-built structures. The question is: how to choose parameter values for π, a_ij, and b_i(o)?

Slide 10: Supervised HMM Training: An Example
We have an HMM with N = 2 and K = {e, f, g, h}, and we see the following paired sequences in training data:
  e/1 g/2
  e/1 h/2
  f/1 h/2
  f/1 g/2
Maximum likelihood estimates:
  π_1 = 1.0, π_2 = 0.0
For the parameters a_ij:
         j=1   j=2   j=3
  i=1     0     1     0
  i=2     0     0     1
For the parameters b_i(o):
         o=e   o=f   o=g   o=h
  i=1    0.5   0.5    0     0
  i=2     0     0    0.5   0.5
Intuitive?
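These maximum likelihood estimates are just relative-frequency counts, which is easy to check with a short script. A minimal Python sketch of the counting (the data layout and variable names are my own; the dummy final state 3 follows the slides):

```python
from collections import Counter

# Supervised MLE for the slides' example: count initial states, transitions
# (including into a dummy final state 3), and state/symbol pairs, then
# normalize each count by the total for its state.
FINAL = 3
data = [[('e', 1), ('g', 2)], [('e', 1), ('h', 2)],
        [('f', 1), ('h', 2)], [('f', 1), ('g', 2)]]

init, trans, emit = Counter(), Counter(), Counter()
for seq in data:
    states = [s for _, s in seq] + [FINAL]
    init[states[0]] += 1                       # f(i, x, y)
    for i, j in zip(states, states[1:]):
        trans[(i, j)] += 1                     # f(i, j, x, y)
    for o, s in seq:
        emit[(s, o)] += 1                      # f(i, o, x, y)

pi = {i: init[i] / len(data) for i in (1, 2)}
a = {(i, j): trans[(i, j)] / sum(c for (s, _), c in trans.items() if s == i)
     for i in (1, 2) for j in (1, 2, 3)}
b = {(i, o): emit[(i, o)] / sum(c for (s, _), c in emit.items() if s == i)
     for i in (1, 2) for o in 'efgh'}

print(pi)                        # {1: 1.0, 2: 0.0}
print(a[(1, 2)], a[(2, 3)])      # 1.0 1.0
print(b[(1, 'e')], b[(2, 'g')])  # 0.5 0.5
```

The printed values match the tables above: state 1 always starts, every sequence moves 1 → 2 → final, and each state splits its emissions evenly between its two observed symbols.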
These are just relative frequencies. Note: state 3 is a dummy final state.

Slide 11: Supervised HMM Training: The Likelihood Function
Notation: say (x, y) = (o_1 ... o_T, q_1 ... q_T), and
- f(i, j, x, y) = the number of times state j follows state i in (x, y)
- f(i, x, y) = the number of times state i is the initial state in (x, y)
- f(i, o, x, y) = the number of times state i is paired with observation o
Then
  P(x, y) = ∏_{i∈{1..N-1}} π_i^{f(i,x,y)} × ∏_{i∈{1..N-1}, j∈{1..N}} a_ij^{f(i,j,x,y)} × ∏_{i∈{1..N-1}, o∈K} b_i(o)^{f(i,o,x,y)}

Slide 12: Supervised HMM Training: The Likelihood Function
If we have training examples (x_l, y_l) for l = 1 ... m,
  L(Θ) = Σ_{l=1}^{m} log P(x_l, y_l)
       = Σ_{l=1}^{m} [ Σ_{i∈{1..N-1}} f(i, x_l, y_l) log π_i
                     + Σ_{i∈{1..N-1}, j∈{1..N}} f(i, j, x_l, y_l) log a_ij
                     + Σ_{i∈{1..N-1}, o∈K} f(i, o, x_l, y_l) log b_i(o) ]

Slide 13: Maximizing the Likelihood Function
Maximizing this function gives the MLEs:
  π_i = Σ_l f(i, x_l, y_l) / Σ_l Σ_k f(k, x_l, y_l)
  a_ij = Σ_l f(i, j, x_l, y_l) / Σ_l Σ_k f(i, k, x_l, y_l)
  b_i(o) = Σ_l f(i, o, x_l, y_l) / Σ_l Σ_{o'∈K} f(i, o', x_l, y_l)
Just like our intuitive results.

Slide 14: What About the Hidden-State Case?
For an HMM, we cannot compute those counts directly from the observed sequences. Instead, use the Baum-Welch algorithm. Intuitions:
- Iteratively estimate the counts.
- Start with an estimate for a_ij and b_k, and iteratively improve the estimates.
- This is a special case of Expectation-Maximization (EM).
- It uses the forward and backward probabilities.

Slide 15: Recap: Forward Probabilities
Given an input sequence x_1 ... x_T, the forward probabilities are
  α_t(i) = P(x_1 ... x_t, q_t = i | Θ)
Base case:
  α_1(i) = π_i b_i(o_1) for all i
Recursive case:
  α_t(j) = b_j(o_t) Σ_{i=1}^{N} α_{t-1}(i) a_ij   for all j = 1...N and t = 2...T

Slide 16: Forward Procedure
(Figure: computation of α_t(j) by summing over all previous values α_{t-1}(i) for all i.)

Slide 17: The Backward Probability
We define the backward probability as follows:
  β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, Φ)
This is the probability of generating the partial observations o_{t+1} ... o_T (from time t+1 to the end), given that the HMM is in state i at time t and, of course, given the model Φ. We compute it by induction.
Initialization:
  β_T(i) = 1,   1 ≤ i ≤ N
Induction:
  β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j),   t = T-1, ..., 1,   1 ≤ i ≤ N

Slide 18: Backward Procedure
(Figure: computation of β_t(i) as a weighted sum of all successive values β_{t+1}(j).)

Slide 19: Intuition for Re-estimation of a_ij
We will estimate â_ij via this intuition:
  â_ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)
Numerator intuition: assume we had some estimate of the probability that a given transition i → j was taken at time t in the observation sequence. If we knew this probability for each time t, we could sum over all t to get the expected value (count) for i → j.

Slide 20: Re-estimation of a_ij
Let γ_t(i, j) be the probability of being in state i at time t and state j at time t+1, given O_{1..T} and the model Φ:
  γ_t(i, j) = P(q_t = i, q_{t+1} = j | O, Φ) = P(q_t = i, q_{t+1} = j, O | Φ) / P(O | Φ)

Slide 21: Computing the Numerator for γ_t
The four components of P(q_t = i, q_{t+1} = j, O | Φ) are α_t(i), a_ij, b_j(o_{t+1}), and β_{t+1}(j).

Slide 22: Computing the Numerator for γ
  γ_t(i, j) = P(q_t = i, q_{t+1} = j | O, Φ)
            = P(q_t = i, q_{t+1} = j, O | Φ) / P(O | Φ)
            = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_i α_t(i) β_t(i)
P(O | Φ) can be computed using the forward probabilities only, the backward probabilities only, or a combination of forward and backward.

Slide 23: From γ to a_ij

Slide 24: Re-estimating the Observation Likelihood b_j
  b̂_j(v_k) = (expected number of times in state j and observing symbol v_k) / (expected number of times in state j)
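The forward and backward recursions and the γ_t(i, j) quantity can be sketched numerically. A minimal NumPy sketch, reusing the earlier x = <a, a, b, b> example encoded as symbol indices; the parameter values are hypothetical, since the slides do not fix them:

```python
import numpy as np

# Hypothetical parameters for a 2-state HMM over symbols {a, b}.
pi = np.array([0.7, 0.3])
A = np.array([[0.4, 0.6],
              [0.5, 0.5]])          # A[i, j] = a_ij
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])          # B[j, k] = b_j(symbol k); 0 = 'a', 1 = 'b'
obs = [0, 0, 1, 1]                  # x = <a, a, b, b>
T, N = len(obs), len(pi)

# Forward: alpha_t(j) = b_j(o_t) * sum_i alpha_{t-1}(i) a_ij
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)

# Backward: beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j), beta_T(i) = 1
beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

p_obs = alpha[-1].sum()                                # P(O), forward only
assert np.isclose(p_obs, (alpha[0] * beta[0]).sum())   # forward-backward agree

# gamma_t(i, j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O), at t = 0
t = 0
gamma = (alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]) / p_obs
assert np.isclose(gamma.sum(), 1.0)   # posterior over all (i, j) pairs sums to 1
```

The two asserts mirror the slides' claim that P(O | Φ) can be computed from the forward pass alone or from any combination of α_t and β_t at the same time step.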
Slide 25: Computing ξ
Computation of ξ_j(t), the probability of being in state j at time t.

Slide 26: Reestimating the Observation
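Although the last slides are cut off here, the emission update they describe follows directly from the quantities above: ξ_j(t) = α_t(j) β_t(j) / P(O | Φ) is the probability of being in state j at time t, and b̂_j(v_k) is a ratio of expected counts. A hedged sketch with the same hypothetical parameters as before:

```python
import numpy as np

# One re-estimation step for the emission probabilities: compute the state
# occupancy xi_j(t), then divide the expected count of (state j, symbol v_k)
# by the expected total time spent in state j. Parameters are hypothetical.
pi = np.array([0.7, 0.3])
A = np.array([[0.4, 0.6], [0.5, 0.5]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
obs = [0, 0, 1, 1]
T, N, V = len(obs), 2, 2

alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)
beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

xi = alpha * beta / alpha[-1].sum()   # xi[t, j] = P(q_t = j | O, model)

# b_hat_j(v_k): expected count of emitting v_k in state j, over time in j.
B_new = np.zeros((N, V))
for k in range(V):
    B_new[:, k] = xi[[t for t in range(T) if obs[t] == k]].sum(axis=0)
B_new /= xi.sum(axis=0)[:, None]
assert np.allclose(B_new.sum(axis=1), 1.0)   # each row is a distribution
```

In a full Baum-Welch iteration this emission update would be applied together with the γ-based transition update, and the E- and M-steps repeated until the likelihood converges.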

