Machine Learning
CS6375 --- Spring 2015
HMM Training
Instructor: Yang Liu
Slides modified from Vincent Ng, Dan Jurafsky

Hidden Markov Models
An HMM (N, Σ, π, A, B) consists of the following elements:
• N is a positive integer specifying the number of states in the model.
• Σ is a set of output symbols (note: we are discussing discrete cases now).
(Figure: a chain of states S connected by transition probabilities A, each emitting an output O via emission probabilities B.)

Hidden Markov Models
Other parameters Θ:
• π_j is the probability of choosing state j as the initial state.
• a_jk is the probability of transitioning from state j to state k.
• b_j(o) is the probability of emitting symbol o from state j.
• In total, Θ has N + N^2 + N·|Σ| parameters.
Note that we have the following constraints:
$\sum_{k=1}^{N} a_{jk} = 1$ for all $j$, $\qquad \sum_{j=1}^{N} \pi_j = 1$, $\qquad \sum_{o \in \Sigma} b_j(o) = 1$ for all $j$.

Hidden Markov Models
An HMM specifies a probability for each possible (x, y) pair, where x is a sequence of symbols drawn from Σ and y is a sequence of states drawn from the integers 1, …, N. The sequences x and y are restricted to have the same length.
For example, say we have an HMM with N = 2, Σ = {a, b}, and some choice of the parameters Θ. Take x = <a, a, b, b> and y = <1, 2, 2, 1>. Then in this case,
$P(x, y \mid \Theta) = \pi_1\, b_1(a)\, a_{12}\, b_2(a)\, a_{22}\, b_2(b)\, a_{21}\, b_1(b)\, a_{13}$
(state 3 is a dummy final state; see the supervised training example below).

Three Problems in Hidden Markov Models
• Question 1: Evaluation. What is the probability of the observation sequence O_1 O_2 … O_T, i.e., P(O_1 O_2 … O_T | Θ)?
• Question 2: Most Probable Path. Given O_1 O_2 … O_T, what is the most probable path that I took?
• Question 3: Learning HMMs. Given O_1 O_2 … O_T, what is the maximum likelihood HMM that could have produced this string of observations?

Starting Out with Observable Markov Models
We have state sequences. How do we train?
A = {a_ij}:
$a_{ij} = \frac{C(i \to j)}{\sum_{q \in Q} C(i \to q)}$

Training Hidden Markov Models
Say we have an HMM with N = 2, Σ = {e, f, g, h}.
• The supervised HMM training problem: given paired sequences {(e/1 g/2), (e/1 h/2), (f/1 h/2), (f/1 g/2)}, how do we choose parameter values for π, a_ij, and b_i(o)?
• The unsupervised HMM training problem: given output sequences {(e g), (e h), (f h), (f g)}, how do we choose parameter values for π, a_ij, and b_i(o)?
How? Maximum likelihood estimation! What is the log-likelihood function L(Θ)?

MLE for HMMs
We have two sets X and Y, and a joint distribution P(x, y | Θ). In an HMM:
• each x ∈ X is an output sequence o_1, …, o_T
• each y ∈ Y is a state sequence q_1, …, q_T
If we have fully observed data, (x_i, y_i) pairs, then
$L(\Theta) = \sum_i \log P(x_i, y_i \mid \Theta)$
If we have partially observed data, x_i examples only, then
$L(\Theta) = \sum_i \log P(x_i \mid \Theta) = \sum_i \log \sum_{y \in Y} P(x_i, y \mid \Theta)$

HMM Training: Caveat
The network structure of an HMM is always created by hand; no algorithm for double-induction of optimal structure and probabilities has been able to beat simple hand-built structures.
The question is: how do we choose parameter values for π, a_ij, and b_i(o)?

Supervised HMM Training: An Example
We have an HMM with N = 2, Σ = {e, f, g, h}. We see the following paired sequences in the training data:
e/1 g/2
e/1 h/2
f/1 h/2
f/1 g/2
Maximum likelihood estimates: π_1 = 1.0, π_2 = 0.0.
For the parameters a_ij:
        j=1   j=2   j=3
  i=1   0     1     0
  i=2   0     0     1
For the parameters b_i(o):
        o=e   o=f   o=g   o=h
  i=1   0.5   0.5   0     0
  i=2   0     0     0.5   0.5
Intuitive? Relative frequency! (Note: state 3 is a dummy final state.)
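To make the relative-frequency intuition concrete, here is a minimal Python sketch (my own illustration, not from the slides) that recovers the estimates above by counting and normalizing. The data layout and the dummy final state FINAL = 3 follow the example; zero-probability entries simply do not appear in the resulting dictionaries.

```python
from collections import defaultdict

# Toy training data from the slides: each pair is (observation, state).
data = [
    [("e", 1), ("g", 2)],
    [("e", 1), ("h", 2)],
    [("f", 1), ("h", 2)],
    [("f", 1), ("g", 2)],
]
FINAL = 3  # dummy final state every sequence transitions into

init, trans, emit = defaultdict(int), defaultdict(int), defaultdict(int)
for seq in data:
    states = [s for _, s in seq] + [FINAL]
    init[states[0]] += 1                      # f(i, x, y): initial-state counts
    for i, j in zip(states, states[1:]):
        trans[(i, j)] += 1                    # f(i, j, x, y): transition counts
    for o, s in seq:
        emit[(s, o)] += 1                     # f(i, o, x, y): emission counts

pi = {s: c / len(data) for s, c in init.items()}   # pi_1 = 1.0

def normalize(counts, key_of):
    """Turn raw counts into relative frequencies within each conditioning state."""
    totals = defaultdict(int)
    for k, c in counts.items():
        totals[key_of(k)] += c
    return {k: c / totals[key_of(k)] for k, c in counts.items()}

a = normalize(trans, key_of=lambda k: k[0])   # a_12 = 1.0, a_23 = 1.0
b = normalize(emit, key_of=lambda k: k[0])    # b_1(e) = b_1(f) = 0.5, b_2(g) = b_2(h) = 0.5
print(pi, a, b)
```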
Supervised HMM Training: The Likelihood Function
Notation: say (x, y) = {o_1 … o_T, q_1 … q_T}, and
• f(i, j, x, y) = number of times state j follows state i in (x, y)
• f(i, x, y) = number of times state i is the initial state in (x, y)
• f(i, o, x, y) = number of times state i is paired with observation o
Then
$P(x, y) = \prod_{i=1}^{N} \pi_i^{f(i,x,y)} \; \prod_{i=1}^{N} \prod_{j=1}^{N+1} a_{ij}^{f(i,j,x,y)} \; \prod_{i=1}^{N} \prod_{o \in \Sigma} b_i(o)^{f(i,o,x,y)}$
(transitions run to state N+1 to include the dummy final state).

Supervised HMM Training: The Likelihood Function
If we have training examples (x_l, y_l) for l = 1 … m,
$L(\Theta) = \sum_{l=1}^{m} \log P(x_l, y_l \mid \Theta) = \sum_{l=1}^{m} \Big( \sum_{i=1}^{N} f(i, x_l, y_l) \log \pi_i + \sum_{i=1}^{N} \sum_{j=1}^{N+1} f(i, j, x_l, y_l) \log a_{ij} + \sum_{i=1}^{N} \sum_{o \in \Sigma} f(i, o, x_l, y_l) \log b_i(o) \Big)$

Maximizing the Likelihood Function
Maximizing this function gives the MLEs:
$\pi_i = \frac{\sum_l f(i, x_l, y_l)}{\sum_l \sum_k f(k, x_l, y_l)}, \qquad a_{ij} = \frac{\sum_l f(i, j, x_l, y_l)}{\sum_l \sum_k f(i, k, x_l, y_l)}, \qquad b_i(o) = \frac{\sum_l f(i, o, x_l, y_l)}{\sum_l \sum_{o' \in \Sigma} f(i, o', x_l, y_l)}$
Just like our intuitive results!

What About the Hidden-State Case?
For an HMM, we cannot compute those counts directly from the observed sequences. Instead we use the Baum-Welch algorithm. Intuitions:
• Iteratively estimate the counts: start with an estimate for a_ij and b_j(o), then iteratively improve the estimates.
• It is a special case of Expectation-Maximization (EM).
• It uses forward and backward probabilities.

Recap: Forward Probabilities
Given an input sequence o_1 … o_T, the forward probabilities are
$\alpha_t(i) = P(o_1 \ldots o_t, q_t = i \mid \Theta)$
Base case: $\alpha_1(i) = \pi_i\, b_i(o_1)$ for all i.
Recursive case: $\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\, b_j(o_t)$ for all j = 1 … N and t = 2 … T.

Forward Procedure
(Figure: computation of α_t(j) by summing over all previous values α_{t-1}(i) for all i.)

The Backward Probability
We define the backward probability as follows:
$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \Theta)$
This is the probability of generating the partial observations o_{t+1} … o_T (from time t+1 to the end), given that the HMM is in state i at time t and, of course, given the model Θ.
We compute it by induction:
Initialization: $\beta_T(i) = 1, \quad 1 \le i \le N$
Induction: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, \ldots, 1, \quad 1 \le i \le N$

Backward Procedure
(Figure: computation of β_t(i) as a weighted sum of all successive values β_{t+1}(j).)

Intuition for Re-estimation of a_ij
We will estimate â_ij via this intuition:
$\hat{a}_{ij} = \frac{\text{expected number of transitions from state } i \text{ to state } j}{\text{expected number of transitions from state } i}$
Numerator intuition: assume we had some estimate of the probability that a given transition i → j was taken at time t in the observation sequence. If we knew this probability for each time t, we could sum over all t to get the expected value (count) for i → j.

Re-estimation of a_ij
Let γ_t(i, j) be the probability of being in state i at time t and state j at time t+1, given O_{1..T} and the model Θ:
$\gamma_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \Theta) = \frac{P(q_t = i, q_{t+1} = j, O \mid \Theta)}{P(O \mid \Theta)}$

Computing the Numerator for γ_t
The four components of $P(q_t = i, q_{t+1} = j, O \mid \Theta)$ are $\alpha_t(i)$, $a_{ij}$, $b_j(o_{t+1})$, and $\beta_{t+1}(j)$.

Computing the Numerator for γ
$\gamma_t(i, j) = \frac{P(q_t = i, q_{t+1} = j, O \mid \Theta)}{P(O \mid \Theta)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_i \alpha_t(i)\, \beta_t(i)}$
P(O | Θ) can be computed using the forward probabilities only, the backward probabilities only, or a combination of forward and backward.

From γ to a_ij
Summing over time turns these probabilities into expected counts:
$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \gamma_t(i, j)}{\sum_{t=1}^{T-1} \sum_{j'} \gamma_t(i, j')}$

Re-estimating the Observation Likelihood b_j
$\hat{b}_j(v_k) = \frac{\text{expected number of times in state } j \text{ and observing symbol } v_k}{\text{expected number of times in state } j}$

Computing ξ
Computation of ξ_j(t), the probability of being in state j at time t:
$\xi_j(t) = \frac{\alpha_t(j)\, \beta_t(j)}{P(O \mid \Theta)}$

Re-estimating the Observation Likelihood
$\hat{b}_j(v_k) = \frac{\sum_{t \,:\, o_t = v_k} \xi_j(t)}{\sum_{t=1}^{T} \xi_j(t)}$
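The forward and backward recursions translate directly into code. Below is a minimal NumPy sketch (my own illustration, not the course's implementation), assuming pi is a length-N vector, A is the N×N transition matrix, B is the N×|Σ| emission matrix, and observations are encoded as integer column indices; for simplicity it omits the dummy final state.

```python
import numpy as np

def forward(pi, A, B, obs):
    """alpha[t, i] = P(o_1 .. o_t, q_t = i | Theta)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                        # base case
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]    # recursive case
    return alpha

def backward(A, B, obs):
    """beta[t, i] = P(o_{t+1} .. o_T | q_t = i, Theta)."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                      # initialization
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # induction
    return beta

# Example usage with made-up parameters (N = 2 states, 2 symbols):
pi = np.array([1.0, 0.0])
A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.8, 0.2], [0.1, 0.9]])
obs = [0, 1, 1]
alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
# P(O | Theta) from the forward pass alone, or from forward and backward combined:
assert np.isclose(alpha[-1].sum(), (alpha[0] * beta[0]).sum())
```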
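Putting the pieces together, one Baum-Welch iteration computes γ_t(i, j) and ξ_j(t) in the E-step and renormalizes the expected counts in the M-step. This sketch is again an illustration under my own assumptions, not the instructor's code: it reuses forward and backward from the sketch above and keeps the slides' naming, where γ is the transition posterior and ξ_j(t) the state posterior.

```python
def baum_welch_step(pi, A, B, obs):
    """One EM iteration: E-step computes gamma and xi, M-step renormalizes counts."""
    T, N = len(obs), len(pi)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    likelihood = alpha[-1].sum()                        # P(O | Theta)

    # gamma[t, i, j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O | Theta)
    gamma = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        gamma[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :] \
                   / likelihood

    # xi[t, j] = alpha_t(j) beta_t(j) / P(O | Theta): prob. of state j at time t
    xi = alpha * beta / likelihood

    new_pi = xi[0]
    new_A = gamma.sum(axis=0) / gamma.sum(axis=(0, 2))[:, None]
    new_B = np.zeros_like(B)
    for t, o in enumerate(obs):                         # expected emission counts
        new_B[:, o] += xi[t]
    new_B /= xi.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```

In practice one would repeat baum_welch_step until P(O | Θ) stops improving; each iteration is guaranteed not to decrease the likelihood, as with EM in general.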