CMU CS 15780 - Hidden Markov Models (HMMs)

15-780: Graduate Artificial Intelligence
Hidden Markov Models (HMMs)

A Hidden Markov model
• A set of states {s_1, …, s_n}. At each time point we are in exactly one of these states, denoted q_t.
• Initial probabilities π_i, the probability that we start in state s_i.
• A transition probability model, P(q_t = s_i | q_{t-1} = s_j).
• A set of possible outputs Σ. At time point t we emit a symbol σ ∈ Σ.
• An emission probability model, p(o_t = σ | s_i).

Inference in HMMs
• Computing P(Q) and P(q_t = s_i) ✓
• Computing P(Q | O) and P(q_t = s_i | O) ✓
• Computing argmax_Q P(Q) ✓

Computing δ_t(i)
• Base case:
    δ_1(i) = p(q_1 = s_i ∧ O_1) = p(q_1 = s_i) p(O_1 | q_1 = s_i) = π_i b_i(O_1)
• In general, δ_t(i) is the probability of the best state path that ends in s_i at time t:
    δ_t(i) = max_{q_1 … q_{t-1}} p(q_1 … q_{t-1} ∧ q_t = s_i ∧ O_1 … O_t)

Q: Given δ_t(i), how can we compute δ_{t+1}(i)?
A: To get from δ_t(i) to δ_{t+1}(i) we need to
1. add an emission for time t+1 (O_{t+1}), and
2. transition to state s_i.

The Viterbi algorithm
    δ_{t+1}(i) = max_{q_1 … q_t} p(q_1 … q_t ∧ q_{t+1} = s_i ∧ O_1 … O_{t+1})
               = max_j δ_t(j) p(q_{t+1} = s_i | q_t = s_j) p(O_{t+1} | q_{t+1} = s_i)
               = max_j δ_t(j) a_{j,i} b_i(O_{t+1})
• Once again we use dynamic programming to solve for δ_t(i).
• Once we have δ_t(i) for all t and i, the most likely path Q* = argmax_Q P(Q | O) is recovered by starting from argmax_j δ_T(j) at the final time T and backtracking through the maximizing choices; a sketch follows.
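The recurrence above translates directly into a short dynamic program. Here is a minimal sketch in Python/NumPy (ours, not the lecture's; the function name viterbi and the array layout are assumptions), for discrete observations encoded as integer indices:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state path of a discrete-emission HMM.

    pi  : (k,)   initial probabilities, pi[i] = P(q_1 = s_i)
    A   : (k, k) transitions, A[j, i] = a_{j,i} = P(q_t = s_i | q_{t-1} = s_j)
    B   : (k, m) emissions,   B[i, o] = b_i(o) = P(o_t = o | q_t = s_i)
    obs : sequence of T observation indices O_1 ... O_T
    """
    T, k = len(obs), len(pi)
    delta = np.zeros((T, k))              # delta[t, i] as defined above
    back = np.zeros((T, k), dtype=int)    # argmax pointers for backtracking

    delta[0] = pi * B[:, obs[0]]          # delta_1(i) = pi_i * b_i(O_1)
    for t in range(1, T):
        # scores[j, i] = delta_t(j) * a_{j,i}
        scores = delta[t - 1][:, None] * A
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]

    # Start from argmax_j delta_T(j), then follow the pointers backwards.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]
```

In practice the products are computed in log space, since multiplying many probabilities underflows on long sequences.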
Learning HMMs
• Until now we assumed that the emission and transition probabilities are known.
• This is usually not the case:
  - How is "AI" pronounced by different individuals?
  - What is the probability of hearing "class" after "AI"?
• While we will discuss learning the transition and emission models, we will not discuss selecting the states. This is often the most important task, and it is heavily dependent on domain knowledge.

Example
• Assume the two-state model below (states A and B, each emitting die values 1-6).
• We also observe the following sequence: 1, 2, 2, 5, 6, 5, 1, 2, 3, 3, 5, 3, 3, 2, …
• How can we determine the initial, transition and emission probabilities?

Initial probabilities
Q: Assume we can observe the following sets of states:
    AAABBAA  AABBBBB  BAABBAB
How can we learn the initial probabilities?
A: Maximum likelihood estimation. Find the initial probabilities π such that
    π* = argmax_π Π_k [ π(q_1^k) Π_{t=2…T} p(q_t^k | q_{t-1}^k) ] = argmax_π Π_k π(q_1^k)
where k indexes the sequences available for training (the transition terms do not depend on π, so they drop out of the argmax). Counting start states gives
    π_A = #A / (#A + #B)
where #A is the number of sequences that start in A; here π_A = 2/3.

Transition probabilities
Q: Assume we can observe the set of states:
    AAABBAAAABBBBBAAAABBBB
How can we learn the transition probabilities?
A: Maximum likelihood estimation. Remembering that we defined a_{i,j} = p(q_t = s_j | q_{t-1} = s_i), find a transition matrix a such that
    a* = argmax_a Π_k [ π(q_1^k) Π_{t=2…T} p(q_t^k | q_{t-1}^k) ] = argmax_a Π_k Π_{t=2…T} p(q_t^k | q_{t-1}^k)
(the initial term does not depend on a). Counting transitions out of A gives
    a_{A,B} = #AB / (#AB + #AA)
where #AB is the number of times an A is immediately followed by a B.

Emission probabilities
Q: Assume we can observe the set of states:
    A A A B B A A A A B B B B B A A
and the corresponding die values:
    1 2 3 5 6 3 2 1 1 3 4 5 6 5 2 3
How can we learn the emission probabilities?
A: Maximum likelihood estimation, counting how often each value is emitted from each state, e.g.
    b_A(5) = #A5 / (#A1 + #A2 + … + #A6)
where #A5 is the number of times state A emits the value 5. A counting sketch for all three estimates follows.
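When the state sequences are fully observed, all three maximum likelihood estimates reduce to counting. The following is a small illustrative sketch (ours, not the lecture's; the function mle_estimates and its signature are assumptions) for the A/B dice model above:

```python
from collections import Counter

def mle_estimates(state_seqs, obs_seqs, states="AB", symbols=tuple(range(1, 7))):
    """Supervised MLE of initial, transition and emission probabilities.

    state_seqs : list of state strings, e.g. ["AAABBAA", "AABBBBB", "BAABBAB"]
    obs_seqs   : list of die-value lists, aligned position by position
    """
    starts = Counter(seq[0] for seq in state_seqs)
    trans, emits = Counter(), Counter()
    for st, ob in zip(state_seqs, obs_seqs):
        trans.update(zip(st, st[1:]))   # counts of pairs (q_{t-1}, q_t)
        emits.update(zip(st, ob))       # counts of pairs (q_t, o_t)

    # pi_i = (# sequences starting in s_i) / (# sequences)
    pi = {s: starts[s] / len(state_seqs) for s in states}
    # a_{i,j} = #ij / sum_{j'} #ij'
    a = {(i, j): trans[i, j] / sum(trans[i, jj] for jj in states)
         for i in states for j in states}
    # b_i(o) = #io / sum_{o'} #io'
    b = {(i, o): emits[i, o] / sum(emits[i, oo] for oo in symbols)
         for i in states for o in symbols}
    return pi, a, b
```

A real implementation would smooth the counts (for example, add-one smoothing), so that transitions and emissions never seen in training do not receive probability zero.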
Learning HMMs
• In most cases we do not observe the states at all (the fully unsupervised setting).
• For these cases we can use an algorithm called expectation maximization (EM) to learn the HMM parameters.

Expectation Maximization (EM)
• Appropriate for problems with 'missing values' for the variables.
• For example, in HMMs we usually do not observe the states.
  - Other famous examples include clustering (missing variable: class membership) and Bayesian networks (for learning parameters and/or structure from incomplete data).

Expectation Maximization (EM)
• Two steps:
  - E step: Fill in the expected values for the missing variables.
  - M step: Regular maximum likelihood estimation (MLE) using the values computed in the E step and the values of the other variables.
• The two steps alternate: the E step supplies values for the (missing) variables, and the M step supplies updated parameters.
• Guaranteed to converge (though only to a local maximum of the likelihood).
• EM is another one of these very general and highly popular algorithms. The key computational issue is how to derive the expectations in the E step.

Forward-Backward
• We already defined a forward-looking variable:
    α_t(i) = P(O_1 … O_t ∧ q_t = s_i)
• We also need to define a backward-looking variable:
    β_t(i) = P(O_{t+1}, …, O_n | q_t = s_i) = Σ_j a_{i,j} b_j(O_{t+1}) β_{t+1}(j)
• Using these two definitions, and P(A | B) = P(A, B) / P(B), we can show
    P(q_t = s_i | O_1, …, O_n) = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j) =def S_t(i)

State and transition probabilities
• Probability of a state:
    S_t(i) = P(q_t = s_i | O_1, …, O_n) = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j)
• We can also derive a transition probability:
    P(q_t = s_i, q_{t+1} = s_j | O_1, …, O_n)
      = α_t(i) P(q_{t+1} = s_j | q_t = s_i) P(O_{t+1} | q_{t+1} = s_j) β_{t+1}(j) / Σ_j α_t(j) β_t(j)
      =def S_t(i, j)

E step
• Compute S_t(i) and S_t(i, j) for all t, i and j (1 ≤ t ≤ n, 1 ≤ i ≤ k, 1 ≤ j ≤ k):
    P(q_t = s_i | O_1, …, O_n) = S_t(i)
    P(q_t = s_i, q_{t+1} = s_j | O_1, …, O_n) = S_t(i, j)

M step (1)
• Compute transition probabilities:
    â_{i,j} = n̂(i, j) / Σ_k n̂(i, k), where n̂(i, j) = Σ_t S_t(i, j)
  is the expected number of transitions from s_i to s_j.

M step (2)
• Compute emission probabilities (here we assume a multinomial distribution). Define
    B_k(j) = Σ_{t : O_t = k} S_t(j)
  then
    b_j(k) = B_k(j) / Σ_i B_i(j)

Complete EM algorithm for learning the parameters of HMMs (Baum-Welch)
• Inputs: 1. Observations O_1 … O_n  2. Number of states, model
1. Guess initial transition and emission parameters.
2. E step: compute S_t(i) and S_t(i, j).
3. M step: re-estimate the parameters.
4. Converged? If not, go back to step 2.
5. Output the complete model.
We did not discuss initial probability estimation. These can be deduced from multiple sets of observations (for example, several recorded customers for speech processing). A runnable sketch of the full loop is given at the end of these notes.

Advanced HMMs
• Factorial HMMs
• Input-output HMMs
• Dynamic Bayesian Networks (DBNs)

Coupling of hidden states
• A robot for tourists
• Location ∈ {K by K grid}
• Language ∈ {English, Spanish, French}
• Talk ∈ {yes, no}
How do we design an HMM for this robot?

HMM for roboguide
• States: triplets {Loc, Lan, Tal}
• Emissions: based on location and presence / absence of a person
• Transitions: from one triplet …
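As referenced above, here is a minimal NumPy sketch of the full Baum-Welch loop (our own illustration, not the lecture's code). It assumes observations encoded as integers 0 … m-1, keeps π fixed at uniform for simplicity, and re-estimates A and B exactly as in the M-step formulas:

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """alpha[t, i] = P(O_1..O_t, q_t = s_i); beta[t, i] = P(O_{t+1}..O_n | q_t = s_i)."""
    n, k = len(obs), len(pi)
    alpha, beta = np.zeros((n, k)), np.zeros((n, k))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(n - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta

def baum_welch(obs, k, m, iters=50, seed=0):
    """Learn transitions A (k x k) and emissions B (k x m) from one sequence."""
    obs = np.asarray(obs)
    n = len(obs)
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                               # fixed uniform start here
    A = rng.random((k, k)); A /= A.sum(1, keepdims=True)   # random initial guess
    B = rng.random((k, m)); B /= B.sum(1, keepdims=True)

    for _ in range(iters):
        alpha, beta = forward_backward(pi, A, B, obs)
        # E step: S[t, i] = S_t(i), Sij[t, i, j] = S_t(i, j)
        S = alpha * beta
        S /= S.sum(axis=1, keepdims=True)
        Sij = np.zeros((n - 1, k, k))
        for t in range(n - 1):
            Sij[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
            Sij[t] /= Sij[t].sum()
        # M step (1): a_{i,j} = n(i,j) / sum_k n(i,k), with n(i,j) = sum_t S_t(i,j)
        nij = Sij.sum(axis=0)
        A = nij / nij.sum(axis=1, keepdims=True)
        # M step (2): b_j(o) = B_o(j) / sum_o' B_o'(j), B_o(j) = sum_{t: O_t = o} S_t(j)
        for o in range(m):
            B[:, o] = S[obs == o].sum(axis=0)
        B /= B.sum(axis=1, keepdims=True)
    return pi, A, B
```

For long sequences, α and β must be rescaled at each time step (or computed in log space) to avoid numerical underflow, and with multiple training sequences the initial probabilities can themselves be re-estimated from S_1(i), as the note above suggests.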

