CMU CS 10601 - Lecture

Learning in HMMs
10-601 Machine Learning

A Hidden Markov model
• A set of states {s_1, ..., s_n}
  - At each time point we are in exactly one of these states, denoted by q_t
• \pi_i, the probability that we start at state s_i
• A transition probability model, P(q_t = s_i | q_{t-1} = s_j)
• A set of possible outputs
  - At time t we emit a symbol \sigma
• An emission probability model, p(o_t = \sigma | s_i)
[Figure: two-state model with states A and B; each state has self-transition probability 0.8 and probability 0.2 of switching to the other state; 0.5 / 0.5 also shown.]

Inference in HMMs
• Computing P(Q) and P(q_t = s_i)
• Computing P(Q | O) and P(q_t = s_i | O)
• Computing argmax_Q P(Q)

Most probable path
• We are almost done ... one final question remains: how do we find the most probable path, that is, Q* such that Q* = argmax_Q P(Q | O)?
• This is an important problem:
  - The words in speech processing
  - The set of genes in the genome
  - etc.

Example
• What is the most probable set of states leading to the sequence 1, 2, 2, 5, 6, 5, 1, 2, 3?
[Figure: the two-state A/B model with self-transition probability 0.8 and switch probability 0.2.]

  v   P(v|A)   P(v|B)
  1    .3       .1
  2    .2       .1
  3    .2       .1
  4    .1       .2
  5    .1       .2
  6    .1       .3

  Initial probabilities: \pi_A = 0.7, \pi_B = 0.3

Most probable path

  argmax_Q P(Q | O) = argmax_Q P(O | Q) P(Q) / P(O) = argmax_Q P(O | Q) P(Q)

We will use the following definition:

  \delta_t(i) = \max_{q_1 \dots q_{t-1}} p(q_1, \dots, q_{t-1} \wedge q_t = s_i \wedge O_1 \dots O_t)

In other words, we are interested in the most likely path from 1 to t that:
1. Ends in s_i
2. Produces outputs O_1 ... O_t

Computing \delta_t(i)

  \delta_1(i) = p(q_1 = s_i \wedge O_1) = p(q_1 = s_i) p(O_1 | q_1 = s_i) = \pi_i b_i(O_1)

Q: Given \delta_t(i), how can we compute \delta_{t+1}(i)?
A: To get from \delta_t(i) to \delta_{t+1}(i) we need to
1. Add an emission for time t+1 (O_{t+1})
2. Transition to state s_i

  \delta_{t+1}(i) = \max_{q_1 \dots q_t} p(q_1, \dots, q_t \wedge q_{t+1} = s_i \wedge O_1 \dots O_{t+1})
                  = \max_j \delta_t(j) p(q_{t+1} = s_i | q_t = s_j) p(O_{t+1} | q_{t+1} = s_i)
                  = \max_j \delta_t(j) a_{j,i} b_i(O_{t+1})

The Viterbi algorithm
• The recursion \delta_{t+1}(i) = \max_j \delta_t(j) a_{j,i} b_i(O_{t+1}) is once again solved with dynamic programming
• Once we have \delta_T(i) for the final time step T, the last state of the most probable path Q* = argmax_Q P(Q | O) is argmax_j \delta_T(j), and the rest of Q* is recovered by backtracking through the maximizing j at each step

Inference in HMMs (recap)
• Computing P(Q) and P(q_t = s_i)
• Computing P(Q | O) and P(q_t = s_i | O)
• Computing argmax_Q P(Q)
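As a concrete illustration, here is a minimal Python sketch of the Viterbi recursion above, applied to the two-state A/B model from the example slide. The transition, emission, and initial probabilities are the numbers given on the slides; the function name `viterbi` and the dictionary-based layout are our own choices for this sketch, not part of the lecture.

```python
# Minimal Viterbi sketch for the two-state A/B example above.
# Model numbers are taken from the example slide; the names (states, pi,
# trans, emit, viterbi) are illustrative choices for this sketch.

states = ["A", "B"]
pi = {"A": 0.7, "B": 0.3}                        # initial probabilities
trans = {"A": {"A": 0.8, "B": 0.2},              # trans[i][j] = P(q_t = j | q_{t-1} = i)
         "B": {"A": 0.2, "B": 0.8}}
emit = {"A": {1: .3, 2: .2, 3: .2, 4: .1, 5: .1, 6: .1},   # emit[i][o] = b_i(o)
        "B": {1: .1, 2: .1, 3: .1, 4: .2, 5: .2, 6: .3}}

def viterbi(obs):
    """Return the most probable state path for the observation sequence obs."""
    # delta[i] = max over paths ending in state i of p(path, O_1..O_t)
    delta = {i: pi[i] * emit[i][obs[0]] for i in states}
    backptr = []                                  # backptr[t][i] = best predecessor of i
    for o in obs[1:]:
        prev, new_delta = {}, {}
        for i in states:
            # delta_{t+1}(i) = max_j delta_t(j) * a_{j,i} * b_i(O_{t+1})
            best_j = max(states, key=lambda j: delta[j] * trans[j][i])
            prev[i] = best_j
            new_delta[i] = delta[best_j] * trans[best_j][i] * emit[i][o]
        delta = new_delta
        backptr.append(prev)
    # last state of Q* is argmax_j delta_T(j); backtrack for the rest of the path
    last = max(states, key=lambda i: delta[i])
    path = [last]
    for bp in reversed(backptr):
        path.append(bp[path[-1]])
    return list(reversed(path))

# Most probable state sequence for the example observation sequence
print(viterbi([1, 2, 2, 5, 6, 5, 1, 2, 3]))
```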
Learning HMMs
• Until now we assumed that the emission and transition probabilities are known
• This is usually not the case
  - How is "AI" pronounced by different individuals?
  - What is the probability of hearing "class" after "AI"?
• While we will discuss learning the transition and emission models, we will not discuss selecting the states. This is often the most important task and is heavily dependent on domain knowledge.

Example
• Assume the two-state A/B model from before
• We also observe the following sequence: 1, 2, 2, 5, 6, 5, 1, 2, 3, 3, 5, 3, 3, 2, ...
• How can we determine the initial, transition and emission probabilities?

Initial probabilities
Q: Assume we can observe the following sets of states:
   AAABBAAAABBBBBBAABBAB
   How can we learn the initial probabilities?
A: Maximum likelihood estimation. Find the initial probabilities \pi such that

  \pi^* = argmax_\pi \prod_k \pi(q_1^k) \prod_{t=2}^{T} p(q_t^k | q_{t-1}^k) = argmax_\pi \prod_k \pi(q_1^k)

  where the product is over the k sequences available for training. This gives

  \pi_A = #A / (#A + #B)

  counting the first state of each training sequence.

Transition probabilities
Q: Assume we can observe the following set of states:
   AAABBAAAABBBBBAAAABBBB
   How can we learn the transition probabilities?
A: Maximum likelihood estimation. Find a transition matrix a such that

  a^* = argmax_a \pi(q_1) \prod_{t=2}^{T} p(q_t | q_{t-1}) = argmax_a \prod_{t=2}^{T} p(q_t | q_{t-1})

  which gives, for example,

  a_{A,B} = #AB / (#AB + #AA)

  (remember that we defined a_{i,j} = p(q_t = s_j | q_{t-1} = s_i))

Emission probabilities
Q: Assume we can observe the set of states
   A A A B B A A A A B B B B B A A
   and the corresponding set of dice values
   1 2 3 5 6 3 2 1 1 3 4 5 6 5 2 3
   How can we learn the emission probabilities?
A: Maximum likelihood estimation:

  b_A(5) = #A5 / (#A1 + #A2 + ... + #A6)

  where #Av is the number of times state A emits value v. (A counting sketch of these fully observed estimates appears after the EM material below.)

Learning HMMs
• In most cases we do not know which states generated each of the outputs (fully unsupervised)
• For these cases we can use expectation maximization (EM) to learn the HMM parameters

Expectation Maximization (EM)
• Appropriate for problems with "missing values" for some of the variables
• For example, in HMMs we usually do not observe the states

Expectation Maximization (EM)
• Two steps
  - E step: fill in the expected values for the missing variables
  - M step: regular maximum likelihood estimation (MLE) using the values computed in the E step and the values of the other variables
• The two steps alternate: the E step produces values for the (missing) variables, the M step produces the parameters
• Guaranteed to converge (though only to a local maximum of the likelihood)
• EM is another one of these very general and highly popular algorithms. The key computational issue is how to derive the expectations in the E step.

Forward-Backward
• We already defined a forward-looking variable:

  \alpha_t(i) = P(O_1 \dots O_t \wedge q_t = s_i)

• We also need to define a backward-looking variable:

  \beta_t(i) = P(O_{t+1}, \dots, O_n | q_t = s_i) = \sum_j a_{i,j} b_j(O_{t+1}) \beta_{t+1}(j)

• Using these two definitions (and P(A|B) = P(A,B) / P(B)) we can show

  P(q_t = s_i | O_1, \dots, O_n) = \frac{\alpha_t(i) \beta_t(i)}{\sum_j \alpha_t(j) \beta_t(j)} \;\overset{def}{=}\; S_t(i)

State and transition probabilities
• Probability of a state:

  S_t(i) \overset{def}{=} P(q_t = s_i | O_1, \dots, O_n) = \frac{\alpha_t(i) \beta_t(i)}{\sum_j \alpha_t(j) \beta_t(j)}

• We can also derive a transition probability:

  S_t(i,j) \overset{def}{=} P(q_t = s_i, q_{t+1} = s_j | o_1, \dots, o_n)
          = \frac{\alpha_t(i) \, P(q_{t+1} = s_j | q_t = s_i) \, P(o_{t+1} | q_{t+1} = s_j) \, \beta_{t+1}(j)}{\sum_j \alpha_t(j) \beta_t(j)}

E step
• Compute S_t(i) and S_t(i,j) for all t, i and j (1 \le t \le n, 1 \le i, j \le k):

  P(q_t = s_i | O_1, \dots, O_n) = S_t(i)
  P(q_t = s_i, q_{t+1} = s_j | o_1, \dots, o_n) = S_t(i, j)

M step (1)
• Compute transition probabilities:

  \hat{a}_{i,j} = \frac{\hat{n}(i,j)}{\sum_k \hat{n}(i,k)}   where   \hat{n}(i,j) = \sum_t S_t(i,j)

M step (2)
• Compute emission probabilities (here we assume a multinomial distribution)
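Before turning the E and M steps above into code, here is the small counting sketch promised earlier for the fully observed (supervised) case: initial, transition, and emission probabilities estimated directly from observed state/output sequences. The function name `mle_counts` and the data layout are illustrative choices, not from the lecture.

```python
from collections import Counter

# Fully observed MLE sketch: estimate initial, transition, and emission
# probabilities by counting, as in the supervised slides above.
# (Function name and data layout are illustrative, not from the lecture.)

def mle_counts(state_seqs, output_seqs):
    """state_seqs / output_seqs: lists of equal-length state and output sequences."""
    init, trans, emit = Counter(), Counter(), Counter()
    for states, outputs in zip(state_seqs, output_seqs):
        init[states[0]] += 1                      # pi_A = #(sequences starting in A) / #sequences
        for prev, cur in zip(states, states[1:]):
            trans[(prev, cur)] += 1               # a_{A,B} = #AB / (#AA + #AB)
        for s, o in zip(states, outputs):
            emit[(s, o)] += 1                     # b_A(v) = #Av / (#A1 + ... + #A6)

    n_seqs = len(state_seqs)
    pi = {s: c / n_seqs for s, c in init.items()}
    a = {(i, j): c / sum(v for (p, _), v in trans.items() if p == i)
         for (i, j), c in trans.items()}
    b = {(s, o): c / sum(v for (p, _), v in emit.items() if p == s)
         for (s, o), c in emit.items()}
    return pi, a, b

# The observed state/output pair from the emission-probabilities slide:
states = list("AAABBAAAABBBBBAA")
outputs = [1, 2, 3, 5, 6, 3, 2, 1, 1, 3, 4, 5, 6, 5, 2, 3]
pi, a, b = mle_counts([states], [outputs])
print(a[("A", "B")], b.get(("A", 5), 0.0))        # a_{A,B} and b_A(5) for this sample
```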


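And here is a minimal sketch of one EM (Baum-Welch) iteration built directly from the forward/backward variables and the S_t(i), S_t(i,j) quantities defined above, assuming the same dictionary-based model layout as the Viterbi sketch. The function name `em_step` is our own, and the initial-probability and emission updates use the standard Baum-Welch forms, since the preview cuts off before those slides.

```python
# One Baum-Welch (EM) iteration sketch, using the forward/backward variables and
# the S_t(i), S_t(i,j) quantities above. Model layout matches the Viterbi sketch;
# the function name and the pi/emission updates are our own (standard Baum-Welch).

def em_step(obs, states, pi, trans, emit):
    n = len(obs)

    # Forward: alpha[t][i] = P(O_1..O_{t+1}, q_{t+1} = i)   (0-based t)
    alpha = [{i: pi[i] * emit[i][obs[0]] for i in states}]
    for t in range(1, n):
        alpha.append({i: emit[i][obs[t]] *
                         sum(alpha[t - 1][j] * trans[j][i] for j in states)
                      for i in states})

    # Backward: beta[t][i] = P(O_{t+2}..O_n | q_{t+1} = i), with beta[n-1] = 1
    beta = [{i: 1.0 for i in states} for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = {i: sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                          for j in states)
                   for i in states}

    # E step: S[t][i] = P(q_t = i | O),  S2[t][(i,j)] = P(q_t = i, q_{t+1} = j | O)
    S, S2 = [], []
    for t in range(n):
        norm = sum(alpha[t][j] * beta[t][j] for j in states)
        S.append({i: alpha[t][i] * beta[t][i] / norm for i in states})
        if t < n - 1:
            S2.append({(i, j): alpha[t][i] * trans[i][j] *
                               emit[j][obs[t + 1]] * beta[t + 1][j] / norm
                       for i in states for j in states})

    # M step: re-estimate parameters from the expected counts
    new_pi = {i: S[0][i] for i in states}
    new_trans = {i: {j: sum(s2[(i, j)] for s2 in S2) /
                        sum(s2[(i, k)] for s2 in S2 for k in states)
                     for j in states} for i in states}
    symbols = set(obs) | {v for i in states for v in emit[i]}
    new_emit = {i: {v: sum(S[t][i] for t in range(n) if obs[t] == v) /
                       sum(S[t][i] for t in range(n))
                    for v in symbols} for i in states}
    return new_pi, new_trans, new_emit

# Example (single observed sequence from the lecture, model from the Viterbi sketch):
# pi, trans, emit = em_step([1, 2, 2, 5, 6, 5, 1, 2, 3, 3, 5, 3, 3, 2],
#                           states, pi, trans, emit)
```

Iterating `em_step` until the parameters stop changing gives the usual Baum-Welch fit; as the slides note, it converges, but only to a local maximum of the likelihood.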