Learning in HMMs
10-601 Machine Learning

A Hidden Markov Model
• A set of states {s_1, ..., s_n}
  - At each time point we are in exactly one of these states, denoted by q_t
• π_i, the probability that we start at state s_i
• A transition probability model, P(q_t = s_i | q_{t-1} = s_j)
• A set of possible outputs
  - At time t we emit a symbol o_t
• An emission probability model, p(o_t | s_i)
[Figure: a two-state model with states A and B, self-transition probability 0.8, probability 0.2 of switching states, and initial probabilities 0.5 / 0.5]

Inference in HMMs
• Computing P(Q) and P(q_t = s_i)
• Computing P(Q | O) and P(q_t = s_i | O)
• Computing argmax_Q P(Q)

Most Probable Path
• We are almost done ...
• One final question remains: how do we find the most probable path, that is, Q* = argmax_Q P(Q | O)?
• This is an important path
  - the words in speech processing
  - the set of genes in the genome
  - etc.

Example
• What is the most probable set of states leading to the sequence 1, 2, 2, 5, 6, 5, 1, 2, 3?
• Model: the two-state HMM (A, B) above, with initial probabilities π_A = 0.7, π_B = 0.3 and emission probabilities

  v   P(v | A)   P(v | B)
  1     .3         .1
  2     .2         .1
  3     .2         .1
  4     .1         .2
  5     .1         .2
  6     .1         .3

Most Probable Path
  argmax_Q P(Q | O) = argmax_Q P(O | Q) P(Q) / P(O) = argmax_Q P(O | Q) P(Q)
We will use the following definition:
  δ_t(i) = max_{q_1 ... q_{t-1}} p(q_1, ..., q_{t-1}, q_t = s_i, O_1, ..., O_t)
In other words, we are interested in the most likely path from 1 to t that
1. ends in s_i, and
2. produces outputs O_1 ... O_t.

Computing δ_t(i)
Base case:
  δ_1(i) = p(q_1 = s_i, O_1) = p(q_1 = s_i) p(O_1 | q_1 = s_i) = π_i b_i(O_1)
Q: Given δ_t(i), how can we compute δ_{t+1}(i)?
A: To get from δ_t(i) to δ_{t+1}(i) we need to (1) add an emission for time t+1 (O_{t+1}) and (2) transition to state s_i:
  δ_{t+1}(i) = max_{q_1 ... q_t} p(q_1, ..., q_t, q_{t+1} = s_i, O_1, ..., O_{t+1})
             = max_j δ_t(j) p(q_{t+1} = s_i | q_t = s_j) p(O_{t+1} | q_{t+1} = s_i)
             = max_j δ_t(j) a_{j,i} b_i(O_{t+1})

The Viterbi Algorithm
• Once again we use dynamic programming, filling in δ_t(i) for all t and i with the recursion above
• Once we have the δ's, we can recover Q* = argmax_Q P(Q | O): it is the path that ends in the state maximizing δ at the final time step, recovered by backtracking through the maximizing choices
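The lecture itself contains no code, but a minimal Python sketch of the Viterbi recursion may help. It is written for the two-state example above, reading the diagram as self-transition probability 0.8, switching probability 0.2, and π_A = 0.7, π_B = 0.3; the function and variable names (viterbi, pi, A, B, obs) are my own, not from the slides.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """delta_1(i) = pi_i b_i(O_1); delta_{t+1}(i) = max_j delta_t(j) a_{j,i} b_i(O_{t+1})."""
    n_states, T = len(pi), len(obs)
    delta = np.zeros((T, n_states))
    back = np.zeros((T, n_states), dtype=int)    # argmax pointers for backtracking
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for i in range(n_states):
            scores = delta[t - 1] * A[:, i]      # delta_t(j) * a_{j,i} for every j
            back[t, i] = np.argmax(scores)
            delta[t, i] = scores[back[t, i]] * B[i, obs[t]]
    # Q*: start from the best final state and follow the pointers backwards
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(delta[-1].max())

# States A=0, B=1; die faces 1..6 mapped to columns 0..5
pi = np.array([0.7, 0.3])
A = np.array([[0.8, 0.2],
              [0.2, 0.8]])
B = np.array([[0.3, 0.2, 0.2, 0.1, 0.1, 0.1],   # P(v | A)
              [0.1, 0.1, 0.1, 0.2, 0.2, 0.3]])  # P(v | B)
obs = [v - 1 for v in [1, 2, 2, 5, 6, 5, 1, 2, 3]]
print(viterbi(pi, A, B, obs))
```

The returned value is the joint probability max_Q P(Q, O); dividing by P(O) (from the forward algorithm) gives P(Q* | O), and for long sequences one would work in log space to avoid underflow.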
Inference in HMMs (recap)
• Computing P(Q) and P(q_t = s_i)
• Computing P(Q | O) and P(q_t = s_i | O)
• Computing argmax_Q P(Q)

Learning HMMs
• Until now we assumed that the emission and transition probabilities are known
• This is usually not the case
  - How is "AI" pronounced by different individuals?
  - What is the probability of hearing "class" after "AI"?
• While we will discuss learning the transition and emission models, we will not discuss selecting the states. This is often the most important task and is heavily dependent on domain knowledge.

Example
• Assume the two-state model (A, B) from before
• We also observe the following sequence: 1, 2, 2, 5, 6, 5, 1, 2, 3, 3, 5, 3, 3, 2, ...
• How can we determine the initial, transition and emission probabilities?

Initial Probabilities
Q: Assume we can observe the following sets of states:
  AAABB   AAAAB   BBBBB   AABBAB
How can we learn the initial probabilities?
A: Maximum likelihood estimation. Find the initial probabilities π such that
  π* = argmax_π ∏_k [ π(q_1^k) ∏_{t=2}^{T} p(q_t^k | q_{t-1}^k) ] = argmax_π ∏_k π(q_1^k)
where k indexes the sequences available for training. For the two-state model this gives
  π_A = #A / (#A + #B)
where #A and #B count how often each state appears as the first state of a sequence.

Transition Probabilities
Q: Assume we can observe the following set of states:
  AAABBAAAABBBBBAAAABBBB
How can we learn the transition probabilities?
A: Maximum likelihood estimation. Find a transition matrix a such that
  a* = argmax_a π(q_1) ∏_{t=2}^{T} p(q_t | q_{t-1}) = argmax_a ∏_{t=2}^{T} p(q_t | q_{t-1})
Remember that we defined a_{i,j} = p(q_t = s_j | q_{t-1} = s_i). For the two-state model this gives, e.g.,
  a_{A,B} = #AB / (#AB + #AA)
i.e. the number of A-to-B transitions divided by the total number of transitions out of A.

Emission Probabilities
Q: Assume we can observe the set of states
  A A A B B A A A A B B B B B A A
and the corresponding set of dice values
  1 2 3 5 6 3 2 1 1 3 4 5 6 5 2 3
How can we learn the emission probabilities?
A: Maximum likelihood estimation, e.g.
  b_A(5) = #A5 / (#A1 + #A2 + ... + #A6)
i.e. the number of times state A emitted a 5, divided by the total number of emissions from A.

Learning HMMs
• In most cases we do not know which states generated each of the outputs (fully unsupervised)
• For these cases we can use expectation maximization (EM) to learn the HMM parameters

Expectation Maximization (EM)
• Appropriate for problems with "missing values" for some of the variables; for example, in HMMs we usually do not observe the states
• Two steps:
  - E step: fill in the expected values for the missing variables
  - M step: regular maximum likelihood estimation (MLE) using the values computed in the E step and the values of the other variables
• The two steps alternate: the E step produces values for the (missing) variables, the M step produces parameters
• Guaranteed to converge (though possibly only to a local maximum of the likelihood)
• EM is another of these very general and highly popular algorithms. The key computational issue is how to derive the expectations in the E step.

Forward-Backward
• We already defined a forward-looking variable:
  α_t(i) = P(O_1 ... O_t, q_t = s_i)
• We also need to define a backward-looking variable:
  β_t(i) = P(O_{t+1}, ..., O_n | q_t = s_i)
• It can be computed recursively, backwards in time (with β_n(i) = 1):
  β_t(i) = Σ_j a_{i,j} b_j(O_{t+1}) β_{t+1}(j)
• Using these two definitions and P(A | B) = P(A, B) / P(B), we can show
  P(q_t = s_i | O_1, ..., O_n) = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j), which we denote S_t(i)

State and Transition Probabilities
• Probability of being in a state:
  S_t(i) = P(q_t = s_i | O_1, ..., O_n) = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j)
• We can also derive a transition probability:
  S_t(i, j) = P(q_t = s_i, q_{t+1} = s_j | o_1, ..., o_n)
            = α_t(i) P(q_{t+1} = s_j | q_t = s_i) P(o_{t+1} | q_{t+1} = s_j) β_{t+1}(j) / Σ_j α_t(j) β_t(j)

E Step
• Compute S_t(i) and S_t(i, j) for all t, i and j (1 ≤ t ≤ n, 1 ≤ i, j ≤ k):
  S_t(i) = P(q_t = s_i | O_1, ..., O_n)
  S_t(i, j) = P(q_t = s_i, q_{t+1} = s_j | o_1, ..., o_n)

M Step (1)
Compute transition probabilities:
  a_{i,j} = n(i, j) / Σ_{j'} n(i, j')
where
  n(i, j) = Σ_t S_t(i, j)

M Step (2)
Compute emission probabilities (here we assume a multinomial distribution). Define
  n_i(v) = Σ_{t : O_t = v} S_t(i)
then
  b_i(v) = n_i(v) / Σ_{v'} n_i(v')
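To make the E step concrete, here is a minimal numpy sketch of the forward-backward computation of S_t(i) and S_t(i, j) for a single observation sequence. It is not from the lecture; it assumes the same (pi, A, B) representation as the Viterbi sketch above, and the names e_step, alpha, beta, S, S2 are mine.

```python
import numpy as np

def e_step(pi, A, B, obs):
    """Return S[t, i] = P(q_t=s_i | O) and S2[t, i, j] = P(q_t=s_i, q_{t+1}=s_j | O)."""
    n_states, T = len(pi), len(obs)
    alpha = np.zeros((T, n_states))   # alpha_t(i) = P(O_1..O_t, q_t = s_i)
    beta = np.zeros((T, n_states))    # beta_t(i)  = P(O_{t+1}..O_T | q_t = s_i)
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]      # sum_j alpha_{t-1}(j) a_{j,i} b_i(O_t)
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])    # sum_j a_{i,j} b_j(O_{t+1}) beta_{t+1}(j)
    likelihood = alpha[-1].sum()                          # P(O_1..O_T) = sum_j alpha_T(j) beta_T(j)
    S = alpha * beta / likelihood
    S2 = np.zeros((T - 1, n_states, n_states))
    for t in range(T - 1):
        S2[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :] / likelihood
    return S, S2, likelihood
```

For long sequences the alpha and beta values underflow, so practical implementations rescale them at every step or work in log space.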
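Continuing the same hypothetical sketch, the M step re-normalizes the expected counts from the slides (n(i, j) = Σ_t S_t(i, j) for transitions, n_i(v) = Σ_{t : O_t = v} S_t(i) for emissions); alternating it with e_step above gives the Baum-Welch form of EM. Again, the names and the single-sequence setup are assumptions, not the lecture's code.

```python
import numpy as np

def m_step(S, S2, obs, n_symbols):
    """Re-estimate (pi, A, B) from the E-step posteriors of one sequence."""
    obs = np.asarray(obs)
    pi_hat = S[0]                                  # P(q_1 = s_i | O)
    counts = S2.sum(axis=0)                        # n(i, j) = sum_t S_t(i, j)
    A_hat = counts / counts.sum(axis=1, keepdims=True)
    B_hat = np.zeros((S.shape[1], n_symbols))
    for v in range(n_symbols):
        B_hat[:, v] = S[obs == v].sum(axis=0)      # n_i(v) = sum over t with O_t = v of S_t(i)
    B_hat /= B_hat.sum(axis=1, keepdims=True)
    return pi_hat, A_hat, B_hat

# One possible EM loop, reusing e_step and the (pi, A, B) arrays defined in the earlier sketches
obs = [v - 1 for v in [1, 2, 2, 5, 6, 5, 1, 2, 3, 3, 5, 3, 3, 2]]
for _ in range(20):
    S, S2, likelihood = e_step(pi, A, B, obs)
    pi, A, B = m_step(S, S2, obs, n_symbols=6)
```

A full implementation would sum the expected counts over all training sequences before normalizing and would stop when the likelihood stops improving; the 20 iterations here are an arbitrary choice.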