CMU CS 10601 - Lecture

Hidden Markov Models (HMMs)
10-601 Machine Learning

What's wrong with Bayesian networks?
• Bayesian networks are very useful for modeling joint distributions
• But they have their limitations:
- They cannot account for temporal / sequence models
- They must be DAGs (no self-loops or any other cycles), so a graph with a loop is not a valid Bayesian network

Hidden Markov models
• Model a set of observations with a set of hidden states:
- Robot movement. Observations: range sensor, visual sensor. Hidden states: location (on a map)
- Speech processing. Observations: sound signals. Hidden states: parts of speech, words
- Biology. Observations: DNA base pairs. Hidden states: genes
• Two ingredients:
1. Hidden states generate observations
2. Hidden states transition to other hidden states

Example: Speech processing

Example: Biological data
ATGAAGCTACTGTCTTCTATCGAACAAGCATGCGATATTTGCCGACTTAAAAAGCTCAAG
TGCTCCAAAGAAAAACCGAAGTGCGCCAAGTGTCTGAAGAACAACTGGGAGTGTCGCTAC
TCTCCCAAAACCAAAAGGTCTCCGCTGACTAGGGCACATCTGACAGAAGTGGAATCAAGG
CTAGAAAGACTGGAACAGCTATTTCTACTGATTTTTCCTCGAGAAGACCTTGACATGATT

Example: Gambling on dice outcomes
• Two dice, both skewed (output model)
• At each round the gambler can either stay with the same die or switch to the other die (transition model)
[Figure: two-state diagram with states A and B; each state stays with probability 0.8 and switches with probability 0.2]

A Hidden Markov model
• A set of states {s_1, ..., s_n}
- At each time point we are in exactly one of these states, denoted q_t
• π_i, the probability that we start at state s_i
• A transition probability model, P(q_t = s_i | q_{t-1} = s_j)
• A set of possible output symbols Σ
- At time t we emit a symbol o_t ∈ Σ
• An emission probability model, P(o_t = σ | s_i)
[Figure: the two-state dice HMM with self-transition probability 0.8, switch probability 0.2, and initial probability 0.5 for each state]

The Markov property
• An important aspect of this definition is the Markov property: q_{t+1} is conditionally independent of q_{t-1} (and any earlier time points) given q_t
• More formally: P(q_{t+1} = s_i | q_t = s_j) = P(q_{t+1} = s_i | q_t = s_j, q_{t-1} = s_k)

What can we ask when using an HMM? A few examples:
• "Which die is currently being used?"
• "What is the probability of a 6 on the next roll?"
• "What is the probability of a 6 in any of the next 3 rolls?"

Inference in HMMs
• Computing P(Q) and P(q_t = s_i)
- If we cannot look at observations
• Computing P(Q | O) and P(q_t = s_i | O)
- When we have observations and care about the last state only
• Computing argmax_Q P(Q | O)
- When we care about the entire path

Which die is currently being used?
• We have played t rounds so far
• We want to determine P(q_t = A)
• Let's assume for now that we cannot observe any outputs (we are blindfolded)
• How can we compute this?

P(q_t = A)? The simple answer:
1. Determine P(Q), where Q is any path of length t that ends in A, Q = q_1, ..., q_{t-1}, A:
P(Q) = P(q_1, ..., q_{t-1}, A)
     = P(A | q_1, ..., q_{t-1}) P(q_1, ..., q_{t-1})
     = P(A | q_{t-1}) P(q_1, ..., q_{t-1})            (Markov property!)
     = ...
     = P(A | q_{t-1}) ... P(q_2 | q_1) P(q_1)         (the last factor is the initial probability)
2. Sum over paths: P(q_t = A) = Σ_Q P(Q), where the sum is over all length-t state sequences Q that end in A
• Q: How many such sequences Q are there? A: A lot! (2^{t-1} for our two-state model)
• Not a feasible solution (a brute-force sketch of this enumeration is shown below)
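To make the cost of this enumeration concrete, below is a minimal Python sketch of steps 1 and 2 for the two-die example, assuming the 0.5/0.5 start probabilities and 0.8/0.2 transition probabilities from the diagram above; the function and variable names are illustrative and not part of the slides.

# Brute-force computation of P(q_t = A) by enumerating all 2^(t-1) paths
# (dice example: start 0.5/0.5, stay with the same die w.p. 0.8, switch w.p. 0.2).
from itertools import product

states = ["A", "B"]
pi = {"A": 0.5, "B": 0.5}                       # initial probabilities
trans = {("A", "A"): 0.8, ("A", "B"): 0.2,      # trans[(j, i)] = P(q_t = i | q_{t-1} = j)
         ("B", "A"): 0.2, ("B", "B"): 0.8}

def prob_state_at_t(target, t):
    """P(q_t = target), summing P(Q) over every length-t path that ends in target."""
    total = 0.0
    for prefix in product(states, repeat=t - 1):   # all choices of q_1 ... q_{t-1}
        path = list(prefix) + [target]
        p = pi[path[0]]                            # P(q_1)
        for prev, cur in zip(path, path[1:]):
            p *= trans[(prev, cur)]                # P(q_k | q_{k-1}), Markov property
        total += p
    return total

print(prob_state_at_t("A", t=4))   # 0.5 by symmetry; runtime grows as 2^(t-1)

Even for this tiny model the loop visits 2^{t-1} paths, which is exactly why the dynamic-programming approach on the next slide is needed.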
P(qt= A) = P(Q)where the sum is over all sets of t states that end in AP(qt= A)?• Simple answer:1. Lets determine P(Q) where Q is any path that ends in AQ = q1, … qt-1, AP(Q) = P(q1, … qt-1, A) = P(A | q1, … qt-1) P(q1, … qt-1) = P(A | qt-1) P(q1, … qt-1) = … = P(A | qt-1) … P(q2| q1) P(q1)2. P(qt= A) = P(Q)where the sum is over all sets of t sates that end in AQ: How many sets Q are there?A: A lot! (2t-1)Not a feasible solutionP(qt= A), the smart way• Lets define pt(i) as the probability of being in state i at time t: pt(i) = p(qt= si)• We can determine pt(i) by induction1. p1(i) = i2. pt(i) = ?P(qt= A), the smart way• Lets define pt(i) = probability state i at time t = p(qt= si)• We can determine pt(i) by induction1. p1(i) = i2. pt(i) = jp(qt= si| qt-1= sj)pt-1(j)P(qt= A), the smart way• Lets define pt(i) = probability state i at time t = p(qt= si)• We can determine pt(i) by induction1. p1(i) = i2. pt(i) = jp(qt= si| qt-1= sj)pt-1(j) This type of computation is called dynamic programmingComplexity: O(n2*t)Time / statet1 t2 t3s1 .3s2 .7Number of states in our HMMInference in HMMs• Computing P(Q) and P(qt= si)• Computing P(Q | O) and P(qt= si |O)• Computing argmaxQP(Q)But what if we observe outputs?• So far, we assumed that we could not observe the outputs• In reality, we almost always can.AB0.20.20.80.8v P(v |A) P(v |B)1 .3 .12 .2 .13 .2 .14 .1 .25 .1 .26 .1 .3But what if we observe outputs?• So far, we assumed that we could not observe the outputs• In reality, we almost always can.v P(v |A) P(v |B)1 .3 .12 .2 .13 .2 .14 .1 .25 .1 .26 .1 .3Does observing the sequence 5, 6, 4, 5, 6, 6Change our belief about the state?AB0.20.20.80.8P(qt= A) when outputs are observed• We want to compute P(qt= A | O1… Ot)• For ease of writing we will use the following notations (commonly used in the literature)• aj,i= P(qt= si| qt-1= sj)• bi(ot) = P(ot| si)Transition probabilityEmission probabilityP(qt= A) when outputs are observed• We want to compute P(qt= A | O1… Ot)• Lets start with a simpler question. Given a sequence of states Q, what is P(Q | O1… Ot) = P(Q | O)?- It is pretty simple to move from P(Q) to P(qt= A ) - In some cases P(Q) is the more important question- Speech processing- NLPP(Q | O)• We can use Bayes rule:)()()|()|(OPQPQOPOQP Easy, P(O | Q) = P(o1| q1) P(o2| q2) … P(ot| qt)P(Q | O)• We can use Bayes rule:)()()|()|(OPQPQOPOQP Easy, P(Q) = P(q1) P(q2| q1) … P(qt| qt-1)P(Q | O)• We can use Bayes rule:)()()|()|(OPQPQOPOQP Hard!P(O)• What is the probability of seeing a set of observations: - An important question in it own rights, for example classification using two HMMs• Define t(i) = P(o1, o2 …, ot qt = si)• t(i) is the probability that we:1.

