Hidden Markov Models
Lecture 5, Tuesday April 15, 2003

Review of Last Lecture (Lecture 2, Thursday April 3, 2003)

Time Warping

Definition: two signals x(u) and y(v) are connected by an approximate continuous time warping (u0, v0) if:
•u0, v0 are strictly increasing functions on [0, T], and
•x(u0(t)) ≈ y(v0(t)) for 0 ≤ t ≤ T

[Figure: the two signals plotted over [0, T], with u0(t) and v0(t) tracing the warping.]

Time Warping (continued)

Define possible steps: Δ(u, v) is the possible difference of (u, v) between steps h-1 and h:

Δ(u, v) ∈ { (1, 0), (1, 1), (0, 1) }

[Figure: the M × N warping grid, with a path of such steps from (0, 0) to (M, N).]

Definition of a hidden Markov model

Definition: A hidden Markov model (HMM)
•Alphabet Σ = { b1, b2, …, bM }
•Set of states Q = { 1, …, K }
•Transition probabilities between any two states
 aij = transition prob from state i to state j
 ai1 + … + aiK = 1, for all states i = 1…K
•Start probabilities a0i
 a01 + … + a0K = 1
•Emission probabilities within each state
 ek(b) = P( xi = b | πi = k )
 ek(b1) + … + ek(bM) = 1, for all states k = 1…K

The three main questions on HMMs

1. Evaluation
 GIVEN an HMM M, and a sequence x,
 FIND Prob[ x | M ]

2. Decoding
 GIVEN an HMM M, and a sequence x,
 FIND the sequence π of states that maximizes P[ x, π | M ]

3. Learning
 GIVEN an HMM M, with unspecified transition/emission probabilities, and a sequence x,
 FIND parameters θ = (ei(·), aij) that maximize P[ x | θ ]
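To make the definition concrete, here is a minimal sketch (Python with NumPy) of one way to store an HMM's parameters, instantiated with the dishonest-casino model used in the examples later in this lecture. The 0.95 self-transitions and the loaded-die emissions (1/2 on a six, 1/10 otherwise) come from the Example slide below; the uniform start probabilities and the 0.05 switching probability are assumptions for illustration.

```python
# A sketch of the HMM parameters (alphabet, states, a0, aij, ek(b)),
# filled in with the dishonest-casino model used later in the lecture.
# Uniform starts and the 0.05 switch probability are assumed here.
import numpy as np

alphabet = ["1", "2", "3", "4", "5", "6"]   # Sigma = { b1, ..., bM }
states = ["F", "L"]                          # Q = { 1, ..., K }: Fair, Loaded

a0 = np.array([0.5, 0.5])                    # start probabilities a_0i

A = np.array([[0.95, 0.05],                  # transitions a_ij: F->F, F->L
              [0.05, 0.95]])                 #                   L->F, L->L

E = np.array([[1/6] * 6,                     # fair die emissions
              [1/10] * 5 + [1/2]])           # loaded die: P(6) = 1/2

# Each row of A and E is a probability distribution.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(E.sum(axis=1), 1.0)
```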
Today

•Decoding
•Evaluation

Problem 1: Decoding

Find the best parse of a sequence.

Decoding

GIVEN x = x1x2……xN,
we want to find π = π1, ……, πN
such that P[ x, π ] is maximized:

π* = argmaxπ P[ x, π ]

We can use dynamic programming!

Let Vk(i) = max{π1,…,πi-1} P[ x1…xi-1, π1, …, πi-1, xi, πi = k ]
 = probability of the most likely sequence of states ending at πi = k

[Figure: the trellis of K states at each of the positions x1, x2, x3, …, xN.]

Decoding – main idea

Given that, for all states k and a fixed position i,
 Vk(i) = max{π1,…,πi-1} P[ x1…xi-1, π1, …, πi-1, xi, πi = k ],
what is Vl(i+1)?

From the definition,
Vl(i+1) = max{π1,…,πi} P[ x1…xi, π1, …, πi, xi+1, πi+1 = l ]
 = max{π1,…,πi} P(xi+1, πi+1 = l | x1…xi, π1,…, πi) P[ x1…xi, π1,…, πi ]
 = max{π1,…,πi} P(xi+1, πi+1 = l | πi) P[ x1…xi-1, π1, …, πi-1, xi, πi ]
 = maxk P(xi+1, πi+1 = l | πi = k) max{π1,…,πi-1} P[ x1…xi-1, π1,…, πi-1, xi, πi = k ]
 = el(xi+1) maxk akl Vk(i)

The Viterbi Algorithm

Input: x = x1……xN

Initialization:
 V0(0) = 1 (0 is the imaginary first position)
 Vk(0) = 0, for all k > 0

Iteration:
 Vj(i) = ej(xi) × maxk akj Vk(i-1)
 Ptrj(i) = argmaxk akj Vk(i-1)

Termination:
 P(x, π*) = maxk Vk(N)

Traceback:
 πN* = argmaxk Vk(N)
 πi-1* = Ptrπi*(i)

The Viterbi Algorithm (continued)

Similar to “aligning” a set of states to a sequence.

Time: O(K²N)
Space: O(KN)

[Figure: the K × N dynamic-programming table of values Vj(i).]

Viterbi Algorithm – a practical detail

Underflows are a significant problem:

P[ x1,…, xi, π1, …, πi ] = a0π1 aπ1π2 … aπi-1πi · eπ1(x1) … eπi(xi)

These products become extremely small, causing underflow.

Solution: take the logs of all values:

Vl(i) = log el(xi) + maxk [ Vk(i-1) + log akl ]

Example

Let x be a sequence with a portion of ~1/6 6’s, followed by a portion of ~1/2 6’s:

x = 123456123456…123456 626364656…1626364656

Then it is not hard to show that the optimal parse is (exercise):

FFF…………………...F LLL………………………...L

Six rolls “123456” parsed as F contribute 0.95^6 × (1/6)^6 = 1.6 × 10^-5;
 parsed as L, they contribute 0.95^6 × (1/2)^1 × (1/10)^5 = 0.4 × 10^-5.
Six rolls “162636” parsed as F contribute 0.95^6 × (1/6)^6 = 1.6 × 10^-5;
 parsed as L, they contribute 0.95^6 × (1/2)^3 × (1/10)^3 = 9.0 × 10^-5.
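The recurrence above translates almost line for line into code. Below is a minimal log-space Viterbi sketch, reusing a0, A, E, and states from the earlier parameter sketch; the function name and the handling of start probabilities (folding a0 into position 1 rather than using the imaginary position 0) are illustrative choices, not the slides'.

```python
import numpy as np

def viterbi(x, a0, A, E):
    """Most likely state path for x, via the log-space recurrence
    Vl(i) = log el(xi) + max_k [ Vk(i-1) + log akl ].
    x is a list of symbol indices; a0, A, E as in the sketch above."""
    K, N = len(a0), len(x)
    logA, logE = np.log(A), np.log(E)

    V = np.full((K, N), -np.inf)       # V[k, i] = Vk at position i+1
    Ptr = np.zeros((K, N), dtype=int)  # Ptr[l, i] = best predecessor state

    V[:, 0] = np.log(a0) + logE[:, x[0]]          # initialization
    for i in range(1, N):                         # iteration
        for l in range(K):
            scores = V[:, i - 1] + logA[:, l]
            Ptr[l, i] = int(np.argmax(scores))
            V[l, i] = logE[l, x[i]] + scores[Ptr[l, i]]

    # termination and traceback: piN* = argmax_k Vk(N), pi(i-1)* = Ptr_{pii*}(i)
    path = [int(np.argmax(V[:, N - 1]))]
    for i in range(N - 1, 0, -1):
        path.append(int(Ptr[path[-1], i]))
    path.reverse()
    return path, float(V[:, N - 1].max())

# Decode a toy sequence resembling the example above.
x = [int(c) - 1 for c in "123456123456626364656"]
path, logp = viterbi(x, a0, A, E)
print("".join(states[k] for k in path))
```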
Problem 2: Evaluation

Find the likelihood a sequence is generated by the model.

Generating a sequence by the model

Given an HMM, we can generate a sequence of length n as follows:

1. Start at state π1 according to prob a0π1
2. Emit letter x1 according to prob eπ1(x1)
3. Go to state π2 according to prob aπ1π2
4. … until emitting xn

[Figure: the trellis with an extra start state 0; the highlighted first step picks a02 and then emits with e2(x1).]

A couple of questions

Given a sequence x,
•What is the probability that x was generated by the model?
•Given a position i, what is the most likely state that emitted xi?

Example: the dishonest casino

Say x = 12341623162616364616234161221341

Most likely path: π = FF……F

However: the 6’s are more likely to be L than the other letters.

Evaluation

We will develop algorithms that allow us to compute:

P(x)  Probability of x given the model
P(xi…xj)  Probability of a substring of x given the model
P(πi = k | x)  Probability that the i-th state is k, given x

A more refined measure of which states x may be in.

The Forward Algorithm

We want to calculate

P(x) = probability of x, given the HMM.

Sum over all possible ways of generating x:

P(x) = Σπ P(x, π) = Σπ P(x | π) P(π)

To avoid summing over an exponential number of paths π, define

fk(i) = P(x1…xi, πi = k)  (the forward probability)

The Forward Algorithm – derivation

Define the forward probability:

fl(i) = P(x1…xi, πi = l)
 = Σπ1…πi-1 P(x1…xi-1, π1,…, πi-1, πi = l) el(xi)
 = Σk Σπ1…πi-2 P(x1…xi-1, π1,…, πi-2, πi-1 = k) akl el(xi)
 = el(xi) Σk fk(i-1) akl

The Forward Algorithm

We can compute fk(i) for all k and i using dynamic programming, in the same way as the Viterbi values Vk(i) but with a sum in place of the max.
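For comparison with Viterbi, here is a minimal sketch of the forward recursion in the same setting, again reusing a0, A, E from the earlier sketch. It works in plain probabilities for clarity; a real implementation would rescale or use logs to avoid the underflow issue discussed earlier. Terminating with P(x) = Σk fk(N), i.e. assuming no explicit end state, is an assumption here.

```python
import numpy as np

def forward(x, a0, A, E):
    """P(x) summed over all state paths, via the recurrence
    fl(i) = el(xi) * sum_k fk(i-1) akl."""
    K, N = len(a0), len(x)
    f = np.zeros((K, N))
    f[:, 0] = a0 * E[:, x[0]]                    # fk(1) = a0k ek(x1)
    for i in range(1, N):
        # A.T @ f[:, i-1] computes sum_k fk(i-1) akl for every state l
        f[:, i] = E[:, x[i]] * (A.T @ f[:, i - 1])
    return f[:, N - 1].sum()                     # assumes no explicit end state

x = [int(c) - 1 for c in "123456123456626364656"]
print(forward(x, a0, A, E))                      # P(x) under the casino model
```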