Hidden Markov Models
Lecture 5, Tuesday April 15, 2003

Contents: Definition of a hidden Markov model; The three main questions on HMMs; Today; Problem 1: Decoding; Decoding; Decoding – main idea; The Viterbi Algorithm; Viterbi Algorithm – a practical detail; Example; Problem 2: Evaluation; Generating a sequence by the model; A couple of questions; Evaluation; The Forward Algorithm; The Forward Algorithm – derivation; Relation between Forward and Viterbi; Motivation for the Backward Algorithm; The Backward Algorithm – derivation; The Backward Algorithm; Computational Complexity; Posterior Decoding; Maximum Weight Trace; A modeling Example; Example: CpG Islands; A model of CpG Islands – (1) Architecture


Definition of a hidden Markov model

Definition: A hidden Markov model (HMM) consists of:
•Alphabet = { b_1, b_2, …, b_M }
•Set of states Q = { 1, …, K }
•Transition probabilities between any two states:
   a_ij = transition probability from state i to state j
   a_i1 + … + a_iK = 1, for all states i = 1…K
•Start probabilities a_0i:
   a_01 + … + a_0K = 1
•Emission probabilities within each state:
   e_k(b) = P( x_i = b | π_i = k )
   e_k(b_1) + … + e_k(b_M) = 1, for all states k = 1…K

[Figure: state diagram over states 1, 2, …, K]


The three main questions on HMMs

1. Evaluation
   GIVEN an HMM M, and a sequence x,
   FIND Prob[ x | M ]

2. Decoding
   GIVEN an HMM M, and a sequence x,
   FIND the sequence π of states that maximizes P[ x, π | M ]

3. Learning
   GIVEN an HMM M, with unspecified transition/emission probabilities,
   and a sequence x,
   FIND parameters θ = (e_i(.), a_ij) that maximize P[ x | θ ]


Today
•Decoding
•Evaluation


Problem 1: Decoding

Find the best parse of a sequence.


Decoding

GIVEN x = x_1 x_2 … x_N,
we want to find π = π_1, …, π_N such that P[ x, π ] is maximized:
   π* = argmax_π P[ x, π ]

We can use dynamic programming!
Let V_k(i) = max_{π_1,…,π_{i-1}} P[ x_1…x_{i-1}, π_1,…,π_{i-1}, x_i, π_i = k ]
           = probability of the most likely sequence of states ending at π_i = k

[Figure: trellis of states 1…K over positions x_1, x_2, x_3, …, x_N]


Decoding – main idea

Given that, for all states k and for a fixed position i,
   V_k(i) = max_{π_1,…,π_{i-1}} P[ x_1…x_{i-1}, π_1,…,π_{i-1}, x_i, π_i = k ],
what is V_l(i+1)?

From the definition,
V_l(i+1) = max_{π_1,…,π_i} P[ x_1…x_i, π_1,…,π_i, x_{i+1}, π_{i+1} = l ]
         = max_{π_1,…,π_i} P( x_{i+1}, π_{i+1} = l | x_1…x_i, π_1,…,π_i ) P[ x_1…x_i, π_1,…,π_i ]
         = max_{π_1,…,π_i} P( x_{i+1}, π_{i+1} = l | π_i ) P[ x_1…x_{i-1}, π_1,…,π_{i-1}, x_i, π_i ]
         = max_k P( x_{i+1}, π_{i+1} = l | π_i = k ) max_{π_1,…,π_{i-1}} P[ x_1…x_{i-1}, π_1,…,π_{i-1}, x_i, π_i = k ]
         = e_l(x_{i+1}) max_k a_kl V_k(i)


The Viterbi Algorithm

Input: x = x_1…x_N

Initialization:
   V_0(0) = 1      (0 is the imaginary first position)
   V_k(0) = 0, for all k > 0

Iteration:
   V_l(i) = e_l(x_i) max_k a_kl V_k(i-1)
   Ptr_l(i) = argmax_k a_kl V_k(i-1)

Termination:
   P(x, π*) = max_k V_k(N)

Traceback:
   π*_N = argmax_k V_k(N)
   π*_{i-1} = Ptr_{π*_i}(i)


The Viterbi Algorithm

Similar to "aligning" a set of states to a sequence.
Time:  O(K²N)
Space: O(KN)

[Figure: K×N dynamic programming table of the values V_l(i)]


Viterbi Algorithm – a practical detail

Underflows are a significant problem:
   P[ x_1,…,x_i, π_1,…,π_i ] = a_0π_1 a_π_1π_2 … a_π_{i-1}π_i e_π_1(x_1) … e_π_i(x_i)
These numbers become extremely small – underflow.
Solution: take the logs of all values:
   V_l(i) = log e_l(x_i) + max_k [ V_k(i-1) + log a_kl ]


Example

Let x be a sequence with a portion of ~1/6 6's, followed by a portion of ~1/2 6's:
   x = 123456123456…123456 626364656…1626364656
Then it is not hard to show that the optimal parse is (exercise):
   FFF…………F LLL…………L
(F = the fair-die state, L = the loaded-die state of the dishonest casino model)
Six characters "123456" parsed as F contribute 0.95^6 (1/6)^6 ≈ 1.6×10⁻⁵;
parsed as L they contribute 0.95^6 (1/2)^1 (1/10)^5 ≈ 0.4×10⁻⁵.
Six characters "162636" parsed as F contribute 0.95^6 (1/6)^6 ≈
1.6×10⁻⁵; parsed as L they contribute 0.95^6 (1/2)^3 (1/10)^3 ≈ 9.0×10⁻⁵.


Problem 2: Evaluation

Find the likelihood that a sequence is generated by the model.


Generating a sequence by the model

Given an HMM, we can generate a sequence of length n as follows:
1. Start at state π_1 according to probability a_0π_1
2. Emit letter x_1 according to probability e_π_1(x_1)
3. Go to state π_2 according to probability a_π_1π_2
4. … until emitting x_n

[Figure: trellis showing a path of states emitting x_1, x_2, …, x_n]


A couple of questions

Given a sequence x,
•What is the probability that x was generated by the model?
•Given a position i, what is the most likely state that emitted x_i?

Example: the dishonest casino
Say x = 12341623162616364616234161221341
Most likely path: π = FF……F
However: the marked letters (the 6's) are more likely to be L than the unmarked letters.


Evaluation

We will develop algorithms that allow us to compute:
   P(x)             Probability of x given the model
   P(x_i…x_j)       Probability of a substring of x given the model
   P(π_i = k | x)   Probability that the i-th state is k, given x
                    – a more refined measure of which states x may be in


The Forward Algorithm

We want to calculate
   P(x) = probability of x, given the HMM.
Sum over all possible ways of generating x:
   P(x) = Σ_π P(x, π) = Σ_π P(x | π) P(π)
To avoid summing over an exponential number of paths π, define
   f_k(i) = P(x_1…x_i, π_i = k)      (the forward probability)


The Forward Algorithm – derivation

Define the forward probability:
   f_l(i) = P(x_1…x_i, π_i = l)
          = Σ_{π_1,…,π_{i-1}} P(x_1…x_{i-1}, π_1,…,π_{i-1}, π_i = l) e_l(x_i)
          = Σ_k Σ_{π_1,…,π_{i-2}} P(x_1…x_{i-1}, π_1,…,π_{i-2}, π_{i-1} = k) a_kl e_l(x_i)
          = e_l(x_i) Σ_k f_k(i-1) a_kl


The Forward Algorithm

We can compute f_k(i) for all k, i, using dynamic programming!

Initialization:
   f_0(0) = 1
   f_k(0) = 0, for all k > 0

Iteration:
   f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl

Termination:
   P(x) = Σ_k f_k(N) a_k0
where a_k0 is the probability that the terminating state is k (usually = a_0k)


Relation between Forward and Viterbi

VITERBI
   Initialization:
      V_0(0) = 1
      V_k(0) = 0, for all k > 0
   Iteration:
      V_l(i) = e_l(x_i) max_k V_k(i-1) a_kl
   Termination:
      P(x, π*) = max_k V_k(N)

FORWARD
   Initialization:
      f_0(0) = 1
      f_k(0) = 0, for all k > 0
   Iteration:
      f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl
   Termination:
      P(x) = Σ_k f_k(N) a_k0


Motivation for the Backward Algorithm

We want to compute
   P(π_i = k | x),
the probability distribution on the i-th position, given x.

We start by computing
   P(π_i = k, x) = P(x_1…x_i, π_i = k, x_{i+1}…x_N)
                 = P(x_1…x_i, π_i = k) P(x_{i+1}…x_N | x_1…x_i, π_i = k)
                 = P(x_1…x_i, π_i = k) P(x_{i+1}…x_N | π_i = k),
i.e., the product of the forward probability f_k(i) and the backward probability b_k(i).


The Backward Algorithm – derivation

Define the backward probability:
   b_k(i) = P(x_{i+1}…x_N | π_i = k)
          = Σ_{π_{i+1},…,π_N} P(x_{i+1}, x_{i+2}, …, x_N, π_{i+1},…,π_N | π_i = k)
          = Σ_l e_l(x_{i+1}) a_kl Σ_{π_{i+2},…,π_N} P(x_{i+2},…,x_N, π_{i+2},…,π_N | π_{i+1} = l)
          = Σ_l e_l(x_{i+1}) a_kl b_l(i+1)
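Putting the recurrences above together, here is a short Python sketch of the Viterbi algorithm (in log space, as the practical-detail slide recommends) and the forward algorithm. The dishonest-casino parameters mirror the example's arithmetic (0.95 self-transition probability; a loaded die that emits 6 with probability 1/2 and every other face with 1/10); the 0.5/0.5 start probabilities and the implicit a_k0 = 1 termination are illustrative assumptions, not values given on the slides.

```python
import math

# Dishonest-casino HMM: states F (fair die) and L (loaded die).
# 0.95 self-transitions and the emissions match the lecture example's
# arithmetic; the start probabilities are an assumption for illustration.
STATES = ["F", "L"]
START = {"F": 0.5, "L": 0.5}                          # a_0k (assumed)
TRANS = {"F": {"F": 0.95, "L": 0.05},                 # a_kl
         "L": {"F": 0.05, "L": 0.95}}
EMIT = {"F": {c: 1 / 6 for c in "123456"},            # fair die
        "L": {c: (0.5 if c == "6" else 0.1) for c in "123456"}}  # loaded die

def viterbi(x):
    """Log-space Viterbi: V_l(i) = log e_l(x_i) + max_k [V_k(i-1) + log a_kl].

    Starting from V_k(1) = log(a_0k e_k(x_1)) is equivalent to the slides'
    imaginary position 0 with V_0(0) = 1.
    """
    V = [{k: math.log(START[k]) + math.log(EMIT[k][x[0]]) for k in STATES}]
    ptr = []
    for c in x[1:]:
        row, back = {}, {}
        for l in STATES:
            # Ptr_l(i) = argmax_k [V_k(i-1) + log a_kl]
            best = max(STATES, key=lambda k: V[-1][k] + math.log(TRANS[k][l]))
            back[l] = best
            row[l] = math.log(EMIT[l][c]) + V[-1][best] + math.log(TRANS[best][l])
        V.append(row)
        ptr.append(back)
    # Traceback: pi*_N = argmax_k V_k(N); pi*_{i-1} = Ptr_{pi*_i}(i)
    state = max(STATES, key=lambda k: V[-1][k])
    path = [state]
    for back in reversed(ptr):
        state = back[state]
        path.append(state)
    return "".join(reversed(path))

def forward(x):
    """Forward recurrence f_l(i) = e_l(x_i) sum_k f_k(i-1) a_kl; returns P(x)."""
    f = {k: START[k] * EMIT[k][x[0]] for k in STATES}
    for c in x[1:]:
        f = {l: EMIT[l][c] * sum(f[k] * TRANS[k][l] for k in STATES)
             for l in STATES}
    return sum(f.values())       # termination with a_k0 = 1 assumed
```

On a sequence made of "123456" blocks followed by "626364656" blocks, `viterbi` returns a run of F's followed by a run of L's, in line with the example's claim that the optimal parse is FF…F LL…L.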
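As the motivation slide shows, P(π_i = k, x) = f_k(i) · b_k(i), so dividing by P(x) gives the posterior P(π_i = k | x). The sketch below computes the backward values with the recurrence b_k(i) = Σ_l e_l(x_{i+1}) a_kl b_l(i+1) and combines them with the forward values; it uses dishonest-casino parameters matching the example's arithmetic, with assumed 0.5/0.5 start probabilities, and taking b_k(N) = 1 (i.e., a_k0 = 1 for every k) is likewise an assumption.

```python
# Backward recurrence and posterior decoding for a two-state casino HMM.
# Parameters follow the lecture example; start probabilities are assumed.
STATES = ["F", "L"]
START = {"F": 0.5, "L": 0.5}                          # a_0k (assumed)
TRANS = {"F": {"F": 0.95, "L": 0.05},                 # a_kl
         "L": {"F": 0.05, "L": 0.95}}
EMIT = {"F": {c: 1 / 6 for c in "123456"},
        "L": {c: (0.5 if c == "6" else 0.1) for c in "123456"}}

def forward_table(x):
    """All forward values: f[i][k] = f_k(i+1) = P(x_1..x_{i+1}, pi_{i+1} = k)."""
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for c in x[1:]:
        f.append({l: EMIT[l][c] * sum(f[-1][k] * TRANS[k][l] for k in STATES)
                  for l in STATES})
    return f

def backward_table(x):
    """All backward values: b[i][k] = b_k(i+1) = P(x_{i+2}..x_N | pi_{i+1} = k)."""
    b = [{k: 1.0 for k in STATES}]                    # b_k(N) = 1 (assumed)
    for c in reversed(x[1:]):                         # c plays the role of x_{i+1}
        b.append({k: sum(EMIT[l][c] * TRANS[k][l] * b[-1][l] for l in STATES)
                  for k in STATES})
    return list(reversed(b))

def posteriors(x):
    """Posterior decoding: P(pi_i = k | x) = f_k(i) b_k(i) / P(x)."""
    f, b = forward_table(x), backward_table(x)
    px = sum(f[-1][k] for k in STATES)                # P(x)
    return [{k: f[i][k] * b[i][k] / px for k in STATES} for i in range(len(x))]
```

A handy sanity check ties the two tables together: Σ_k f_k(i) b_k(i) equals P(x) at every position i, and each posterior distribution sums to 1.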