Hidden Markov Models

Outline:
• Generating a sequence by the model
• Evaluation
• The Forward Algorithm
• Motivation for the Backward Algorithm
• The Backward Algorithm – derivation
• The Backward Algorithm
• Computational Complexity
• Posterior Decoding
• Viterbi, Forward, Backward
• Variants of HMMs
• Higher-order HMMs
• Similar Algorithms to 1st Order
• Modeling the Duration of States
• Solution 1: Chain several states
• Solution 2: Negative binomial distribution
• Example: genes in prokaryotes
• Solution 3: Duration modeling
• Viterbi with duration modeling
• Proteins, Pair HMMs, and Alignment
• A state model for alignment
• Let's score the transitions
• Alignment with affine gaps – state version

[Diagram: HMM trellis; a column of states 1…K at each position, with emitted symbols x_1, x_2, x_3, …]

Generating a sequence by the model

Given an HMM, we can generate a sequence of length n as follows:
1. Start at state π_1 according to probability a_{0,π1}
2. Emit letter x_1 according to probability e_{π1}(x_1)
3. Go to state π_2 according to probability a_{π1,π2}
4. … until emitting x_n

[Diagram: the trellis again, annotated with a_{0,2} and e_2(x_1) for a path that begins in state 2]

Evaluation

We will develop algorithms that allow us to compute:

  P(x)            the probability of x given the model
  P(x_i…x_j)      the probability of a substring of x given the model
  P(π_i = k | x)  the "posterior" probability that the i-th state is k, given x;
                  a more refined measure of which states x may be in

The Forward Algorithm

f_k(i) = P(x_1…x_i, π_i = k)   (the forward probability)

Initialization:
  f_0(0) = 1
  f_k(0) = 0, for all k > 0
Iteration:
  f_k(i) = e_k(x_i) Σ_l f_l(i-1) a_lk
Termination:
  P(x) = Σ_k f_k(N)

Motivation for the Backward Algorithm

We want to compute P(π_i = k | x), the probability distribution on the i-th position, given x.

We start by computing

  P(π_i = k, x) = P(x_1…x_i, π_i = k, x_{i+1}…x_N)
                = P(x_1…x_i, π_i = k) P(x_{i+1}…x_N | x_1…x_i, π_i = k)
                = P(x_1…x_i, π_i = k) P(x_{i+1}…x_N | π_i = k)

(the conditioning simplifies because, given π_i, the rest of the sequence is independent of the earlier observations). Then

  P(π_i = k | x) = P(π_i = k, x) / P(x)

The first factor is the forward probability, f_k(i); the second is the backward probability, b_k(i).

The Backward Algorithm – derivation

Define the backward probability:

  b_k(i) = P(x_{i+1}…x_N | π_i = k)   "starting from the i-th state = k, generate the rest of x"
         = Σ_{π_{i+1}…π_N} P(x_{i+1}, x_{i+2}, …, x_N, π_{i+1}, …, π_N | π_i = k)
         = Σ_l Σ_{π_{i+2}…π_N} P(x_{i+1}, x_{i+2}, …, x_N, π_{i+1} = l, π_{i+2}, …, π_N | π_i = k)
         = Σ_l e_l(x_{i+1}) a_kl Σ_{π_{i+2}…π_N} P(x_{i+2}, …, x_N, π_{i+2}, …, π_N | π_{i+1} = l)
         = Σ_l e_l(x_{i+1}) a_kl b_l(i+1)

The Backward Algorithm

We can compute b_k(i) for all k and i using dynamic programming.

Initialization:
  b_k(N) = 1, for all k
Iteration:
  b_k(i) = Σ_l e_l(x_{i+1}) a_kl b_l(i+1)
Termination:
  P(x) = Σ_l a_0l e_l(x_1) b_l(1)

Computational Complexity

What are the running time and space required for Forward and Backward?
  Time: O(K²N)
  Space: O(KN)

Useful implementation techniques to avoid underflows:
• Viterbi: sums of logs
• Forward/Backward: rescaling every few positions, multiplying by a constant

Posterior Decoding

We can now calculate

  P(π_i = k | x) = f_k(i) b_k(i) / P(x)

Then we can ask: what is the most likely state at position i of sequence x? Define π̂ by posterior decoding:

  π̂_i = argmax_k P(π_i = k | x)

Derivation:
  P(π_i = k | x) = P(π_i = k, x) / P(x)
                 = P(x_1, …, x_i, π_i = k, x_{i+1}, …, x_N) / P(x)
                 = P(x_1, …, x_i, π_i = k) P(x_{i+1}, …, x_N | π_i = k) / P(x)
                 = f_k(i) b_k(i) / P(x)

Posterior Decoding

• For each state, posterior decoding gives a curve of the likelihood of that state at every position. That is sometimes more informative than the Viterbi path π*.
• Posterior decoding may give an invalid sequence of states (of probability 0). Why? Because π̂_i is chosen independently at each position, adjacent states in π̂ may be linked by transitions of probability 0.

Posterior Decoding

• P(π_i = k | x) = Σ_π P(π | x) 1(π_i = k) = Σ_{π : π_i = k} P(π | x)

where 1(A) = 1 if A is true, and 0 otherwise.

[Figure: posterior probability curves P(π_i = l | x) for each state l, plotted along positions x_1 x_2 x_3 … x_N]
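The Forward, Backward, and posterior calculations above translate directly into code. The sketch below (Python/NumPy, a minimal illustration rather than a definitive implementation) assumes a0[k] = a_0k, A[k, l] = a_kl, E[k, s] = e_k(s), and x given as a sequence of symbol indices; the per-position rescaling constants c implement the underflow trick noted under Computational Complexity, and are an implementation choice, not notation from the slides.

```python
import numpy as np

def forward(x, a0, A, E):
    """Scaled forward matrix f, scaling constants c, and log P(x)."""
    N, K = len(x), A.shape[0]
    f, c = np.zeros((N, K)), np.zeros(N)
    # Initialization: f_k(1) = a_0k e_k(x_1), then rescale to sum to 1.
    f[0] = a0 * E[:, x[0]]
    c[0] = f[0].sum()
    f[0] /= c[0]
    # Iteration: f_k(i) = e_k(x_i) * sum_l f_l(i-1) a_lk, rescaled.
    for i in range(1, N):
        f[i] = E[:, x[i]] * (f[i - 1] @ A)
        c[i] = f[i].sum()
        f[i] /= c[i]
    # Termination: P(x) = prod_i c_i, so log P(x) = sum_i log c_i.
    return f, c, np.log(c).sum()

def backward(x, A, E, c):
    """Backward matrix b, rescaled with the same constants c."""
    N, K = len(x), A.shape[0]
    b = np.ones((N, K))                 # initialization: b_k(N) = 1
    # Iteration: b_k(i) = sum_l e_l(x_{i+1}) a_kl b_l(i+1), rescaled.
    for i in range(N - 2, -1, -1):
        b[i] = (A @ (E[:, x[i + 1]] * b[i + 1])) / c[i + 1]
    return b

def posterior_decode(x, a0, A, E):
    """pi_hat_i = argmax_k P(pi_i = k | x) = argmax_k f_k(i) b_k(i) / P(x)."""
    f, c, _ = forward(x, a0, A, E)
    b = backward(x, A, E, c)
    post = f * b      # with this rescaling, each row already sums to 1
    return post.argmax(axis=1), post
```

A convenient side effect of this particular rescaling scheme is that f * b yields P(π_i = k | x) directly, with no explicit division by P(x).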
Viterbi, Forward, Backward

VITERBI
Initialization:
  V_0(0) = 1
  V_k(0) = 0, for all k > 0
Iteration:
  V_l(i) = e_l(x_i) max_k V_k(i-1) a_kl
Termination:
  P(x, π*) = max_k V_k(N)

FORWARD
Initialization:
  f_0(0) = 1
  f_k(0) = 0, for all k > 0
Iteration:
  f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl
Termination:
  P(x) = Σ_k f_k(N)

BACKWARD
Initialization:
  b_k(N) = 1, for all k
Iteration:
  b_l(i) = Σ_k e_k(x_{i+1}) a_lk b_k(i+1)
Termination:
  P(x) = Σ_k a_0k e_k(x_1) b_k(1)

Variants of HMMs

Higher-order HMMs

• How do we model "memory" larger than one time point?
  First order:  P(π_{i+1} = l | π_i = k) = a_kl
  Second order: P(π_{i+1} = l | π_i = k, π_{i-1} = j) = a_jkl
  …
• A second-order HMM with K states is equivalent to a first-order HMM with K² states.

[Diagram: a second-order HMM on states H and T, with transitions a_HT(prev = H), a_HT(prev = T), a_TH(prev = H), a_TH(prev = T), redrawn as a first-order HMM on the pair states HH, HT, TH, TT with transitions a_HHT, a_HTH, a_HTT, a_THH, a_THT, a_TTH]

Similar Algorithms to 1st Order

• For P(π_{i+1} = l | π_i = k, π_{i-1} = j), index the Viterbi matrix by the last two states:
  V_lk(i) = max_j { V_kj(i-1) + … }
  Time? Space?

Modeling the Duration of States

[Diagram: two states X and Y; X has self-transition probability p and moves to Y with probability 1-p, Y has self-transition probability q and moves to X with probability 1-q]

The length of region X follows a geometric distribution, with mean E[l_X] = 1/(1-p).

This is a significant disadvantage of HMMs. Several solutions exist for modeling different length distributions.

Example: exon lengths in genes

[Figure: the empirical distribution of exon lengths in genes]

Solution 1: Chain several states

[Diagram: X → X → X → Y; the final X has self-transition probability p, and Y has self-transition probability q]

Disadvantage: still very inflexible.
  l_X = C + geometric with mean 1/(1-p)

Solution 2: Negative binomial distribution

[Diagram: chain X(1) → X(2) → … → X(n) → Y; each X(i) has self-transition probability p and moves forward with probability 1-p]

Duration in X: m turns, where
• during the first m-1 turns, exactly n-1 arrows to the next state are followed, and
• during the m-th turn, an arrow to the next state is followed.

  P(l_X = m) = C(m-1, n-1) (1-p)^((n-1)+1) p^((m-1)-(n-1)) = C(m-1, n-1) (1-p)^n p^(m-n)

where C(m-1, n-1) is the binomial coefficient counting which of the first m-1 turns used the n-1 forward arrows.

Example: genes in prokaryotes

• EasyGene: a prokaryotic gene-finder (Larsen TS, Krogh A)
• Uses a negative binomial with n = 3

Solution 3: Duration modeling

Upon entering a state:
1. Choose duration d, according to a probability distribution
2. Generate d letters according to the emission probabilities
3. Take a transition to the next state according to the transition probabilities

[Diagram: a state F with duration distribution P_f, emitting x_i…x_{i+d-1} for a chosen duration d < D_f]

Disadvantage: an increase in the complexity of Viterbi:
  Time: O(D) increase
  Space: O(1) increase
where D = maximum duration of a state.

Warning: Rabiner's tutorial claims O(D²) and O(D) increases.

Viterbi with duration modeling

Recall the original iteration:
  V_l(i) = max_k V_k(i-1) a_kl e_l(x_i)

New iteration:
  V_l(i) = max_k max_{d=1…D_l} V_k(i-d) P_l(d) a_kl Π_{j=i-d+1…i} e_l(x_j)

[Diagram: states F and L with duration distributions P_f and P_l, showing a transition between them and a block of d emitted letters in each]

Precompute cumulative values (e.g., running sums of log emission probabilities), so that each emission product Π_j e_l(x_j) is available in constant time.

Proteins, Pair HMMs, and Alignment

A state model for alignment

  -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
  TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
  IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII

  M: (+1, +1)    I: (+1, 0)    J: (0, +1)

Alignments correspond 1-to-1 with sequences of states M, I, J.
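To make the state model concrete, here is a minimal sketch (in Python) of the three-state alignment dynamic program that the following slides build toward ("Let's score the transitions", "Alignment with affine gaps – state version"). The match/mismatch scores and affine gap costs below are illustrative placeholders, not values from the deck, and direct I-to-J transitions are disallowed, a common convention for affine gaps. Reading the slide's example, x plays the role of the bottom sequence, so I advances in x only and J in y only, matching the labels M (+1, +1), I (+1, 0), J (0, +1); the traceback returns the state string, illustrating the 1-to-1 correspondence between alignments and state sequences.

```python
NEG = float("-inf")

def align(x, y, match=1.0, mismatch=-1.0, gap_open=-3.0, gap_extend=-1.0):
    n, m = len(x), len(y)
    # score[s][i][j]: best score of an alignment of x[:i] and y[:j] that
    # ends in state s; back[s][i][j]: previous state, for the traceback.
    score = {s: [[NEG] * (m + 1) for _ in range(n + 1)] for s in "MIJ"}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in "MIJ"}
    score["M"][0][0] = 0.0                       # empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:                  # M consumes x[i-1] and y[j-1]
                sub = match if x[i - 1] == y[j - 1] else mismatch
                prev = max("MIJ", key=lambda s: score[s][i - 1][j - 1])
                score["M"][i][j] = sub + score[prev][i - 1][j - 1]
                back["M"][i][j] = prev
            if i > 0:                            # I consumes x[i-1] (gap in y)
                cand = {"M": score["M"][i - 1][j] + gap_open,
                        "I": score["I"][i - 1][j] + gap_extend}
                prev = max(cand, key=cand.get)
                score["I"][i][j], back["I"][i][j] = cand[prev], prev
            if j > 0:                            # J consumes y[j-1] (gap in x)
                cand = {"M": score["M"][i][j - 1] + gap_open,
                        "J": score["J"][i][j - 1] + gap_extend}
                prev = max(cand, key=cand.get)
                score["J"][i][j], back["J"][i][j] = cand[prev], prev
    # Traceback from the best final state; the path IS the alignment.
    state = max("MIJ", key=lambda s: score[s][n][m])
    best = score[state][n][m]
    i, j, states = n, m, []
    while i > 0 or j > 0:
        states.append(state)
        di, dj = {"M": (1, 1), "I": (1, 0), "J": (0, 1)}[state]
        state = back[state][i][j]
        i, j = i - di, j - dj
    return "".join(reversed(states)), best
```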
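Continuing the sketch, a quick run on the two sequences from the slide's example (gaps removed); because the scores above are placeholders, the recovered path need not reproduce the slide's path exactly:

```python
x = "TAGCTATCACGACCGCGGTCGATTTGCCCGACC"  # bottom sequence of the slide
y = "AGGCTATCACCTGACCTCCAGGCCGATGCCC"    # top sequence of the slide
states, best = align(x, y)
print(states)  # a string over M, I, J, analogous to "IMMJMM..." on the slide
print(best)    # total score under the placeholder parameters
```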