Computational Genomics and Molecular Biology, Fall 2010

HMM Lecture Notes, Tuesday, November 9th
Dannie Durand and Rose Hoberman

1 Notation

1. N states (S_1 .. S_N)
2. M symbols in alphabet, Σ
3. parameters, λ:
   1. initial distribution of states, π(i)
   2. transition probabilities, a_ij = P(q_t = S_j | q_{t-1} = S_i). Note that Σ_j a_ij = 1, ∀ i
   3. emission probabilities, e_i(a), the probability that state i emits symbol a
4. Sequence of symbols: O = O_1, O_2, ..., O_T
5. Sequence of states: Q = q_1, q_2, ..., q_T

2 Using HMMs for recognition

2.1 Questions to ask with an HMM

1. Given a sequence O, what is the true path? Otherwise stated, we wish to assign labels to an unlabeled sequence.

   Example: Identify the cytosolic, transmembrane, and extracellular regions in the sequence. In this case, we wish to assign the labels E, M, or C to the unlabeled data.

2. What is the probability that a given sequence O was generated by the HMM?

   Example: Is the sequence a transmembrane protein?

3. What is the probability of being in state i when O_t is emitted?

   Example: Is a given residue localized to the membrane?

4. Use the HMM to simulate sequences.

   Example: Generate sequences with properties similar to real transmembrane sequences.

In the previous lecture, we started discussing algorithms for answering these questions.

1. We assume that the most likely path, Q*, is a good estimate of the true path, where Q* is defined by

      Q* = argmax_Q P(Q|O) = argmax_Q P(Q, O) / P(O) = argmax_Q P(Q, O).

   On Tuesday, we discussed how to find Q* using the Viterbi algorithm. This process is called "decoding" because we decode the sequence of symbols to determine the hidden sequence of states. In speech recognition, recorded speech is "decoded" into words or phonemes to determine the meaning of the utterance.

   At the end of this section, we will discuss an alternative approach, called posterior decoding, that makes use of the Forward and Backward algorithms.

2.
The total probability of O is the sum of the probabilities of O over all possible paths through the HMM:

      P(O) = Σ_Q P(O|Q) P(Q) = Σ_Q P(O, Q)

   To calculate P(O), we use the Forward algorithm, which iteratively calculates the probability of being in state S_i after generating the sequence up to symbol O_t. We designate this quantity:

      α_t(i) = P(O_1, O_2, O_3, ..., O_t, q_t = S_i)

   Algorithm: Forward

   Initialization:

      α_1(i) = π_i · e_i(O_1)

   Iteration:

      α_t(i) = Σ_{j=1..N} α_{t-1}(j) · a_ji · e_i(O_t)

   The probability of observing the entire sequence is given by the sum over all possible final states:

      P(O) = Σ_{i=1..N} α_T(i)

3. We wish to determine P(q_t = S_i, O), the joint probability of being in state S_i when O_t is emitted and of observing the sequence O. This is equivalent to emitting O_1 ... O_{t-1} over any path, entering S_i, emitting O_t, and then emitting O_{t+1} ... O_T over any path, or

      P(q_t = S_i, O) = P(O_1, O_2, O_3, ..., O_t, q_t = S_i) · P(O_{t+1}, O_{t+2}, ..., O_T | q_t = S_i)

   Note that the first term is just α_t(i). We calculate the second term with the Backward algorithm, described in the next section. The Backward algorithm can also be used instead of the Forward algorithm to calculate P(O), and it is used in the Baum-Welch algorithm for estimating the parameters of the model.

2.2 Backward algorithm

The Forward algorithm calculates the probability of the sequence starting with the first symbol: α_1(·), α_2(·), ..., α_T(·). It is also possible to calculate the probability of emitting O given a particular model by working backward from the end of the sequence.
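Before turning to the Backward algorithm, the Forward recursion just described can be sketched in code. The following is a minimal sketch in Python/NumPy; the two-state model and its parameter values are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical two-state HMM (numbers chosen only for illustration).
pi = np.array([0.6, 0.4])              # initial distribution pi(i)
A  = np.array([[0.7, 0.3],             # A[i, j] = a_ij = P(q_t = S_j | q_{t-1} = S_i)
               [0.4, 0.6]])
E  = np.array([[0.5, 0.5],             # E[i, a] = e_i(a), over the alphabet {0, 1}
               [0.1, 0.9]])

def forward(O, pi, A, E):
    """Return alpha, where (0-indexed) alpha[t, i] = P(O_1..O_{t+1}, q_{t+1} = S_i)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * E[:, O[0]]                    # initialization: pi_i e_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * E[:, O[t]]  # sum_j alpha_{t-1}(j) a_ji e_i(O_t)
    return alpha

O = [0, 1, 1, 0]
alpha = forward(O, pi, A, E)
prob_O = alpha[-1].sum()                          # P(O) = sum_i alpha_T(i)
```

Because the toy model has only two states and four symbols, the result can be checked against a brute-force sum over all 2^4 state paths, which is exactly the sum the Forward algorithm computes efficiently.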
Let β_t(i) be the probability of generating the last T − t + 1 observations, O_t, ..., O_T, given that the previous symbol was emitted from state i (that is, q_{t-1} = S_i):

      β_t(i) = P(O_t, O_{t+1}, O_{t+2}, ..., O_T | q_{t-1} = S_i)

Algorithm: Backward

Initialization:

      β_T(i) = Σ_{j=1..N} a_ij · e_j(O_T)

Iteration:

      β_t(i) = Σ_{j=1..N} a_ij · e_j(O_t) · β_{t+1}(j)

To determine P(q_t = S_i, O), we work backwards from O_T to O_{t+1} to obtain

      P(q_t = S_i, O) = α_t(i) · β_{t+1}(i)     (1)

To calculate the probability of the entire sequence, we start at T and work all the way back to 2, calculating β_t(i):

      P(O) = Σ_{j=1..N} π_j · e_j(O_1) · β_2(j)

Either the Forward or the Backward algorithm can be used to determine the probability of a sequence, O. Both are needed in order to learn parameters from unlabeled data using the Baum-Welch procedure.

2.3 Posterior Decoding

Let us revisit the question of determining the path through the HMM that corresponds to the true labeling of the data. One approach, discussed above, is to determine the most probable path through the model using the Viterbi algorithm. An alternative is to determine, independently for every symbol O_t, the most probable state using Equation 1:

      q̂_t = argmax_i P(q_t = S_i | O) = argmax_i α_t(i) · β_{t+1}(i)

(The denominator P(O) = P(q_t = S_i, O) / P(q_t = S_i | O) is the same for every state, so maximizing the joint probability in Equation 1 also maximizes the posterior.) This approach is referred to as posterior decoding because it is based on the posterior probability of emitting O_t from state i when the emitted sequence is known. Note that the most probable state for O_t, q̂_t, is independent of the most probable state for any other symbol in O. In fact, the sequence of most probable states, q̂_1, q̂_2, ..., q̂_T, may not correspond to any legitimate path through the model. Posterior decoding may give better results in some cases, such as when suboptimal paths are almost as probable as the most probable path.
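The Backward recursion and posterior decoding can be sketched in the same style as the Forward algorithm. The sketch below uses a hypothetical two-state model with made-up parameters; note that the code stores the backward variables with the common 0-indexed convention, so that beta[t, i] corresponds to β_{t+1}(i) in the (1-indexed, shifted) notation of these notes:

```python
import numpy as np

# Hypothetical two-state HMM (numbers chosen only for illustration).
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],   # A[i, j] = a_ij = P(q_t = S_j | q_{t-1} = S_i)
               [0.4, 0.6]])
E  = np.array([[0.5, 0.5],   # E[i, a] = e_i(a), over the alphabet {0, 1}
               [0.1, 0.9]])

def forward(O, pi, A, E):
    alpha = np.zeros((len(O), len(pi)))
    alpha[0] = pi * E[:, O[0]]
    for t in range(1, len(O)):
        alpha[t] = (alpha[t-1] @ A) * E[:, O[t]]
    return alpha

def backward(O, A, E):
    # beta[t, i] = probability of the observations after position t, given
    # state S_i at position t; with 1-based time this is the notes' beta_{t+1}(i).
    T, N = len(O), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (E[:, O[t + 1]] * beta[t + 1])  # sum_j a_ij e_j(O_{t+1}) beta[t+1, j]
    return beta

O = [0, 1, 1, 0]
alpha, beta = forward(O, pi, A, E), backward(O, A, E)

prob_fwd = alpha[-1].sum()                    # P(O) via the Forward algorithm
prob_bwd = (pi * E[:, O[0]] * beta[0]).sum()  # P(O) via the Backward algorithm

# Posterior decoding: gamma[t, i] = P(q_t = S_i | O), most probable state per position.
gamma = alpha * beta / prob_fwd
q_hat = gamma.argmax(axis=1)
```

A useful sanity check, and a direct illustration of the point made above, is that the Forward and Backward computations of P(O) agree, while each row of gamma sums to 1 because it is a posterior distribution over states for one position.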
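To contrast the two decoding strategies discussed in these notes, the Viterbi algorithm can be sketched in the same style. The two-state model below is hypothetical (the same made-up numbers as in the earlier sketches); its output is the single most probable path Q*, which need not agree position-by-position with the posterior-decoding states q̂_t:

```python
import numpy as np

# Hypothetical two-state HMM (numbers chosen only for illustration).
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],   # A[i, j] = a_ij = P(q_t = S_j | q_{t-1} = S_i)
               [0.4, 0.6]])
E  = np.array([[0.5, 0.5],   # E[i, a] = e_i(a), over the alphabet {0, 1}
               [0.1, 0.9]])

def viterbi(O, pi, A, E):
    """Return the most probable single path Q* = argmax_Q P(Q, O)."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))           # delta[t, i] = best path probability ending in S_i at t
    psi = np.zeros((T, N), dtype=int)  # backpointers to the best predecessor state
    delta[0] = pi * E[:, O[0]]
    for t in range(1, T):
        scores = delta[t-1][:, None] * A     # scores[j, i] = delta[t-1, j] * a_ji
        psi[t] = scores.argmax(axis=0)       # best predecessor j for each state i
        delta[t] = scores.max(axis=0) * E[:, O[t]]
    # Trace back from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

O = [0, 1, 1, 0]
q_star = viterbi(O, pi, A, E)  # Viterbi path; may differ from the posterior-decoding states
```

On a model this small the result can be checked by enumerating all state paths: the Viterbi path must achieve the maximum joint probability P(Q, O) over all of them.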