Computational Genomics and Molecular Biology, Fall 2010

HMM Lecture Notes, Tuesday, November 9th
Dannie Durand and Rose Hoberman

1 Notation

1. $N$ states ($S_1, \ldots, S_N$)
2. $M$ symbols in the alphabet, $\Sigma$
3. parameters, $\lambda$:
   1. initial distribution of states, $\pi(i)$
   2. transition probabilities, $a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i)$. Note that $\sum_{j=1}^{N} a_{ij} = 1$, $\forall i$.
   3. emission probabilities, $e_i(a)$, the probability that state $i$ emits symbol $a$
4. sequence of symbols: $O = O_1, O_2, \ldots, O_T$
5. sequence of states: $Q = q_1, q_2, \ldots, q_T$

2 Using HMMs for recognition

2.1 Questions to ask with an HMM

1. Given a sequence $O$, what is the true path? Otherwise stated, we wish to assign labels to an unlabeled sequence.

   Example: Identify the cytosolic, transmembrane, and extracellular regions in the sequence. In this case, we wish to assign the labels E, M, or C to the unlabeled data.

2. What is the probability that a given sequence $O$ was generated by the HMM?

   Example: Is the sequence a transmembrane protein?

3. What is the probability of being in state $i$ when $O_t$ is emitted?

   Example: Is a given residue localized to the membrane?

4. Use the HMM to simulate sequences.

   Example: Generate sequences with properties similar to real transmembrane sequences.

In the previous lecture, we started discussing algorithms for answering these questions.

1. We assume that the most likely path, $Q^*$, is a good estimate of the true path, where $Q^*$ is defined by
   \[ Q^* = \arg\max_Q P(Q \mid O) = \arg\max_Q \frac{P(Q, O)}{P(O)} = \arg\max_Q P(Q, O). \]
   On Tuesday, we discussed how to find $Q^*$ using the Viterbi algorithm. This process is called "decoding" because we decode the sequence of symbols to determine the hidden sequence of states. In speech recognition, recorded speech is "decoded" into words or phonemes to determine the meaning of the utterance.

   At the end of this section, we will discuss an alternative approach, called posterior decoding, that makes use of the Forward and Backward algorithms.

2. The total probability of $O$ is the sum of the probabilities of $O$ over all possible paths through the HMM:
   \[ P(O) = \sum_Q P(O \mid Q)\, P(Q) = \sum_Q P(O, Q) \]
   To calculate $P(O)$, we use the Forward algorithm, which iteratively calculates the probability of being in state $S_i$ after generating the sequence up to symbol $O_t$. We designate this quantity
   \[ \alpha_t(i) = P(O_1, O_2, O_3, \ldots, O_t, q_t = S_i) \]
   Algorithm: Forward
   Initialization:
   \[ \alpha_1(i) = \pi_i \, e_i(O_1) \]
   Iteration:
   \[ \alpha_t(i) = \sum_{j=1}^{N} \alpha_{t-1}(j) \cdot a_{ji} \cdot e_i(O_t) \]
   The probability of observing the entire sequence is given by the sum over all possible final states:
   \[ P(O) = \sum_{i=1}^{N} \alpha_T(i) \]
   (A code sketch of the Viterbi and Forward recursions appears after this list.)

3. We wish to determine $P(q_t = S_i \mid O)$, the probability that state $S_i$ emitted $O_t$, given the entire observed sequence. The corresponding joint probability is that of emitting $O_1 \ldots O_{t-1}$ over any path, entering $S_i$ and emitting $O_t$, and then emitting $O_{t+1} \ldots O_T$ over any path:
   \[ P(O, q_t = S_i) = P(O_1, O_2, O_3, \ldots, O_t, q_t = S_i) \cdot P(O_{t+1}, O_{t+2}, \ldots, O_T \mid q_t = S_i) \]
   Note that the first term is just $\alpha_t(i)$; dividing the product by $P(O)$ yields the desired posterior. We calculate the second term with the Backward algorithm, described in the next section. The Backward algorithm can also be used instead of the Forward algorithm to calculate $P(O)$, and it is used in the Baum-Welch algorithm for estimating the parameters of the model.
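The decoding and Forward recursions above map directly onto short dynamic-programming routines. The following is a minimal Python sketch, not from the notes: the list-based representation (`pi`, `A`, `E`, with `A[i][j]` playing the role of $a_{ij}$ and observations encoded as integers $0, \ldots, M-1$) and the two-state toy model used to exercise it are assumptions made here for illustration.

```python
# Minimal sketch of Viterbi decoding and the Forward algorithm.
# Assumed conventions (not from the notes): states are indexed 0..N-1,
# pi[i] is the initial distribution, A[i][j] = P(q_t = S_j | q_{t-1} = S_i),
# and E[i][o] = e_i(o) for observation symbols encoded as integers.

def viterbi(obs, pi, A, E):
    """Most probable path Q* = argmax_Q P(Q, O), found by dynamic programming."""
    N, T = len(pi), len(obs)
    v = [[0.0] * N for _ in range(T)]     # v[t][i]: best score of a path ending in i
    back = [[0] * N for _ in range(T)]    # back[t][i]: best predecessor of i at t
    for i in range(N):
        v[0][i] = pi[i] * E[i][obs[0]]
    for t in range(1, T):
        for i in range(N):
            j = max(range(N), key=lambda j: v[t - 1][j] * A[j][i])
            back[t][i] = j
            v[t][i] = v[t - 1][j] * A[j][i] * E[i][obs[t]]
    # Trace back from the highest-scoring final state.
    path = [max(range(N), key=lambda i: v[T - 1][i])]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

def forward(obs, pi, A, E):
    """Return (alpha, P(O)), where alpha[t][i] corresponds to alpha_{t+1}(i)."""
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    # Initialization: alpha_1(i) = pi_i * e_i(O_1)
    for i in range(N):
        alpha[0][i] = pi[i] * E[i][obs[0]]
    # Iteration: alpha_t(i) = sum_j alpha_{t-1}(j) * a_{ji} * e_i(O_t)
    for t in range(1, T):
        for i in range(N):
            alpha[t][i] = sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * E[i][obs[t]]
    # Termination: P(O) = sum_i alpha_T(i)
    return alpha, sum(alpha[T - 1])

# A hypothetical two-state, two-symbol model (all values invented):
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
E = [[0.5, 0.5],
     [0.1, 0.9]]
obs = [0, 1, 1]
print(viterbi(obs, pi, A, E))       # [0, 1, 1] for this toy model
print(forward(obs, pi, A, E)[1])    # P(O) = 0.145984
```

For realistic sequence lengths these repeated products underflow double precision, so practical implementations run Viterbi in log space and rescale each column of $\alpha$; that bookkeeping is omitted here to keep the recursions recognizable.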
2.2 Backward algorithm

The Forward algorithm calculates the probability of the sequence working forward from the first symbol: $\alpha_1(\cdot), \alpha_2(\cdot), \ldots, \alpha_T(\cdot)$. It is also possible to calculate the probability of emitting $O$, given a particular model, by working backward from the end of the sequence. Let $\beta_t(i)$ be the probability of generating the last $T - t + 1$ observations, given that $O_{t-1}$ was emitted from state $i$:
\[ \beta_t(i) = P(O_t, O_{t+1}, O_{t+2}, \ldots, O_T \mid q_{t-1} = S_i) \]
Algorithm: Backward
Initialization:
\[ \beta_T(i) = \sum_{j=1}^{N} a_{ij} \, e_j(O_T) \]
Iteration:
\[ \beta_t(i) = \sum_{j=1}^{N} a_{ij} \, e_j(O_t) \, \beta_{t+1}(j) \]
To determine the probability that state $S_i$ emitted $O_t$, we combine $\alpha_t(i)$, which accounts for $O_1 \ldots O_t$, with $\beta_{t+1}(i)$, which accounts for $O_{t+1} \ldots O_T$:
\[ P(O, q_t = S_i) = \alpha_t(i) \cdot \beta_{t+1}(i) \quad (1) \]
To calculate the probability of the entire sequence, we start at $t = T$ and work back to $t = 2$, computing $\beta_t(i)$ at each step:
\[ P(O) = \sum_{j=1}^{N} \pi_j \, e_j(O_1) \, \beta_2(j) \]
Either the Forward or the Backward algorithm can be used to determine the probability of a sequence, $O$. Both are needed in order to learn parameters from unlabeled data using the Baum-Welch procedure.

2.3 Posterior Decoding

Let us revisit the question of determining the path through the HMM that corresponds to the true labeling of the data. One approach, discussed above, is to determine the most probable path through the model using the Viterbi algorithm. An alternative is to determine, independently for every symbol $O_t$, the most probable state using Equation 1:
\[ \hat{q}_t = \arg\max_i P(q_t = S_i \mid O) = \arg\max_i \alpha_t(i) \cdot \beta_{t+1}(i) \]
(The normalizing factor $P(O)$ is the same for every state, so it does not affect the argmax.) This approach is referred to as posterior decoding because it is based on the posterior probability of emitting $O_t$ from state $i$ when the emitted sequence is known. Note that the most probable state for $O_t$, $\hat{q}_t$, is chosen independently of the most probable state for any other symbol in $O$. In fact, the sequence of most probable states, $\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_T$, may not correspond to any legitimate path through the model. Posterior decoding may give better results in some cases, such as when suboptimal paths are almost as probable as the most probable path.
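To make the Backward recursion and Equation 1 concrete, here is a companion sketch in the same assumed representation as the code above; it reuses `forward`, `pi`, `A`, `E`, and `obs` from that sketch. One indexing note: since the notes define $\beta_t(i)$ by conditioning on $q_{t-1} = S_i$, the factor paired with $\alpha_t(i)$ in Equation 1 is $\beta_{t+1}(i)$, and it is this shifted quantity that `beta[t][i]` stores.

```python
# Minimal sketch of the Backward algorithm and posterior decoding
# (same assumed conventions as the Forward sketch above).
#
# Indexing: the notes define beta_t(i) = P(O_t .. O_T | q_{t-1} = S_i).
# Here beta[t][i] stores the notes' beta_{t+1}(i), the factor that Equation 1
# pairs with alpha_t(i); beta[T-1][i] = 1 plays the role of the implicit
# beta_{T+1}(i), so the notes' initialization beta_T(i) = sum_j a_ij e_j(O_T)
# falls out of the first pass of the loop.

def backward(obs, pi, A, E):
    """Return (beta, P(O)), working backward from the end of the sequence."""
    N, T = len(pi), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * E[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    # P(O) = sum_j pi_j * e_j(O_1) * beta_2(j); beta_2 is beta[0] here.
    return beta, sum(pi[j] * E[j][obs[0]] * beta[0][j] for j in range(N))

def posterior_decode(obs, pi, A, E):
    """For each t, pick q^_t = argmax_i alpha_t(i) * beta_{t+1}(i)."""
    alpha, _ = forward(obs, pi, A, E)    # forward() from the sketch above
    beta, _ = backward(obs, pi, A, E)
    N = len(pi)
    return [max(range(N), key=lambda i: alpha[t][i] * beta[t][i])
            for t in range(len(obs))]

print(backward(obs, pi, A, E)[1])        # 0.145984, matching forward()
print(posterior_decode(obs, pi, A, E))   # [0, 1, 1]; here it matches Viterbi
```

A useful sanity check is that `forward` and `backward` return the same $P(O)$. And because $P(O)$ is constant across states at a fixed $t$, `posterior_decode` compares the unnormalized products $\alpha_t(i)\,\beta_{t+1}(i)$ directly, exactly as in the argmax form of Equation 1.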

