Computational Genomics and Molecular Biology, Fall 2009
HMM Lecture Notes, Tuesday, October 29th
Dannie Durand and Rose Hoberman

1 Notation

1. N states (S_1, ..., S_N)

2. M symbols in the alphabet, Σ

3. Parameters, λ:

   (a) initial distribution of states, π(i)

   (b) transition probabilities a_{ij} = P(q_t = S_j | q_{t-1} = S_i). Note that Σ_{j=1}^{N} a_{ij} = 1 for all i.

   (c) emission probabilities e_i(a), the probability that state i emits symbol a

4. Sequence of symbols: O = O_1, O_2, ..., O_T

5. Sequence of states: Q = q_1, q_2, ..., q_T

2 Using HMMs for recognition

2.1 Questions to ask with an HMM

1. Given a sequence O, what is the true path? Otherwise stated, we wish to assign labels to an unlabeled sequence.

   Example: Identify the cytosolic, transmembrane, and extracellular regions in the sequence. In this case, we wish to assign the labels E, M, or C to the unlabeled data.

2. What is the probability that a given sequence O was generated by the HMM?

   Example: Is the sequence a transmembrane protein?

3. What is the probability of being in state i when O_t is emitted?

   Example: Is a given residue localized to the membrane?

4. Use the HMM to simulate sequences.

   Example: Generate sequences with properties similar to real transmembrane sequences.

In the previous lecture, we started discussing algorithms for answering these questions.

1. We assume that the most likely path, Q*, is a good estimate of the true path, where Q* is defined by

      Q* = argmax_Q P(Q | O) = argmax_Q P(Q, O) / P(O) = argmax_Q P(Q, O),

   where the last equality holds because P(O) does not depend on Q. On Tuesday, we discussed how to find Q* using the Viterbi algorithm. This process is called "decoding" because we decode the sequence of symbols to determine the hidden sequence of states. In speech recognition, recorded speech is "decoded" into words or phonemes to determine the meaning of the utterance.

2. The total probability of O is the sum of the probability of O over all possible paths through the HMM:

      P(O) = Σ_Q P(O | Q) P(Q) = Σ_Q P(O, Q)

   To compute this we use the Forward algorithm, which iteratively calculates the probability of being in state S_i after generating the sequence up to observation O_t (a code sketch follows this list). We designate this quantity

      α_t(i) = P(O_1, O_2, O_3, ..., O_t, q_t = S_i)

   Algorithm: Forward

   Initialization:

      α_1(i) = π_i e_i(O_1)

   Iteration:

      α_t(i) = [ Σ_{j=1}^{N} α_{t-1}(j) a_{ji} ] e_i(O_t)

   The probability of observing the entire sequence is given by the sum over all possible final states:

      P(O) = Σ_{i=1}^{N} α_T(i)

3. We wish to determine P(q_t = S_i | O), the probability of being in state S_i when O_t is emitted. The joint probability P(q_t = S_i, O) is the probability of emitting O_1 ... O_{t-1} over any path, entering S_i and emitting O_t, and then emitting O_{t+1} ... O_T over any path, or

      P(q_t = S_i, O) = P(O_1, O_2, O_3, ..., O_t, q_t = S_i) · P(O_{t+1}, O_{t+2}, ..., O_T | q_t = S_i)

   Note that the first term is just α_t(i); dividing the product by P(O) gives the desired posterior. We calculate the second term with the Backward algorithm, described in the next section. The Backward algorithm can also be used instead of the Forward algorithm to calculate P(O), and it is used in the Baum-Welch algorithm for estimating the parameters of the model.
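The Forward recursion above maps directly onto a few lines of code. The following is a minimal illustrative sketch, not part of the original notes; it assumes NumPy, 0-based indexing, observations encoded as integer symbol indices, and hypothetical names pi, A, E, and obs for the parameters λ and the observed sequence.

import numpy as np

def forward(pi, A, E, obs):
    # pi  : length-N array, initial state distribution pi(i)
    # A   : N x N array, A[i, j] = a_ij = P(q_t = S_j | q_{t-1} = S_i)
    # E   : N x M array, E[i, a] = e_i(a), probability that state i emits symbol a
    # obs : length-T sequence of integer symbol indices (the observations O)
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))            # alpha[t - 1, i] stores alpha_t(i)
    alpha[0] = pi * E[:, obs[0]]        # initialization: alpha_1(i) = pi_i e_i(O_1)
    for t in range(1, T):
        # iteration: alpha_t(i) = (sum_j alpha_{t-1}(j) a_ji) * e_i(O_t)
        alpha[t] = (alpha[t - 1] @ A) * E[:, obs[t]]
    return alpha, alpha[-1].sum()       # P(O) = sum_i alpha_T(i)

In practice an implementation would rescale the α values or work in log space to avoid numerical underflow on long sequences; that detail is omitted here.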
2.2 Backward algorithm

Above we calculated forward in time: α_1(·), α_2(·), ..., α_T(·). It is also possible to calculate the probability of emitting O given a particular model by working backward from the end of the sequence. Let β_t(i) be the probability of generating the last T - t + 1 observations given that O_{t-1} was emitted from state i:

   β_t(i) = P(O_t, O_{t+1}, O_{t+2}, ..., O_T | q_{t-1} = S_i)

Algorithm: Backward

Initialization:

   β_T(i) = Σ_{j=1}^{N} a_{ij} e_j(O_T)

Iteration:

   β_t(i) = Σ_{j=1}^{N} a_{ij} e_j(O_t) β_{t+1}(j)

To determine P(q_t = S_i, O), we work backwards from O_T to O_{t+1} to obtain

   P(q_t = S_i, O) = α_t(i) · β_{t+1}(i)

To calculate the probability of the entire sequence, we start with t = T and work all the way down to t = 2, calculating β_t(i):

   P(O) = Σ_{j=1}^{N} π_j e_j(O_1) β_2(j)

Determining the most probable state for every symbol O_t is an alternative to Viterbi decoding. This can give better results in some cases, such as when suboptimal paths are almost as probable as the most probable path.

Either the Forward or the Backward algorithm can be used to determine the probability of a sequence, O. Both are needed in order to learn parameters from unlabeled data using the Baum-Welch algorithm.
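As with the Forward sketch above, the following Python/NumPy code is an illustrative sketch rather than part of the notes. It keeps the notes' convention that β_t(i) conditions on q_{t-1} = S_i (so β_{T+1}(i) = 1, since no observations remain to be emitted), and it reuses the hypothetical forward() helper to compute the posterior P(q_t = S_i | O) at every position, i.e. posterior decoding.

import numpy as np

def backward(pi, A, E, obs):
    # Same hypothetical arguments as forward(); returns beta where
    # beta[t - 2, i] = beta_t(i) = P(O_t, ..., O_T | q_{t-1} = S_i) for t = 2, ..., T+1.
    N, T = len(pi), len(obs)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                       # beta_{T+1}(i) = 1 (nothing left to emit)
    for t in range(T, 1, -1):               # t = T, T-1, ..., 2
        # iteration: beta_t(i) = sum_j a_ij e_j(O_t) beta_{t+1}(j)
        beta[t - 2] = A @ (E[:, obs[t - 1]] * beta[t - 1])
    return beta

def posterior_states(pi, A, E, obs):
    # Posterior decoding: P(q_t = S_i | O) for every position t,
    # and the most probable state for each symbol O_t.
    alpha, p_obs = forward(pi, A, E, obs)   # forward() is the sketch given earlier
    beta = backward(pi, A, E, obs)
    gamma = alpha * beta / p_obs            # alpha_t(i) beta_{t+1}(i) / P(O)
    return gamma, gamma.argmax(axis=1)

As a consistency check, P(O) can also be recovered from the backward pass alone as np.sum(pi * E[:, obs[0]] * beta[0]), matching P(O) = Σ_j π_j e_j(O_1) β_2(j) above.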

