Stanford CS 262 - Hidden Markov Models
Lecture 6, Thursday April 17, 2003

Review of Last Lecture

1. When the true underlying states are known

Given x = x_1...x_N for which the true state path π = π_1...π_N is known, define:

  A_kl   = # of times a k -> l transition occurs in π
  E_k(b) = # of times state k emits b in x

We can show that the maximum likelihood parameters are:

  a_kl   = A_kl / Σ_i A_ki
  e_k(b) = E_k(b) / Σ_c E_k(c)

2. When the states are not known: the Baum-Welch algorithm

Initialization: pick the best guess for the model parameters (or arbitrary values).

Iteration:
- Forward
- Backward
- Calculate A_kl, E_k(b)
- Calculate the new model parameters a_kl, e_k(b)
- Calculate the new log-likelihood P(x | θ)
  (GUARANTEED TO BE HIGHER, BY EXPECTATION MAXIMIZATION)

Until P(x | θ) does not change much.

Alternative: Viterbi Training

Initialization: same.

Iteration:
- Perform Viterbi to find π*
- Calculate A_kl, E_k(b) according to π*, plus pseudocounts
- Calculate the new parameters a_kl, e_k(b)

Until convergence.

Notes:
- Convergence is guaranteed. Why?
- Does not maximize P(x | θ)
- In general, worse performance than Baum-Welch
- Convenient when we are interested only in Viterbi parsing: no need to implement the additional procedures (Forward, Backward)
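The re-estimation step in both procedures reduces to counting transitions and emissions and then normalizing. Below is a minimal Python sketch (not from the original notes) of the known-path case: it accumulates A_kl and E_k(b) with pseudocounts and normalizes them into a_kl and e_k(b). Viterbi training applies exactly this counting to the path π* returned by Viterbi, while Baum-Welch replaces the hard counts with expected counts from Forward-Backward. The function name, state names, alphabet, and toy sequences are illustrative.

```python
def ml_parameters(x, pi, states, alphabet, pseudocount=1.0):
    """Maximum-likelihood HMM parameters when the state path pi is known.

    A_kl   = # of k -> l transitions in pi   (plus pseudocounts)
    E_k(b) = # of times state k emits b in x (plus pseudocounts)
    a_kl   = A_kl / sum_i A_ki
    e_k(b) = E_k(b) / sum_c E_k(c)
    """
    A = {k: {l: pseudocount for l in states} for k in states}
    E = {k: {b: pseudocount for b in alphabet} for k in states}

    for t in range(len(x)):
        E[pi[t]][x[t]] += 1               # state pi_t emitted x_t
        if t + 1 < len(x):
            A[pi[t]][pi[t + 1]] += 1      # transition pi_t -> pi_{t+1}

    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e


# Toy example: a 2-state model over DNA with a known state labeling
states, alphabet = ("X", "Y"), "ACGT"
a, e = ml_parameters("ACGTTTAC", "XXYYYYXX", states, alphabet)
```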
Variants of HMMs

Higher-order HMMs: the Genetic Code

- 3 nucleotides make 1 amino acid
- Statistical dependencies appear in triplets
- Question: recognize protein-coding segments with an HMM

One way to model protein-coding regions

- Every state of the HMM emits 3 nucleotides: states AAA, AAC, AAT, ..., TTT
- Transition probabilities: probability of one triplet given the previous triplet, P(π_i | π_{i-1})
- Emission probabilities: P(x_i x_{i+1} x_{i+2} | π_i) = 1 or 0 (each state emits its own triplet)

A more elegant way

- Every state of the HMM emits 1 nucleotide (A, C, G, T)
- Transition probabilities: probability of one state given the previous 3 states, P(π_i | π_{i-1}, π_{i-2}, π_{i-3})
- Emission probabilities: P(x_i | π_i)
- The algorithms extend with small modifications

Modeling the Duration of States

- Consider a two-state model X -> Y, where X returns to itself with probability p and moves on with probability 1-p (similarly q for Y)
- Length distribution of region X: P(l_X = l) = (1-p) p^(l-1), so E[l_X] = 1/(1-p)
- This is an exponential (geometric) distribution with mean 1/(1-p)
- This is a significant disadvantage of HMMs; several solutions exist for modeling different length distributions

Solution 1: Chain several states

- Chain C copies of state X before Y
- Disadvantage: still very inflexible; l_X = C + exponential with mean 1/(1-p)

Solution 2: Negative binomial distribution

- Chain n copies of X, each with self-transition probability p and probability 1-p of moving to the next copy
- P(l_X = l) = C(l-1, n-1) p^(l-n) (1-p)^n

Solution 3: Duration modeling

Upon entering a state:
1. Choose a duration d according to a probability distribution
2. Generate d letters according to the emission probabilities
3. Take a transition to the next state according to the transition probabilities

Disadvantage: increase in complexity. Time: O(D^2), Space: O(D), where D = maximum duration of a state.

Connection Between Alignment and HMMs

A state model for alignment

- States M, I, J; alignments correspond 1-to-1 with sequences of states
  M: (1, 1)   x_i aligned to y_j
  I: (1, 0)   x_i aligned to a gap
  J: (0, 1)   y_j aligned to a gap

Example:

  -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
  TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
  IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII

Let's score the transitions

Same state model, now with transition scores (same example alignment as above):

  from M:  to M: s(x_i, y_j)   to I: -d   to J: -d
  from I:  to M: s(x_i, y_j)   to I: -e
  from J:  to M: s(x_i, y_j)   to J: -e

How do we find the optimal alignment according to this model? Dynamic programming.

  M(i, j) = optimal alignment of x_1...x_i to y_1...y_j ending in M
  I(i, j) = optimal alignment of x_1...x_i to y_1...y_j ending in I
  J(i, j) = optimal alignment of x_1...x_i to y_1...y_j ending in J

The score is additive, therefore we can apply DP recurrence formulas.

Needleman-Wunsch with affine gaps, state version

Initialization:
  M(0, 0) = 0;  M(i, 0) = M(0, j) = -∞ for i, j > 0
  I(i, 0) = -d - i*e;  J(0, j) = -d - j*e

Iteration:
  M(i, j) = s(x_i, y_j) + max { M(i-1, j-1), I(i-1, j-1), J(i-1, j-1) }
  I(i, j) = max { -d + M(i-1, j),  -e + I(i-1, j) }
  J(i, j) = max { -d + M(i, j-1),  -e + J(i, j-1) }

Termination: the optimal alignment score is given by max { M(m, n), I(m, n), J(m, n) }.

Probabilistic interpretation of an alignment

- An alignment is a hypothesis that the two sequences are related by evolution
- Goal: produce the most likely alignment, and assert the likelihood that the sequences are indeed related

A Pair HMM for alignments

States M, I, J between BEGIN and END:

- M emits an aligned pair (x_i, y_j) with probability P(x_i, y_j)
- I emits x_i with probability P(x_i); J emits y_j with probability P(y_j)
- BEGIN -> M with probability 1-2δ; BEGIN -> I and BEGIN -> J with probability δ each
- M -> M: 1-2δ;  M -> I and M -> J: δ
- I -> I and J -> J: ε;  I -> M and J -> M: 1-ε

A Pair HMM for non-aligned sequences (model R)

- x and y are emitted independently, each by its own chain of a single insert state
- Each chain emits one letter per step (with probability P(x_i), respectively P(y_j)), continues with probability 1-η, and terminates with probability η
- P(x, y | R) = η(1-η)^m P(x_1)···P(x_m) · η(1-η)^n P(y_1)···P(y_n)
             = η² (1-η)^(m+n) ∏_i P(x_i) ∏_j P(y_j)

To compare the ALIGNMENT vs. RANDOM hypothesis

Every pair of letters contributes:
- (1-2δ) P(x_i, y_j) when matched
- P(x_i) P(y_j), times the gap transition probabilities, when gapped
- (1-η)² P(x_i) P(y_j) in the random model

Focus on the comparison of P(x_i, y_j) vs. P(x_i) P(y_j).

Idea: divide the alignment score by the random score, and take logarithms. Let

  s(x_i, y_j) = log [ P(x_i, y_j) / (P(x_i) P(y_j)) ] + log [ (1-2δ) / (1-η)² ]

  d = -log [ δ(1-ε) / ((1-η)(1-2δ)) ]

  e = -log [ ε / (1-η) ]

Every letter b in the random model contributes (1-η) P(b).

The meaning of alignment scores

Because δ and ε are small, and η is very small:

  s(x_i, y_j) ≈ log [ P(x_i, y_j) / (P(x_i) P(y_j)) ] + log(1-2δ)

  d ≈ -log [ δ(1-ε) / (1-2δ) ]

  e ≈ -log ε

- The Viterbi algorithm for Pair HMMs corresponds exactly to the Needleman-Wunsch algorithm with affine gaps
- However, now we need to score the alignment with parameters that add up to probability …
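To make the state-version recurrence concrete, here is a minimal, self-contained Python sketch (not from the notes) of Needleman-Wunsch with affine gaps using the three matrices M, I, J defined above. The +1/-1 substitution score and the gap parameters passed in are illustrative stand-ins for the log-odds scores s(x_i, y_j), d, and e derived from the pair HMM; a traceback over which matrix achieved each maximum would recover the alignment itself.

```python
NEG_INF = float("-inf")

def affine_gap_align_score(x, y, s, d, e):
    """Needleman-Wunsch with affine gaps, three-state (M/I/J) DP.

    M[i][j] = best score of an alignment of x[:i], y[:j] ending with x_i aligned to y_j
    I[i][j] = best score ending with x_i aligned to a gap
    J[i][j] = best score ending with y_j aligned to a gap
    d = gap-open penalty, e = gap-extend penalty (both positive, subtracted)
    """
    m, n = len(x), len(y)
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    J = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0.0
    for i in range(1, m + 1):
        I[i][0] = -d - i * e            # x_1..x_i aligned to gaps (slide's convention)
    for j in range(1, n + 1):
        J[0][j] = -d - j * e            # y_1..y_j aligned to gaps

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = s(x[i - 1], y[j - 1]) + max(M[i - 1][j - 1],
                                                  I[i - 1][j - 1],
                                                  J[i - 1][j - 1])
            I[i][j] = max(M[i - 1][j] - d, I[i - 1][j] - e)
            J[i][j] = max(M[i][j - 1] - d, J[i][j - 1] - e)

    # Termination: best of the three states at (m, n)
    return max(M[m][n], I[m][n], J[m][n])

# Toy usage with an illustrative +1/-1 substitution score and gap penalties
score = lambda a, b: 1.0 if a == b else -1.0
print(affine_gap_align_score("AGGCTATCAC", "TAGCTATCAC", score, d=3.0, e=1.0))
```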

