Stanford CS 262 - Variants of HMMs

This preview shows pages 1, 2, 22, and 23 out of 23 pages.

Variants of HMMs

Contents: Higher-order HMMs · Modeling the Duration of States · Sol'n 1: Chain several states · Sol'n 2: Negative binomial distribution · Example: genes in prokaryotes · Solution 3: Duration modeling · Connection Between Alignment and HMMs · A state model for alignment · Let's score the transitions · How do we find the optimal alignment according to this model? · Needleman-Wunsch with affine gaps – state version · Probabilistic interpretation of an alignment · A Pair HMM for alignments · A Pair HMM for unaligned sequences · To compare ALIGNMENT vs. RANDOM hypothesis · The meaning of alignment scores · Substitution matrices · BLOSUM matrices

Higher-order HMMs

• How do we model "memory" longer than one time point?
• First order:  P(π_{i+1} = l | π_i = k) = a_{kl}
• Second order: P(π_{i+1} = l | π_i = k, π_{i-1} = j) = a_{jkl}
• …
• A second-order HMM with K states is equivalent to a first-order HMM with K^2 states.

Example: a two-state model over H and T whose transitions depend on the previous state, a_HT(prev = H), a_HT(prev = T), a_TH(prev = H), a_TH(prev = T), becomes a first-order model over the pair states HH, HT, TH, TT, with transitions a_HHT, a_HTT, a_HTH, a_THT, a_THH, a_TTH.

Modeling the Duration of States

Consider states X and Y, where X loops to itself with probability p and exits with probability 1 - p (and Y loops with probability q). The length distribution of a region of X's is geometric:

  P(l_X = i) = p^(i-1) (1 - p),  with mean E[l_X] = 1/(1 - p)

This is a significant disadvantage of HMMs: state durations are always geometrically distributed. Several solutions exist for modeling different length distributions.

Sol'n 1: Chain several states

Replace X with a chain of several copies of X before entering Y. Then

  l_X = C + geometric with mean 1/(1 - p)

Disadvantage: still very inflexible.

Sol'n 2: Negative binomial distribution

Chain n copies of X, each looping to itself with probability p and advancing with probability 1 - p. The duration in X is m turns, where:
• during the first m - 1 turns, exactly n - 1 "advance" arrows are followed, and
• during the m-th turn, the arrow to the next state is followed.

  P(l_X = m) = C(m-1, n-1) (1 - p)^((n-1)+1) p^((m-1)-(n-1)) = C(m-1, n-1) (1 - p)^n p^(m-n)

Example: genes in prokaryotes

• EasyGene: a prokaryotic gene-finder (Larsen TS, Krogh A)
• Uses a negative binomial duration distribution with n = 3.

Solution 3: Duration modeling

Upon entering a state:
1. Choose a duration d according to a probability distribution.
2. Generate d letters according to the emission probabilities.
3. Take a transition to the next state according to the transition probabilities.

Disadvantage: increase in complexity, where D = maximum duration of a state:
  Time: O(D^2) – why?
  Space: O(D)

Connection Between Alignment and HMMs

A state model for alignment

  -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
  TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
  IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII

Three states:
  M (+1, +1): match or mismatch, consuming one letter of each sequence
  I (+1, 0):  gap in y, consuming one letter of x
  J (0, +1):  gap in x, consuming one letter of y

Alignments correspond 1-to-1 with sequences of states M, I, J.

Let's score the transitions

State M scores s(x_i, y_j); opening a gap (M → I or M → J) scores -d; extending a gap (I → I or J → J) scores -e.

How do we find the optimal alignment according to this model?

Dynamic programming. Define:
  M(i, j): optimal alignment of x_1…x_i to y_1…y_j ending in M
  I(i, j): optimal alignment of x_1…x_i to y_1…y_j ending in I
  J(i, j): optimal alignment of x_1…x_i to y_1…y_j ending in J
The score is additive, therefore we can apply DP recurrence formulas.

Needleman-Wunsch with affine gaps – state version

Initialization:
  M(0, 0) = 0;  M(i, 0) = M(0, j) = -∞, for i, j > 0
  I(i, 0) = -(d + i·e);  J(0, j) = -(d + j·e)

Iteration:
  M(i, j) = s(x_i, y_j) + max { M(i-1, j-1), I(i-1, j-1), J(i-1, j-1) }
  I(i, j) = max { -e + I(i-1, j), -e + J(i, j-1), -d + M(i-1, j-1) }
  J(i, j) = max { -e + I(i-1, j), -e + J(i, j-1), -d + M(i-1, j-1) }

Termination:
  Optimal alignment score = max { M(m, n), I(m, n), J(m, n) }

Probabilistic interpretation of an alignment

An alignment is a hypothesis that the two sequences are related by evolution.
Goal:
• Produce the most likely alignment.
• Assert the likelihood that the sequences are indeed related.

A Pair HMM for alignments

States M, I, J between BEGIN and END, with emissions:
  M emits an aligned pair with probability P(x_i, y_j)
  I emits x_i with probability P(x_i)
  J emits y_j with probability P(y_j)
Transitions, with gap-open probability δ, gap-extend probability ε, and termination probability τ:
  M → I and M → J: δ;  M → M: 1 - 2δ - τ
  I → I and J → J: ε;  I → M and J → M: 1 - ε - τ
  M, I, J → END: τ

A Pair HMM for unaligned sequences

Model R: two independent chains, BEGIN → I → END emitting x and BEGIN → J → END emitting y, where each emitting state loops with probability 1 - η and exits with probability η.

  P(x, y | R) = η (1 - η)^m P(x_1)…P(x_m) · η (1 - η)^n P(y_1)…P(y_n)
              = η^2 (1 - η)^(m+n) ∏_i P(x_i) ∏_j P(y_j)

To compare ALIGNMENT vs. RANDOM hypothesis

Every pair of letters contributes:
• (1 - 2δ - τ) P(x_i, y_j) when matched
• roughly δ P(x_i) or ε P(x_i) when gapped
• (1 - η)^2 P(x_i) P(y_j) in the random model

Focus on the comparison of P(x_i, y_j) vs. P(x_i) P(y_j).

Idea: we will divide the alignment score by the random score, and take logarithms. Let

  s(x_i, y_j) = log [ P(x_i, y_j) / (P(x_i) P(y_j)) ] + log [ (1 - 2δ - τ) / (1 - η)^2 ]

  d = - log [ δ (1 - ε - τ) P(x_i) / ((1 - η) (1 - 2δ - τ) P(x_i)) ]

  e = - log [ ε P(x_i) / ((1 - η) P(x_i)) ]

Every letter b in the random model contributes (1 - η) P(b).

The meaning of alignment scores

Because δ and ε are small, and τ and η are very small,

  s(x_i, y_j) = log [ P(x_i, y_j) / (P(x_i) P(y_j)) ] + log [ (1 - 2δ - τ) / (1 - η)^2 ]
             ≈ log [ P(x_i, y_j) / (P(x_i) P(y_j)) ] + log (1 - 2δ)

  d = - log [ δ (1 - ε - τ) / ((1 - η) (1 - 2δ - τ)) ] ≈ - log [ δ (1 - ε) / (1 - 2δ) ]

  e = - log [ ε / (1 - η) ] ≈ - log ε

• The Viterbi algorithm for Pair HMMs corresponds exactly to the Needleman-Wunsch algorithm with affine gaps.
• However, now we score alignments with parameters that add up to probability distributions:
  ε ≈ 1/(mean length of the next gap)
  δ ≈ 1/(mean arrival time of the next gap)
    (affine gaps decouple arrival time from gap length)
  τ ≈ 1/(mean length of aligned sequences), set to ~0
  η ≈ 1/(mean length of unaligned sequences), set to ~0

Match/mismatch scores:

  s(a, b) ≈ log [ P(a, b) / (P(a) P(b)) ]

(ignoring log(1 - 2δ) for the moment – assume no gaps)

Example: say DNA regions between human and mouse have an average conservation of 50%. Then P(A, A) …
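The equivalence between a second-order HMM and a first-order HMM over pair states can be sketched in code. The transition probabilities below are hypothetical example numbers, not values from the lecture; only the construction itself is the point.

```python
# Sketch: converting a second-order HMM over K states into an equivalent
# first-order HMM over K^2 pair states. a2[(j, k, l)] holds the second-order
# probability P(pi_{i+1} = l | pi_i = k, pi_{i-1} = j); the numbers are made up.
import itertools

states = ["H", "T"]
a2 = {
    ("H", "H", "H"): 0.7, ("H", "H", "T"): 0.3,
    ("T", "H", "H"): 0.5, ("T", "H", "T"): 0.5,
    ("H", "T", "H"): 0.6, ("H", "T", "T"): 0.4,
    ("T", "T", "H"): 0.2, ("T", "T", "T"): 0.8,
}

# First-order model over pair states (prev, cur): a pair state (j, k) can only
# move to a pair state of the form (k, l), with probability a_{jkl}.
pair_states = list(itertools.product(states, states))
a1 = {}
for (j, k) in pair_states:
    for (k2, l) in pair_states:
        a1[((j, k), (k2, l))] = a2[(j, k, l)] if k2 == k else 0.0

# Each row of the expanded first-order transition matrix still sums to 1.
for src in pair_states:
    assert abs(sum(a1[(src, dst)] for dst in pair_states) - 1.0) < 1e-9
```

The expanded matrix is mostly zeros: from (j, k) only states (k, l) are reachable, which is exactly why K^2 first-order states suffice to carry two time points of memory.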
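The negative binomial duration formula above can be sanity-checked numerically against the chain-of-n-states picture (each of the n copies of X loops with probability p, so the total duration is a sum of n geometrics). The values p = 0.8 and n = 3 are arbitrary illustrative choices.

```python
# Sketch: checking P(l_X = m) = C(m-1, n-1) (1-p)^n p^(m-n) for the chained
# duration model. Each of the n states has mean duration 1/(1-p), so the
# total mean should be n/(1-p).
from math import comb

p, n = 0.8, 3

def p_duration(m):
    # Probability of spending exactly m turns total in the chain of n X states.
    return comb(m - 1, n - 1) * (1 - p) ** n * p ** (m - n)

# Truncate the sum far into the tail; the remaining mass is negligible.
total = sum(p_duration(m) for m in range(n, 2000))
mean = sum(m * p_duration(m) for m in range(n, 2000))

assert abs(total - 1.0) < 1e-9          # the probabilities sum to 1
assert abs(mean - n / (1 - p)) < 1e-6   # mean = n/(1-p) = 15 turns here
```

Varying n while holding the mean fixed changes the shape of the distribution, which is exactly the extra flexibility over a single geometric state that the slide is after.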
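The three-state affine-gap recurrence can be turned into a short scoring routine. This is a minimal sketch under simplifying assumptions: it computes only the optimal score (no traceback), it disallows direct I-to-J transitions, and the match/mismatch scores and gap parameters d = 3, e = 1 are illustrative choices rather than the lecture's values.

```python
# Sketch: Needleman-Wunsch with affine gaps via the M/I/J state formulation.
# A gap of length L costs d + (L-1)*e. Scores here are illustrative.
NEG = float("-inf")

def s(a, b):
    # Hypothetical match/mismatch scores.
    return 1 if a == b else -1

def affine_align_score(x, y, d=3, e=1):
    m, n = len(x), len(y)
    # M[i][j], I[i][j], J[i][j]: best score of aligning x[:i] to y[:j]
    # ending in a match/mismatch, a gap in y, or a gap in x, respectively.
    M = [[NEG] * (n + 1) for _ in range(m + 1)]
    I = [[NEG] * (n + 1) for _ in range(m + 1)]
    J = [[NEG] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0
    for i in range(1, m + 1):
        I[i][0] = -d - (i - 1) * e   # leading gap in y
    for j in range(1, n + 1):
        J[0][j] = -d - (j - 1) * e   # leading gap in x
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = s(x[i-1], y[j-1]) + max(M[i-1][j-1], I[i-1][j-1], J[i-1][j-1])
            I[i][j] = max(I[i-1][j] - e, M[i-1][j] - d)   # extend or open gap in y
            J[i][j] = max(J[i][j-1] - e, M[i][j-1] - d)   # extend or open gap in x
    return max(M[m][n], I[m][n], J[m][n])

# A length-2 gap costs d + e = 4, so aligning AAAGC to AGC scores 3 - 4 = -1.
assert affine_align_score("AGC", "AGC") == 3
assert affine_align_score("AAAGC", "AGC") == -1
```

Because the score is additive over state transitions, each cell needs only the three predecessor matrices, which is what makes the DP run in O(mn) time.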
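The connection between the pair-HMM parameters and the affine gap penalties can be made concrete by evaluating the log-odds formulas for d and e, together with their small-τ, small-η approximations. The parameter values below are hypothetical, chosen only so that δ and ε are small and τ, η are very small.

```python
# Sketch: gap penalties implied by pair-HMM parameters (hypothetical values).
from math import log

delta, eps = 0.05, 0.3      # gap-open and gap-extend probabilities (small)
tau, eta = 1e-4, 1e-4       # termination probabilities (very small)

# Exact log-odds penalties from the ALIGNMENT vs. RANDOM comparison.
d = -log(delta * (1 - eps - tau) / ((1 - eta) * (1 - 2 * delta - tau)))
e = -log(eps / (1 - eta))

# Approximations valid when tau and eta are near 0.
d_approx = -log(delta * (1 - eps) / (1 - 2 * delta))
e_approx = -log(eps)

assert abs(d - d_approx) < 1e-2   # the approximation is tight
assert abs(e - e_approx) < 1e-2
assert d > e > 0                  # opening a gap costs more than extending it
```

Under these (made-up) parameters the implied penalties satisfy d > e, matching the usual affine-gap intuition that gap opening is penalized more heavily than gap extension.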

