Stanford CS 262 - Variants of HMMs

This preview shows pages 1, 2, 22, and 23 out of 23 pages.

Variants of HMMs

Contents: Higher-order HMMs · Modeling the Duration of States · Sol'n 1: Chain several states · Sol'n 2: Negative binomial distribution · Example: genes in prokaryotes · Solution 3: Duration modeling · Connection Between Alignment and HMMs · A state model for alignment · Let's score the transitions · How do we find the optimal alignment according to this model? · Needleman-Wunsch with affine gaps – state version · Probabilistic interpretation of an alignment · A Pair HMM for alignments · A Pair HMM for unaligned sequences · To compare ALIGNMENT vs. RANDOM hypothesis · The meaning of alignment scores · Substitution matrices · BLOSUM matrices

Higher-order HMMs

• How do we model "memory" longer than one time point?
• First order:  P(π_{i+1} = l | π_i = k) = a_{kl}
• Second order: P(π_{i+1} = l | π_i = k, π_{i-1} = j) = a_{jkl}
• …
• A second-order HMM with K states is equivalent to a first-order HMM with K^2 states.

Example: a two-state model over H and T whose transitions depend on the previous state, a_HT(prev = H), a_HT(prev = T), a_TH(prev = H), a_TH(prev = T), becomes a first-order model over the pair states HH, HT, TH, TT, with transitions a_HHT, a_HTT, a_HTH, a_THT, a_THH, a_TTH.

Modeling the Duration of States

Consider states X and Y, where X loops to itself with probability p and exits with probability 1 - p (and Y loops with probability q). The length distribution of a region of X's is geometric:

  P(l_X = i) = p^(i-1) (1 - p),  with mean E[l_X] = 1/(1 - p)

This is a significant disadvantage of HMMs: state durations are always geometrically distributed. Several solutions exist for modeling different length distributions.

Sol'n 1: Chain several states

Replace X with a chain of several copies of X before entering Y. Then

  l_X = C + geometric with mean 1/(1 - p)

Disadvantage: still very inflexible.

Sol'n 2: Negative binomial distribution

Chain n copies of X, each looping to itself with probability p and advancing with probability 1 - p. The duration in X is m turns, where:
• during the first m - 1 turns, exactly n - 1 "advance" arrows are followed, and
• during the m-th turn, the arrow to the next state is followed.

  P(l_X = m) = C(m-1, n-1) (1 - p)^((n-1)+1) p^((m-1)-(n-1)) = C(m-1, n-1) (1 - p)^n p^(m-n)

Example: genes in prokaryotes

• EasyGene: a prokaryotic gene-finder (Larsen TS, Krogh A)
• Uses a negative binomial duration distribution with n = 3.

Solution 3: Duration modeling

Upon entering a state:
1. Choose a duration d according to a probability distribution.
2. Generate d letters according to the emission probabilities.
3. Take a transition to the next state according to the transition probabilities.

Disadvantage: increase in complexity, where D = maximum duration of a state:
  Time: O(D^2) – why?
  Space: O(D)

Connection Between Alignment and HMMs

A state model for alignment

  -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
  TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
  IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII

Three states:
  M (+1, +1): match or mismatch, consuming one letter of each sequence
  I (+1, 0):  gap in y, consuming one letter of x
  J (0, +1):  gap in x, consuming one letter of y

Alignments correspond 1-to-1 with sequences of states M, I, J.

Let's score the transitions

State M scores s(x_i, y_j); opening a gap (M → I or M → J) scores -d; extending a gap (I → I or J → J) scores -e.

How do we find the optimal alignment according to this model?

Dynamic programming. Define:
  M(i, j): optimal alignment of x_1…x_i to y_1…y_j ending in M
  I(i, j): optimal alignment of x_1…x_i to y_1…y_j ending in I
  J(i, j): optimal alignment of x_1…x_i to y_1…y_j ending in J
The score is additive, therefore we can apply DP recurrence formulas.

Needleman-Wunsch with affine gaps – state version

Initialization:
  M(0, 0) = 0;  M(i, 0) = M(0, j) = -∞, for i, j > 0
  I(i, 0) = -(d + i·e);  J(0, j) = -(d + j·e)

Iteration:
  M(i, j) = s(x_i, y_j) + max { M(i-1, j-1), I(i-1, j-1), J(i-1, j-1) }
  I(i, j) = max { -e + I(i-1, j), -e + J(i, j-1), -d + M(i-1, j-1) }
  J(i, j) = max { -e + I(i-1, j), -e + J(i, j-1), -d + M(i-1, j-1) }

Termination:
  Optimal alignment score = max { M(m, n), I(m, n), J(m, n) }

Probabilistic interpretation of an alignment

An alignment is a hypothesis that the two sequences are related by evolution.
Goal:
• Produce the most likely alignment.
• Assert the likelihood that the sequences are indeed related.

A Pair HMM for alignments

States M, I, J between BEGIN and END, with emissions:
  M emits an aligned pair with probability P(x_i, y_j)
  I emits x_i with probability P(x_i)
  J emits y_j with probability P(y_j)
Transitions, with gap-open probability δ, gap-extend probability ε, and termination probability τ:
  M → I and M → J: δ;  M → M: 1 - 2δ - τ
  I → I and J → J: ε;  I → M and J → M: 1 - ε - τ
  M, I, J → END: τ

A Pair HMM for unaligned sequences

Model R: two independent chains, BEGIN → I → END emitting x and BEGIN → J → END emitting y, where each emitting state loops with probability 1 - η and exits with probability η.

  P(x, y | R) = η (1 - η)^m P(x_1)…P(x_m) · η (1 - η)^n P(y_1)…P(y_n)
              = η^2 (1 - η)^(m+n) ∏_i P(x_i) ∏_j P(y_j)

To compare ALIGNMENT vs. RANDOM hypothesis

Every pair of letters contributes:
• (1 - 2δ - τ) P(x_i, y_j) when matched
• roughly δ P(x_i) or ε P(x_i) when gapped
• (1 - η)^2 P(x_i) P(y_j) in the random model

Focus on the comparison of P(x_i, y_j) vs. P(x_i) P(y_j).

Idea: we will divide the alignment score by the random score, and take logarithms. Let

  s(x_i, y_j) = log [ P(x_i, y_j) / (P(x_i) P(y_j)) ] + log [ (1 - 2δ - τ) / (1 - η)^2 ]

  d = - log [ δ (1 - ε - τ) P(x_i) / ((1 - η) (1 - 2δ - τ) P(x_i)) ]

  e = - log [ ε P(x_i) / ((1 - η) P(x_i)) ]

Every letter b in the random model contributes (1 - η) P(b).

The meaning of alignment scores

Because δ and ε are small, and τ and η are very small,

  s(x_i, y_j) = log [ P(x_i, y_j) / (P(x_i) P(y_j)) ] + log [ (1 - 2δ - τ) / (1 - η)^2 ]
             ≈ log [ P(x_i, y_j) / (P(x_i) P(y_j)) ] + log (1 - 2δ)

  d = - log [ δ (1 - ε - τ) / ((1 - η) (1 - 2δ - τ)) ] ≈ - log [ δ (1 - ε) / (1 - 2δ) ]

  e = - log [ ε / (1 - η) ] ≈ - log ε

• The Viterbi algorithm for Pair HMMs corresponds exactly to the Needleman-Wunsch algorithm with affine gaps.
• However, now we score alignments with parameters that add up to probability distributions:
  ε ≈ 1/(mean length of the next gap)
  δ ≈ 1/(mean arrival time of the next gap)
    (affine gaps decouple arrival time from gap length)
  τ ≈ 1/(mean length of aligned sequences), set to ~0
  η ≈ 1/(mean length of unaligned sequences), set to ~0

Match/mismatch scores:

  s(a, b) ≈ log [ P(a, b) / (P(a) P(b)) ]

(ignoring log(1 - 2δ) for the moment – assume no gaps)

Example: say DNA regions between human and mouse have an average conservation of 50%. Then P(A, A) …
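The equivalence between a second-order HMM and a first-order HMM over pair states can be sketched in code. The transition probabilities below are hypothetical example numbers, not values from the lecture; only the construction itself is the point.

```python
# Sketch: converting a second-order HMM over K states into an equivalent
# first-order HMM over K^2 pair states. a2[(j, k, l)] holds the second-order
# probability P(pi_{i+1} = l | pi_i = k, pi_{i-1} = j); the numbers are made up.
import itertools

states = ["H", "T"]
a2 = {
    ("H", "H", "H"): 0.7, ("H", "H", "T"): 0.3,
    ("T", "H", "H"): 0.5, ("T", "H", "T"): 0.5,
    ("H", "T", "H"): 0.6, ("H", "T", "T"): 0.4,
    ("T", "T", "H"): 0.2, ("T", "T", "T"): 0.8,
}

# First-order model over pair states (prev, cur): a pair state (j, k) can only
# move to a pair state of the form (k, l), with probability a_{jkl}.
pair_states = list(itertools.product(states, states))
a1 = {}
for (j, k) in pair_states:
    for (k2, l) in pair_states:
        a1[((j, k), (k2, l))] = a2[(j, k, l)] if k2 == k else 0.0

# Each row of the expanded first-order transition matrix still sums to 1.
for src in pair_states:
    assert abs(sum(a1[(src, dst)] for dst in pair_states) - 1.0) < 1e-9
```

The expanded matrix is mostly zeros: from (j, k) only states (k, l) are reachable, which is exactly why K^2 first-order states suffice to carry two time points of memory.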
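The negative binomial duration formula above can be sanity-checked numerically against the chain-of-n-states picture (each of the n copies of X loops with probability p, so the total duration is a sum of n geometrics). The values p = 0.8 and n = 3 are arbitrary illustrative choices.

```python
# Sketch: checking P(l_X = m) = C(m-1, n-1) (1-p)^n p^(m-n) for the chained
# duration model. Each of the n states has mean duration 1/(1-p), so the
# total mean should be n/(1-p).
from math import comb

p, n = 0.8, 3

def p_duration(m):
    # Probability of spending exactly m turns total in the chain of n X states.
    return comb(m - 1, n - 1) * (1 - p) ** n * p ** (m - n)

# Truncate the sum far into the tail; the remaining mass is negligible.
total = sum(p_duration(m) for m in range(n, 2000))
mean = sum(m * p_duration(m) for m in range(n, 2000))

assert abs(total - 1.0) < 1e-9          # the probabilities sum to 1
assert abs(mean - n / (1 - p)) < 1e-6   # mean = n/(1-p) = 15 turns here
```

Varying n while holding the mean fixed changes the shape of the distribution, which is exactly the extra flexibility over a single geometric state that the slide is after.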
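The three-state affine-gap recurrence can be turned into a short scoring routine. This is a minimal sketch under simplifying assumptions: it computes only the optimal score (no traceback), it disallows direct I-to-J transitions, and the match/mismatch scores and gap parameters d = 3, e = 1 are illustrative choices rather than the lecture's values.

```python
# Sketch: Needleman-Wunsch with affine gaps via the M/I/J state formulation.
# A gap of length L costs d + (L-1)*e. Scores here are illustrative.
NEG = float("-inf")

def s(a, b):
    # Hypothetical match/mismatch scores.
    return 1 if a == b else -1

def affine_align_score(x, y, d=3, e=1):
    m, n = len(x), len(y)
    # M[i][j], I[i][j], J[i][j]: best score of aligning x[:i] to y[:j]
    # ending in a match/mismatch, a gap in y, or a gap in x, respectively.
    M = [[NEG] * (n + 1) for _ in range(m + 1)]
    I = [[NEG] * (n + 1) for _ in range(m + 1)]
    J = [[NEG] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0
    for i in range(1, m + 1):
        I[i][0] = -d - (i - 1) * e   # leading gap in y
    for j in range(1, n + 1):
        J[0][j] = -d - (j - 1) * e   # leading gap in x
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = s(x[i-1], y[j-1]) + max(M[i-1][j-1], I[i-1][j-1], J[i-1][j-1])
            I[i][j] = max(I[i-1][j] - e, M[i-1][j] - d)   # extend or open gap in y
            J[i][j] = max(J[i][j-1] - e, M[i][j-1] - d)   # extend or open gap in x
    return max(M[m][n], I[m][n], J[m][n])

# A length-2 gap costs d + e = 4, so aligning AAAGC to AGC scores 3 - 4 = -1.
assert affine_align_score("AGC", "AGC") == 3
assert affine_align_score("AAAGC", "AGC") == -1
```

Because the score is additive over state transitions, each cell needs only the three predecessor matrices, which is what makes the DP run in O(mn) time.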
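The connection between the pair-HMM parameters and the affine gap penalties can be made concrete by evaluating the log-odds formulas for d and e, together with their small-τ, small-η approximations. The parameter values below are hypothetical, chosen only so that δ and ε are small and τ, η are very small.

```python
# Sketch: gap penalties implied by pair-HMM parameters (hypothetical values).
from math import log

delta, eps = 0.05, 0.3      # gap-open and gap-extend probabilities (small)
tau, eta = 1e-4, 1e-4       # termination probabilities (very small)

# Exact log-odds penalties from the ALIGNMENT vs. RANDOM comparison.
d = -log(delta * (1 - eps - tau) / ((1 - eta) * (1 - 2 * delta - tau)))
e = -log(eps / (1 - eta))

# Approximations valid when tau and eta are near 0.
d_approx = -log(delta * (1 - eps) / (1 - 2 * delta))
e_approx = -log(eps)

assert abs(d - d_approx) < 1e-2   # the approximation is tight
assert abs(e - e_approx) < 1e-2
assert d > e > 0                  # opening a gap costs more than extending it
```

Under these (made-up) parameters the implied penalties satisfy d > e, matching the usual affine-gap intuition that gap opening is penalized more heavily than gap extension.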

