Stanford CS 262 - Lecture 7

Slides in this lecture:
• Hidden Markov Models
• Viterbi, Forward, Backward
• Posterior Decoding
• Variants of HMMs
• Higher-order HMMs
• Similar Algorithms to 1st Order
• Modeling the Duration of States
• Solution 1: Chain several states
• Solution 2: Negative binomial distribution
• Example: genes in prokaryotes
• Solution 3: Duration modeling
• Viterbi with duration modeling
• Proteins, Pair HMMs, and Alignment
• A state model for alignment
• Let's score the transitions
• Alignment with affine gaps – state version
• Brief introduction to the evolution of proteins
• Structure Determines Function
• Primary Structure: Sequence
• Secondary Structure: α, β, & loops
• Tertiary Structure: A Protein Fold
• Actin structure
• Actin sequence
• A related protein in bacteria
• Relation between sequence and structure
• Protein Phylogenies
• Protein Phylogenies – Example
• PDB Growth
• Only a few folds are found in nature
• Substitutions of Amino Acids
• Substitution Matrices
• Probabilistic interpretation of an alignment
• A Pair HMM for alignments
• A Pair HMM for unaligned sequences
• To compare ALIGNMENT vs. RANDOM hypothesis
• The meaning of alignment scores

Hidden Markov Models
[Figure: HMM trellis with states 1, 2, …, K at each position of the sequence x1 x2 x3 … xK]

Viterbi, Forward, Backward
VITERBI
Initialization: $V_0(0) = 1$; $V_k(0) = 0$, for all $k > 0$
Iteration: $V_l(i) = e_l(x_i) \max_k V_k(i-1)\, a_{kl}$
Termination: $P(x, \pi^*) = \max_k V_k(N)$

FORWARD
Initialization: $f_0(0) = 1$; $f_k(0) = 0$, for all $k > 0$
Iteration: $f_l(i) = e_l(x_i) \sum_k f_k(i-1)\, a_{kl}$
Termination: $P(x) = \sum_k f_k(N)$

BACKWARD
Initialization: $b_k(N) = 1$, for all $k$
Iteration: $b_k(i) = \sum_l a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$
Termination: $P(x) = \sum_k a_{0k}\, e_k(x_1)\, b_k(1)$

Posterior Decoding
We can now calculate
$P(\pi_i = k \mid x) = \dfrac{f_k(i)\, b_k(i)}{P(x)}$
Then, we can ask: what is the most likely state at position i of sequence x?
Define $\hat\pi$ by Posterior Decoding: $\hat\pi_i = \arg\max_k P(\pi_i = k \mid x)$
$$
\begin{aligned}
P(\pi_i = k \mid x) &= P(\pi_i = k, x) / P(x) \\
&= P(x_1, \dots, x_i, \pi_i = k, x_{i+1}, \dots, x_N) / P(x) \\
&= P(x_1, \dots, x_i, \pi_i = k)\, P(x_{i+1}, \dots, x_N \mid \pi_i = k) / P(x) \\
&= f_k(i)\, b_k(i) / P(x)
\end{aligned}
$$
The third step uses the Markov property: given $\pi_i = k$, the suffix $x_{i+1}, \dots, x_N$ is independent of $x_1, \dots, x_i$.

Posterior Decoding
• For each state, Posterior Decoding gives us a curve of the likelihood of that state at each position. This is sometimes more informative than the Viterbi path $\pi^*$
• Posterior Decoding may give an invalid sequence of states (one of probability 0). Why?

Posterior Decoding
• $P(\pi_i = k \mid x) = \sum_\pi P(\pi \mid x)\, \mathbf{1}(\pi_i = k) = \sum_{\pi : \pi_i = k} P(\pi \mid x)$
  where $\mathbf{1}(\psi) = 1$ if $\psi$ is true, and 0 otherwise
[Figure: posterior probability $P(\pi_i = l \mid x)$ of state l plotted along the positions x1 x2 x3 … xN]

Variants of HMMs

Higher-order HMMs
• How do we model "memory" larger than one time point?
• $P(\pi_{i+1} = l \mid \pi_i = k) = a_{kl}$
• $P(\pi_{i+1} = l \mid \pi_i = k, \pi_{i-1} = j) = a_{jkl}$
• …
• A second-order HMM with K states is equivalent to a first-order HMM with $K^2$ states
[Figure: a second-order HMM on states H and T, with transitions $a_{HT}$(prev = H), $a_{HT}$(prev = T), $a_{TH}$(prev = H), $a_{TH}$(prev = T), drawn as an equivalent first-order HMM on the pair states HH, HT, TH, TT, with transitions $a_{HHT}$, $a_{HTT}$, $a_{HTH}$, $a_{THH}$, $a_{THT}$, $a_{TTH}$]

Similar Algorithms to 1st Order
• $P(\pi_{i+1} = l \mid \pi_i = k, \pi_{i-1} = j)$
  $V_{lk}(i) = \max_j \{ V_{kj}(i-1) + \dots \}$
Time?
Space?

Modeling the Duration of States
[Figure: two states X and Y. X loops to itself with probability p and moves to Y with probability 1 − p. Y loops to itself with probability q and moves to X with probability 1 − q]
Length distribution of region X: $E[l_X] = 1/(1-p)$
• Geometric distribution, with mean $1/(1-p)$
This is a significant disadvantage of HMMs. Several solutions exist for modeling different length distributions.
[Figure: example, exon lengths in genes]

Solution 1: Chain several states
[Figure: a fixed chain of X states leading into the self-looping X state, which moves to Y with probability 1 − p]
Disadvantage: still very inflexible. $l_X = C$ + geometric with mean $1/(1-p)$

Solution 2: Negative binomial distribution
Duration in X: m turns, where
• During the first m − 1 turns, exactly n − 1 arrows to the next state are followed
• During the mth turn, an arrow to the next state is followed
$P(l_X = m) = \binom{m-1}{n-1} (1-p)^{n-1+1}\, p^{(m-1)-(n-1)} = \binom{m-1}{n-1} (1-p)^n\, p^{m-n}$
[Figure: chain X(1) → X(2) → … → X(n) → Y. Each X(j) loops to itself with probability p and advances with probability 1 − p]

Example: genes in prokaryotes
• EasyGene: prokaryotic gene-finder (Larsen TS, Krogh A)
• Negative binomial with n = 3

Solution 3: Duration modeling
Upon entering a state:
1. Choose duration d, according to a probability distribution
2. Generate d letters according to the emission probabilities
3. Take a transition to the next state according to the transition probabilities
Disadvantage: increase in the complexity of Viterbi:
Time: O(D)
Space: O(1)
where D = maximum duration of a state
[Figure: state F with duration $d < D_f$ drawn from distribution $P_f$, emitting $x_i \dots x_{i+d-1}$]
Warning: Rabiner's tutorial claims O(D²) and O(D) increases

Viterbi with duration modeling
Recall the original iteration:
$V_l(i) = \max_k V_k(i-1)\, a_{kl}\, e_l(x_i)$
New iteration:
$V_l(i) = \max_k \max_{d = 1 \dots D_l} V_k(i-d)\, P_l(d)\, a_{kl} \prod_{j = i-d+1}^{i} e_l(x_j)$
(transitions: $a_{kl}$; duration: $P_l(d)$; emissions: the product over $x_{i-d+1} \dots x_i$)
[Figure: states F and L with duration distributions $P_f$ and $P_l$. F emits $x_i \dots x_{i+d-1}$ with $d < D_f$, and L emits $x_j \dots x_{j+d-1}$ with $d < D_l$]
Precompute cumulative values (e.g., prefix sums of $\log e_l(x_j)$), so that each emission product takes O(1) time

Proteins, Pair HMMs, and Alignment

A state model for alignment
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII
States: M (+1, +1), I (+1, 0), J (0, +1)
Alignments correspond 1-to-1 with sequences of states M, I, J

Let's score the transitions
[Figure: the same alignment and state sequence, with transition scores. Every transition into M scores $s(x_i, y_j)$. M → I and M → J score $-d$. I → I and J → J score $-e$]
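The VITERBI, FORWARD, BACKWARD and Posterior Decoding recurrences from the start of this lecture translate almost line for line into code. The sketch below is a minimal Python version. It assumes the begin state 0 is given as `a0[k]` ($a_{0k}$), transitions as `a[k][l]` ($a_{kl}$) and emissions as `e[k][b]` ($e_k(b)$). The two-state coin model (`F` fair, `L` loaded), its numbers and all function names are hypothetical. Like the slides, it works with raw probabilities, so it will underflow on long sequences; those would need log space or scaling.

```python
from typing import Dict, List, Sequence, Tuple

Probs = Dict[str, float]   # state (or symbol) -> probability
Trans = Dict[str, Probs]   # a[k][l] = a_kl ; e[k][b] = e_k(b)


def forward(x: str, states: Sequence[str], a0: Probs, a: Trans, e: Trans) -> Tuple[List[Probs], float]:
    # Column 0 folds in the begin state: f_k(1) = a_0k * e_k(x_1). Assumes len(x) >= 1.
    f = [{k: a0[k] * e[k][x[0]] for k in states}]
    for i in range(1, len(x)):
        # f_l(i) = e_l(x_i) * sum_k f_k(i-1) * a_kl
        f.append({l: e[l][x[i]] * sum(f[i - 1][k] * a[k][l] for k in states)
                  for l in states})
    return f, sum(f[-1].values())  # P(x) = sum_k f_k(N)


def backward(x: str, states: Sequence[str], a0: Probs, a: Trans, e: Trans) -> Tuple[List[Probs], float]:
    b: List[Probs] = [{} for _ in x]
    b[-1] = {k: 1.0 for k in states}  # b_k(N) = 1
    for i in range(len(x) - 2, -1, -1):
        # b_k(i) = sum_l a_kl * e_l(x_{i+1}) * b_l(i+1)
        b[i] = {k: sum(a[k][l] * e[l][x[i + 1]] * b[i + 1][l] for l in states)
                for k in states}
    # Termination: P(x) = sum_k a_0k * e_k(x_1) * b_k(1)
    return b, sum(a0[k] * e[k][x[0]] * b[0][k] for k in states)


def viterbi(x: str, states: Sequence[str], a0: Probs, a: Trans, e: Trans) -> Tuple[List[str], float]:
    V = [{k: a0[k] * e[k][x[0]] for k in states}]
    ptr: List[Dict[str, str]] = [{}]
    for i in range(1, len(x)):
        V.append({})
        ptr.append({})
        for l in states:
            # V_l(i) = e_l(x_i) * max_k V_k(i-1) * a_kl, remembering the best k
            k_best = max(states, key=lambda k: V[i - 1][k] * a[k][l])
            V[i][l] = e[l][x[i]] * V[i - 1][k_best] * a[k_best][l]
            ptr[i][l] = k_best
    last = max(states, key=lambda k: V[-1][k])  # P(x, pi*) = max_k V_k(N)
    path = [last]
    for i in range(len(x) - 1, 0, -1):  # traceback through the stored argmaxes
        path.append(ptr[i][path[-1]])
    return path[::-1], V[-1][last]


def posterior(x: str, states: Sequence[str], a0: Probs, a: Trans, e: Trans) -> List[Probs]:
    f, px = forward(x, states, a0, a, e)
    b, _ = backward(x, states, a0, a, e)
    # P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]


def posterior_decode(x: str, states: Sequence[str], a0: Probs, a: Trans, e: Trans) -> List[str]:
    # pi^_i = argmax_k P(pi_i = k | x), chosen independently at each position
    return [max(col, key=col.get) for col in posterior(x, states, a0, a, e)]


# Hypothetical two-state model: fair (F) vs. loaded (L) coin
STATES = ("F", "L")
A0 = {"F": 0.5, "L": 0.5}
A = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
E = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.9, "T": 0.1}}

if __name__ == "__main__":
    x = "HHHHHHTTHTTH"
    print("Viterbi:  ", "".join(viterbi(x, STATES, A0, A, E)[0]))
    print("Posterior:", "".join(posterior_decode(x, STATES, A0, A, E)))
```

Comparing the two decodings on a few sequences shows where they disagree. In this toy model every $a_{kl} > 0$, so no path has probability 0. Exploring the "Why?" question on the posterior decoding slide therefore needs a model with some zero transitions.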
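The negative binomial formula in Solution 2 can be checked numerically. This sketch assumes the chain X(1) → … → X(n) from the slide, with every X(j) sharing the self-loop probability p. It compares the closed form against a direct propagation of probability mass through the chain. The function names and the value p = 0.8 are hypothetical. Setting n = 1 recovers the geometric case, $P(l_X = m) = (1-p)\,p^{m-1}$ with mean $1/(1-p)$. In general the mean is $n/(1-p)$, and no duration shorter than n is possible.

```python
from math import comb
from typing import List


def neg_binomial_pmf(m: int, n: int, p: float) -> float:
    """Closed form from the slide: P(l_X = m) = C(m-1, n-1) (1-p)^n p^(m-n)."""
    if m < n:
        return 0.0  # each of the n chained states emits at least one letter
    return comb(m - 1, n - 1) * (1 - p) ** n * p ** (m - n)


def chain_duration_pmf(n: int, p: float, max_len: int) -> List[float]:
    """Same distribution by propagating mass through X(1) ... X(n); pmf[m] = P(l_X = m)."""
    pmf = [0.0] * (max_len + 1)
    occ = [1.0] + [0.0] * (n - 1)  # occ[j] = P(in X(j+1) at the start of this turn)
    for m in range(1, max_len + 1):
        pmf[m] = occ[-1] * (1 - p)  # leaving X(n) for Y ends the duration at turn m
        # Each state keeps its mass with prob p and passes 1 - p on to the next state
        occ = [occ[j] * p + (occ[j - 1] * (1 - p) if j > 0 else 0.0)
               for j in range(n)]
    return pmf


if __name__ == "__main__":
    n, p = 3, 0.8  # n = 3 as in the EasyGene example; p is an arbitrary illustration
    pmf = chain_duration_pmf(n, p, 200)
    print(max(abs(pmf[m] - neg_binomial_pmf(m, n, p)) for m in range(1, 201)))
    print(sum(m * q for m, q in enumerate(pmf)), n / (1 - p))
```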
Alignment with affine gaps – state version
Dynamic Programming:
M(i, j): optimal alignment of $x_1 \dots x_i$ to $y_1 \dots y_j$ ending in M
I(i, j): optimal alignment of $x_1 \dots x_i$ to $y_1 \dots y_j$ ending in I
J(i, j): optimal alignment of $x_1 \dots x_i$ to $y_1 \dots y_j$ ending in J
The score is additive, so we can apply DP recurrence formulas.

Alignment with affine gaps – state version
Initialization:
$M(0, 0) = 0$; $M(i, 0) = M(0, j) = -\infty$, for $i, j > 0$
$I(i, 0) = -d - (i-1)e$; $J(0, j) = -d - (j-1)e$
Iteration:
$M(i, j) = s(x_i, y_j) + \max\{ M(i-1, j-1),\ I(i-1, j-1),\ J(i-1, j-1) \}$
$I(i, j) = \max\{ I(i-1, j) - e,\ M(i-1, j) - d \}$
$J(i, j) = \max\{ J(i, j-1) - e,\ M(i, j-1) - d \}$
Termination:
The optimal alignment score is given by $\max\{ M(m, n),\ I(m, n),\ J(m, n) \}$

Brief introduction to the evolution of proteins
• Protein sequence and structure
• Protein classification
• Phylogeny trees
• Substitution matrices

Structure Determines Function
What determines structure?
• Energy
• Kinematics
How can we determine structure?
• Experimental methods
• Computational predictions
The Protein Folding Problem

Primary Structure: Sequence
• The primary structure of a protein is its amino acid sequence
• Twenty different amino acids have distinct shapes and properties
• A useful mnemonic for the hydrophobic amino acids is "FAMILY VW"

Secondary Structure: α, β, & loops
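The three-matrix recurrences on the "Alignment with affine gaps – state version" slides can be sketched as follows. This version assumes that d and e are positive penalties which are subtracted, matching the $-d$ and $-e$ transition scores of the state model, so a gap of length g costs $d + (g-1)e$. It also assumes $I(0, j) = J(i, 0) = -\infty$, which the slide leaves implicit. I consumes a letter of x against a gap and J a letter of y, following I (+1, 0) and J (0, +1). The traceback returns the M/I/J state path alongside the alignment, which illustrates the 1-to-1 correspondence between alignments and state sequences. The function name, the match/mismatch scoring and the values d = 3, e = 1 are hypothetical.

```python
import math
from typing import Callable, Tuple

NEG_INF = -math.inf


def affine_align(x: str, y: str, s: Callable[[str, str], float],
                 d: float, e: float) -> Tuple[float, str, str, str]:
    """Returns (score, aligned x, aligned y, state path over M/I/J)."""
    m, n = len(x), len(y)
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # I: x advances, gap in y
    J = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # J: y advances, gap in x
    M[0][0] = 0.0
    for i in range(1, m + 1):
        I[i][0] = -d - (i - 1) * e  # leading gap in y of length i
    for j in range(1, n + 1):
        J[0][j] = -d - (j - 1) * e  # leading gap in x of length j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = s(x[i - 1], y[j - 1]) + max(M[i - 1][j - 1], I[i - 1][j - 1], J[i - 1][j - 1])
            I[i][j] = max(I[i - 1][j] - e, M[i - 1][j] - d)
            J[i][j] = max(J[i][j - 1] - e, M[i][j - 1] - d)

    tables = {"M": M, "I": I, "J": J}
    state = max("MIJ", key=lambda t: tables[t][m][n])  # termination
    score = tables[state][m][n]
    i, j, ax, ay, path = m, n, [], [], []
    while i > 0 or j > 0:  # traceback: recover which case of each max was used
        path.append(state)
        if state == "M":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            state = max("MIJ", key=lambda t: tables[t][i - 1][j - 1])
            i, j = i - 1, j - 1
        elif state == "I":
            ax.append(x[i - 1])
            ay.append("-")
            state = "I" if I[i - 1][j] - e >= M[i - 1][j] - d else "M"
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            state = "J" if J[i][j - 1] - e >= M[i][j - 1] - d else "M"
            j -= 1
    return score, "".join(reversed(ax)), "".join(reversed(ay)), "".join(reversed(path))


if __name__ == "__main__":
    def match_mismatch(a: str, b: str) -> int:  # hypothetical scoring: +1 / -1
        return 1 if a == b else -1

    print(affine_align("ACGT", "AT", match_mismatch, d=3, e=1))
    # (-2.0, 'ACGT', 'A--T', 'MIIM')
```

In the example, a single gap of length 2 (cost 3 + 1) beats any alignment that uses a mismatch or opens two separate gaps.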

