DOC PREVIEW
Stanford CS 262 - Hidden Markov Models

This preview shows page 1-2-17-18-19-35-36 out of 36 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Hidden Markov ModelsGenerating a sequence by an HMMViterbi, Forward, BackwardA modeling ExampleMethylation & SilencingExample: CpG IslandsSlide 7A model of CpG Islands – (1) ArchitectureA model of CpG Islands – (2) TransitionsLog Likehoods— Telling “CpG Island” from “Non-CpG Island”Slide 11Slide 12Applications of the modelWhat if a new genome comes?Problem 3: LearningTwo learning scenarios1. When the right answer is knownSlide 18PseudocountsSlide 202. When the right answer is unknownSlide 22Estimating new parametersSlide 24The Baum-Welch AlgorithmSlide 26Alternative: Viterbi TrainingVariants of HMMsHigher-order HMMsModeling the Duration of StatesSlide 31Solution 1: Chain several statesSolution 2: Negative binomial distributionExample: genes in prokaryotesSolution 3: Duration modelingViterbi with duration modelingHidden Markov Models12K…12K…12K…………12K…x1x2x3xK21K2CS262 Lecture 6, Win07, BatzoglouGenerating a sequence by an HMMA HMM is a generative modelgenerative model1. Start at state 1 according to prob a01 2. Emit letter x1 according to prob e1(x1)3. Go to state 2 according to prob a124. … until emitting xn 12K…12K…12K…………12K…x1x2x3xn21K20e2(x1)a02CS262 Lecture 6, Win07, BatzoglouViterbi, Forward, BackwardVITERBIInitialization:V0(0) = 1Vk(0) = 0, for all k > 0Iteration: Vl(i) = el(xi) maxk Vk(i-1) akl Termination: P(x, *) = maxk Vk(N)FORWARDInitialization:f0(0) = 1fk(0) = 0, for all k > 0Iteration:fl(i) = el(xi) k fk(i-1) aklTermination:P(x) = k fk(N) ak0BACKWARDInitialization:bk(N) = ak0, for all kIteration:bl(i) = k el(xi+1) akl bk(i+1)Termination: P(x) = k a0k ek(x1) bk(1)CS262 Lecture 6, Win07, BatzoglouA+ C+ G+ T+A- C- G- T-A modeling ExampleCpG islands in DNA sequencesCS262 Lecture 6, Win07, BatzoglouMethylation & Silencing•One way cells differentiate is methylationAddition of CH3in C-nucleotidesSilences genes in region•CG (denoted CpG) often mutates to TG, when methylated•In each cell, one copy of X is silenced, methylation plays role•Methylation is inherited during cell divisionCS262 Lecture 6, Win07, BatzoglouExample: CpG IslandsCpG nucleotides in the genome are frequently methylated(Write CpG not to confuse with CG base pair)C  methyl-C  TMethylation often suppressed around genes, promoters CpG islandsCS262 Lecture 6, Win07, BatzoglouExample: CpG Islands•In CpG islands, CG is more frequentOther pairs (AA, AG, AT…) have different frequenciesQuestion: Detect CpG islands computationallyCS262 Lecture 6, Win07, BatzoglouA model of CpG Islands – (1) ArchitectureA+ C+ G+ T+A- C- G- T-CpG IslandNot CpG IslandCS262 Lecture 6, Win07, BatzoglouA model of CpG Islands – (2) TransitionsHow do we estimate parameters of the model?Emission probabilities: 1/01. Transition probabilities within CpG islandsEstablished from known CpG islands (Training Set)2. Transition probabilities within other regionsEstablished from known non-CpG islands(Training Set)Note: these transitions out of each state add up to one—no room for transitions between (+) and (-) states+A C G TA.180 .274 .426 .120C.171 .368 .274 .188G.161 .339 .375 .125T.079 .355 .384 .182-A C G TA.300 .205 .285 .210C.233 .298 .078 .302G.248 .246 .298 .208T.177 .239 .292 .292= 1= 1= 1= 1= 1= 1= 1= 1CS262 Lecture 6, Win07, BatzoglouLog Likehoods—Telling “CpG Island” from “Non-CpG Island”A C G TA-0.740 +0.419 +0.580 -0.803C-0.913 +0.302 +1.812 -0.685G-0.624 +0.461 +0.331 -0.730T-1.169 +0.573 +0.393 -0.679Another way to see effects of transitions:Log likelihoodsL(u, v) = log[ P(uv | + ) / P(uv | -) ]Given a region x = x1…xNA quick-&-dirty way to decide whether entire x is CpGP(x is CpG) > P(x is not CpG) i L(xi, xi+1) > 0CS262 Lecture 6, Win07, BatzoglouA model of CpG Islands – (2) Transitions•What about transitions between (+) and (-) states?•They affect Avg. length of CpG islandAvg. separation between two CpG islandsX Y1-p1-qp qLength distribution of region X:P[lX = 1] = 1-pP[lX = 2] = p(1-p)…P[lX= k] = pk-1(1-p)E[lX] = 1/(1-p)Geometric distribution, with mean 1/(1-p)CS262 Lecture 6, Win07, BatzoglouA+ C+ G+ T+A- C- G- T-1–p++A model of CpG Islands – (2) TransitionsRight now, aA+A+ + aA+C+ + aA+G+ + aA+T+ = 1We need to adjust aij so as to allow transitions between (+) and (-) statesSay we want with probability p++ to stay within CpG, p-- to stay within non-CpG1. Let’s adjust all probs by that factor: example, let aA+G+  p++ × aA+G+2. Now, let’s calculate probs between (+) and (-) states1. Total prob aA+S where S is a (-) state, is (1 – p++)2. Let qA-, qC-, qG-, qT- be the proportions of A, C, G, and T within non-CpG states in training set3. Then, let aA+A- = (1 – p++) × qA-; aA+C- = (1 – p++) × qC-; …4. Do the same for (-) to (+) states3. OK, but how do we estimate p++ and p--?1. Estimate average length of a CpG island: l+ = 1/(1-p)  p = 1 – 1/l+ 2. Do the same for length between two CpG islands, l-What we just did is a back-of the envelope learninglearning procedureWe adjusted the parameters in a manner similar to the learning algorithms we will cover in this lectureCS262 Lecture 6, Win07, BatzoglouApplications of the modelGiven a DNA region x,The Viterbi algorithm predicts locations of CpG islandsGiven a nucleotide xi, (say xi = A)The Viterbi parse tells whether xi is in a CpG island in the most likely general scenarioThe Forward/Backward algorithms can calculateP(xi is in CpG island) = P(i = A+ | x)Posterior Decoding can assign locally optimal predictions of CpG islands ^i = argmaxk P(i = k | x)Advantage: ?Each nucleotide is more likely to be called correctlyDisadvantage: ?The overall parse will be “choppy”—CpG islands too shortAdvantage/Disadvantage?CS262 Lecture 6, Win07, BatzoglouWhat if a new genome comes?•We just sequenced the porcupine genome•We know CpG islands play the same role in this genome•However, we have no known CpG islands for porcupines•We suspect the frequency and characteristics of CpG islands are quite different in porcupinesHow do we adjust the parameters in our model?LEARNINGCS262 Lecture 6, Win07, BatzoglouProblem 3: LearningRe-estimate the parameters of the model based on training dataCS262 Lecture 6, Win07, BatzoglouTwo learning scenarios1. Estimation when the “right answer” is knownExamples: GIVEN: a genomic region x = x1…x1,000,000 where we have good (experimental) annotations of the CpG


View Full Document

Stanford CS 262 - Hidden Markov Models

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Hidden Markov Models
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hidden Markov Models and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hidden Markov Models 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?