Hidden Markov Models

Decoding

GIVEN x = x1 x2 … xN
We want to find π = π1, …, πN such that P[x, π] is maximized:

    π* = argmaxπ P[x, π]

We can use dynamic programming!

Let Vk(i) = max{π1,…,πi-1} P[x1…xi-1, π1, …, πi-1, xi, πi = k]
          = probability of the most likely sequence of states ending at πi = k

[Figure: trellis with states 1, 2, …, K at each position x1, x2, x3, …, xN]

The Viterbi Algorithm

Similar to "aligning" a set of states to a sequence

Time:  O(K²N)
Space: O(KN)

[Figure: K×N dynamic programming matrix over x1 x2 x3 … xN, with one cell Vj(i) highlighted]

Evaluation

We demonstrated algorithms that allow us to compute:
• P(x): probability of x given the model
• P(xi…xj): probability of a substring of x given the model
• P(πi = k | x): probability that the i-th state is k, given x
  (a more refined measure of which states x may be in)

Viterbi, Forward, Backward

VITERBI
Initialization: V0(0) = 1
                Vk(0) = 0, for all k > 0
Iteration:      Vl(i) = el(xi) maxk [Vk(i-1) akl]
Termination:    P(x, π*) = maxk Vk(N)

FORWARD
Initialization: f0(0) = 1
                fk(0) = 0, for all k > 0
Iteration:      fl(i) = el(xi) Σk fk(i-1) akl
Termination:    P(x) = Σk fk(N) ak0

BACKWARD
Initialization: bk(N) = ak0, for all k
Iteration:      bk(i) = Σl el(xi+1) akl bl(i+1)
Termination:    P(x) = Σl a0l el(x1) bl(1)

Posterior Decoding

We can now calculate

    P(πi = k | x) = fk(i) bk(i) / P(x)

Then we can ask: what is the most likely state at position i of sequence x?

Define π^ by posterior decoding: π^i = argmaxk P(πi = k | x)

Implementation Techniques

Viterbi: sum of logs

    Vl(i) = log el(xi) + maxk [Vk(i-1) + log akl]

Forward/Backward: scaling by c(i). One way to perform scaling:

    fl(i) = c(i) [el(xi) Σk fk(i-1) akl],  where c(i) = 1/(Σk fk(i))
    bl(i): use the same factors c(i)

Details in Rabiner's tutorial on HMMs (1989).

A modeling Example

CpG islands in DNA sequences

Example: CpG Islands

CpG dinucleotides in the genome are frequently methylated
(write CpG so as not to confuse it with the CG base pair)

    C → methyl-C → T

Methylation is often suppressed around genes and promoters → CpG islands

• In CpG islands, CG is more frequent
• Other pairs (AA, AG, AT, …) have different frequencies

Question: detect CpG islands computationally

A model of CpG Islands – (1) Architecture

[Figure: eight interconnected states, A+ C+ G+ T+ ("CpG island") and A- C- G- T- ("not CpG island")]

A model of CpG Islands – (2) Transitions

How do we estimate the parameters of the model?
Emission probabilities: 1/0 (each state emits its own nucleotide)

1. Transition probabilities within CpG islands: estimated from many known (experimentally verified) CpG islands (the training set)
2. Transition probabilities within other regions: estimated from many known non-CpG regions

    +     A     C     G     T
    A   .180  .274  .426  .120
    C   .171  .368  .274  .188
    G   .161  .339  .375  .125
    T   .079  .355  .384  .182

    -     A     C     G     T
    A   .300  .205  .285  .210
    C   .233  .298  .078  .302
    G   .248  .246  .298  .208
    T   .177  .239  .292  .292

Parenthesis – log likelihoods

Another way to see the effect of transitions is via log-likelihood ratios:

    L(u, v) = log[ P(uv | +) / P(uv | -) ]

          A       C       G       T
    A  -0.740  +0.419  +0.580  -0.803
    C  -0.913  +0.302  +1.812  -0.685
    G  -0.624  +0.461  +0.331  -0.730
    T  -1.169  +0.573  +0.393  -0.679

Given a region x = x1…xN, a quick-and-dirty way to decide whether the entire x is a CpG island:

    P(x is CpG) > P(x is not CpG)  ⇔  Σi L(xi, xi+1) > 0

A model of CpG Islands – (2) Transitions

• What about transitions between (+) and (-) states?
• They affect:
  - the average length of a CpG island
  - the average separation between two CpG islands

[Figure: two states X and Y; X stays in X with probability p and moves to Y with probability 1-p; Y stays in Y with probability q and moves to X with probability 1-q]

Length distribution of region X:

    P[lX = 1] = 1-p
    P[lX = 2] = p(1-p)
    …
    P[lX = k] = p^(k-1) (1-p)
    E[lX] = 1/(1-p)

A geometric distribution, with mean 1/(1-p).

There is no reason to favor exiting/entering the (+) and (-) regions at a particular nucleotide. To determine the transition probabilities between (+) and (-) states:

1. Estimate the average length of a CpG island: lCPG = 1/(1-p), so p = 1 - 1/lCPG
2. For each pair of (+) states k, l, let akl → p × akl
3. For each (+) state k and (-) state l, let akl = (1-p)/4
   (better: take the frequency of l in the (-) regions into account)
4. Do the same for the (-) states

A problem with this model: CpG islands don't have an exponential (geometric) length distribution. This is a defect of HMMs, compensated by their ease of analysis and computation.

Applications of the model

• Given a DNA region x, the Viterbi algorithm predicts the locations of CpG islands
• Given a nucleotide xi (say xi = A), the Viterbi parse tells whether xi is in a CpG island in the most likely overall scenario
• The Forward/Backward algorithms can calculate
      P(xi is in a CpG island) = P(πi = A+ | x)
• Posterior decoding can assign locally optimal predictions of CpG islands:
      π^i = argmaxk P(πi = k | x)

What if a new genome comes?

• We just sequenced the porcupine genome
• We know CpG islands play the same role in this genome
• However, we have no known CpG islands for porcupines
• We suspect the frequency and characteristics of CpG islands are quite different in porcupines

How do we adjust the parameters in our model?  → LEARNING

Problem 3: Learning

Re-estimate the parameters of the model based on training data.

Two learning scenarios

1. Estimation when the "right answer" is known
   Examples:
   GIVEN: a genomic region x = x1…x1,000,000 where we have good (experimental) annotations of the CpG islands
   GIVEN: the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls

2. Estimation when the "right answer" is unknown
   Examples:
   GIVEN: the porcupine genome; we don't know how frequent the CpG islands are there, nor do we know their composition
   GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice

QUESTION: update the parameters θ of the model to maximize P(x | θ)

1. When the right answer is known

Given x = x1…xN for which the true path π = π1…πN is known, define:

    Akl   = # times the k→l transition occurs in π
    Ek(b) = # times state k in π emits b in x

We can show that the maximum-likelihood parameters are the normalized counts:

    akl = Akl / Σl' Akl'        ek(b) = Ek(b) / Σb' Ek(b')
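The log-space Viterbi recursion from the Implementation Techniques slide, Vl(i) = log el(xi) + maxk [Vk(i-1) + log akl], avoids the numerical underflow that plain probability products suffer on long sequences. A minimal sketch in Python, assuming a hypothetical two-state model ('+' prefers C/G, '-' prefers A/T) whose numbers are invented for illustration and are not the CpG parameters from the tables above:

```python
import math

def viterbi_log(x, states, log_a0, log_a, log_e):
    """Log-space Viterbi: V_l(i) = log e_l(x_i) + max_k [V_k(i-1) + log a_kl].

    log_a0[l]     : log initial probability of state l
    log_a[k][l]   : log transition probability k -> l
    log_e[l][sym] : log emission probability of sym in state l
    Returns (best log joint probability, most likely state path).
    """
    V = [{l: log_a0[l] + log_e[l][x[0]] for l in states}]
    ptr = []  # back-pointers for traceback
    for i in range(1, len(x)):
        col, back = {}, {}
        for l in states:
            k_best = max(states, key=lambda k: V[-1][k] + log_a[k][l])
            col[l] = log_e[l][x[i]] + V[-1][k_best] + log_a[k_best][l]
            back[l] = k_best
        V.append(col)
        ptr.append(back)
    # Trace back from the best final state.
    last = max(states, key=lambda l: V[-1][l])
    path = [last]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    path.reverse()
    return V[-1][last], "".join(path)

# Hypothetical toy parameters (not the slides' CpG tables).
log = math.log
states = "+-"
log_a0 = {"+": log(0.5), "-": log(0.5)}
log_a = {"+": {"+": log(0.9), "-": log(0.1)},
         "-": {"+": log(0.1), "-": log(0.9)}}
log_e = {"+": {b: log(p) for b, p in {"A": .1, "C": .4, "G": .4, "T": .1}.items()},
         "-": {b: log(p) for b, p in {"A": .4, "C": .1, "G": .1, "T": .4}.items()}}
score, path = viterbi_log("CGCGATAT", states, log_a0, log_a, log_e)
```

Because every quantity is a sum of logs, the scores stay in a safe floating-point range even for sequences millions of bases long, which is exactly why the slides recommend this formulation over the direct product.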
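The quick-and-dirty whole-region test from the log-likelihood slide (declare x a CpG island when Σi L(xi, xi+1) > 0) can be sketched directly from the L(u, v) table given above; the function name cpg_score is my own:

```python
# L(u, v) = log[ P(uv | +) / P(uv | -) ], copied from the slides' table.
L = {
    "A": {"A": -0.740, "C": +0.419, "G": +0.580, "T": -0.803},
    "C": {"A": -0.913, "C": +0.302, "G": +1.812, "T": -0.685},
    "G": {"A": -0.624, "C": +0.461, "G": +0.331, "T": -0.730},
    "T": {"A": -1.169, "C": +0.573, "G": +0.393, "T": -0.679},
}

def cpg_score(x):
    """Sum of dinucleotide log-likelihood ratios; > 0 suggests x is a CpG island."""
    return sum(L[u][v] for u, v in zip(x, x[1:]))
```

As expected from the table, CG-rich input scores positive and AT-rich input scores negative; note this classifies the region as a whole and cannot locate island boundaries, which is what the full two-chain HMM is for.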
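The counting argument of the known-answer scenario can be sketched in a few lines: given a sequence x and its true state path π, accumulate the transition counts Akl and emission counts Ek(b), then normalize each row. The function name and the toy labeled data below are illustrative, not from the slides:

```python
from collections import defaultdict

def ml_estimate(x, pi):
    """Maximum-likelihood HMM parameters from a labeled sequence.

    A[k][l] counts k->l transitions in pi; E[k][b] counts emissions of
    symbol b while in state k.  Normalizing each row gives the ML
    estimates a_kl = A_kl / sum_l' A_kl' and e_k(b) = E_k(b) / sum_b' E_k(b').
    """
    A = defaultdict(lambda: defaultdict(int))
    E = defaultdict(lambda: defaultdict(int))
    for i, (sym, state) in enumerate(zip(x, pi)):
        E[state][sym] += 1
        if i + 1 < len(pi):
            A[state][pi[i + 1]] += 1
    a = {k: {l: c / sum(row.values()) for l, c in row.items()} for k, row in A.items()}
    e = {k: {b: c / sum(row.values()) for b, c in row.items()} for k, row in E.items()}
    return a, e

# Toy labeled data: '+' marks CpG-island positions, '-' background.
x  = "CGCGATAT"
pi = "++++----"
a, e = ml_estimate(x, pi)
```

On real data one would add pseudocounts before normalizing, so that transitions or emissions absent from a small training set do not get probability exactly zero.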