Computational Biology, Part 5
Hidden Markov Models

Contents:
- Markov chains
- Formalism for Markov chains
- Generating Markov chains
- Interactive Demonstration
- Discriminating between two states with Markov chains
- State probabilities for + and - models
- Transition probabilities converted to log likelihood ratios
- Example
- Hidden Markov models
- More definitions
- Decoding
- Determining the optimal path: the Viterbi algorithm
- Block Diagram for Viterbi Algorithm
- Multiple paths can give the same sequence
- Probability of a sequence
- Forward algorithm
- Backward algorithm
- Estimating probability of state at particular position
- Parameter estimation for HMMs
- Estimation when state sequence known
- Baum-Welch
- Estimating transition frequencies
- Estimating emission frequencies
- Updating model parameters
- Test for termination

Computational Biology, Part 5: Hidden Markov Models
Robert F. Murphy
Copyright 2005-2008. All rights reserved.

Markov chains
- If we can predict all of the properties of a sequence knowing only the conditional dinucleotide probabilities, then that sequence is an example of a Markov chain.
- A Markov chain is defined as a sequence of states in which each state depends only on the previous state.

Formalism for Markov chains
M = (Q, π, P) is a Markov chain, where:
- Q = vector (1,..,n) is the list of states (Q(1)=A, Q(2)=C, Q(3)=G, Q(4)=T for DNA)
- π = vector (p1,..,pn) is the initial probability of each state: π(i) = p_Q(i)
  (e.g., π(1) = p_A for DNA)
- P = n x n matrix where the entry in row i and column j is the probability of observing state j if the previous state is i, and the sum of entries in each row is 1 (the dinucleotide probabilities): P(i,j) = p*_Q(i)Q(j) (e.g., P(1,2) = p*_AC for DNA)

Generating Markov chains
- Given Q, π, P (and a random number generator), we can generate sequences that are members of the Markov chain M.
- If π and P are derived from a single sequence, the family of sequences generated by M will include that sequence as well as many others.
- If π and P are derived from a sampled set of sequences, the family of sequences generated by M will be the population from which that set has been sampled.

Interactive Demonstration
(A11 Markov chains)
Demonstration: Generating Markov chains
Author: R.F. Murphy, Jan. 14, 2004; Revised Feb. 1, 2007
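The generation procedure used in the demonstration (draw a random number, compare it against cumulative transition frequencies, emit the next nucleotide) can be sketched as follows. This is a minimal illustration, not the demonstration spreadsheet itself: the transition and initial probabilities are the frequencies computed in the demonstration, the function names are mine, and boundary conventions at cumulative-probability edges may differ from the spreadsheet's, so the exact output sequence will not necessarily match.

```python
import random

# Initial (mononucleotide) probabilities pi, from the demonstration.
pi = {"a": 0.25, "c": 0.45, "g": 0.20, "t": 0.10}

# Transition probabilities P(next | current), estimated from the example
# sequence ttaaggaaggacccccccccc (treated as circular) in the demonstration.
P = {
    "a": {"a": 0.40, "c": 0.20, "g": 0.40, "t": 0.00},
    "c": {"a": 0.00, "c": 0.89, "g": 0.00, "t": 0.11},
    "g": {"a": 0.50, "c": 0.00, "g": 0.50, "t": 0.00},
    "t": {"a": 0.50, "c": 0.00, "g": 0.00, "t": 0.50},
}

def sample(dist, r):
    """Pick a state by comparing r in [0,1) against cumulative probabilities."""
    cum = 0.0
    for state, p in dist.items():
        cum += p
        if r < cum:
            return state
    return state  # guard against floating-point rounding at the top edge

def generate(length, rng=random.random):
    """Generate a Markov-chain sequence: first state from pi, rest from P."""
    seq = [sample(pi, rng())]
    while len(seq) < length:
        seq.append(sample(P[seq[-1]], rng()))
    return "".join(seq)

print(generate(16))
```

Because several transition probabilities are zero (e.g. a→t, c→g), those dinucleotides can never appear in the generated output, which is one way to sanity-check the sampler.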
This demonstration illustrates how mononucleotide and dinucleotide frequencies can be used to predict a sequence that, in the limit of large length, has the same frequencies.
Instructions: Enter numbers of dinucleotides in the shaded cells and observe the sequences generated below.

Mononucleotide calculations (example sequence ttaaggaaggacccccccccc, assumed circular):

nucleotide     a     c     g     t   total
count          5     9     4     2    20
frequency    0.25  0.45  0.20  0.10
cumulative   0.25  0.70  0.90  1.00

Dinucleotide calculations (first nucleotide in row, second nucleotide in column):

counts    a   c   g   t   sum
a         2   1   2   0    5
c         0   8   0   1    9
g         2   0   2   0    4
t         1   0   0   1    2
totals    5   9   4   2

frequency    a     c     g     t      cumulative    a     c     g     t
a          0.40  0.20  0.40  0.00     a           0.40  0.60  1.00  1.00
c          0.00  0.89  0.00  0.11     c           0.00  0.89  0.89  1.00
g          0.50  0.00  0.50  0.00     g           0.50  0.50  1.00  1.00
t          0.50  0.00  0.00  0.50     t           0.50  0.50  0.50  1.00

Markov chain generation:

random number     0.90 0.67 0.87 0.85 0.31 0.09 0.67 0.53 0.13 0.80 0.35 0.00 0.01 0.02 0.57 0.16
nucleotide index   3    3    3    3    1    1    3    3    1    3    1    1    1    1    2    2
output sequence    g    g    g    g    a    a    g    g    a    g    a    a    a    a    c    c

Discriminating between two states with Markov chains
To determine which of two states a sequence is more likely to have resulted from, we calculate the log-odds score:

$$S(x) = \log \frac{P(x \mid \text{model}^+)}{P(x \mid \text{model}^-)} = \sum_{i=1}^{L} \log \frac{a^+_{x_{i-1} x_i}}{a^-_{x_{i-1} x_i}} = \sum_{i=1}^{L} \beta_{x_{i-1} x_i}$$

State probabilities for + and - models
Given example sequences that are from either the + model (CpG island) or the - model (not a CpG island), we can calculate the probability that each nucleotide will occur for each model (the a values for each model):
+ model    A      C      G      T        - model    A      C      G      T
A        0.180  0.274  0.426  0.120      A        0.300  0.205  0.285  0.210
C        0.171  0.368  0.274  0.188      C        0.322  0.298  0.078  0.302
G        0.161  0.339  0.375  0.125      G        0.248  0.246  0.298  0.208
T        0.079  0.355  0.384  0.182      T        0.177  0.239  0.292  0.292

Transition probabilities converted to log likelihood ratios

β      A       C      G       T
A    -0.740  0.419  0.580  -0.803
C    -0.913  0.302  1.812  -0.685
G    -0.624  0.461  0.331  -0.730
T    -1.169  0.573  0.393  -0.679

Example
What is the relative probability of C+G+C+ compared with C-G-C-?
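A minimal sketch of how this example can be evaluated from the β table of log likelihood ratios. The score S(x) just sums β over consecutive pairs of the sequence; the dictionary layout (beta[prev][next]) and the assumption that the logarithms are base 2 (the common convention for this table) are mine, and the initial-state term is ignored here:

```python
# Log likelihood ratios beta[prev][next] = log2(a+_{prev,next} / a-_{prev,next}),
# copied from the table above.
beta = {
    "A": {"A": -0.740, "C": 0.419, "G": 0.580, "T": -0.803},
    "C": {"A": -0.913, "C": 0.302, "G": 1.812, "T": -0.685},
    "G": {"A": -0.624, "C": 0.461, "G": 0.331, "T": -0.730},
    "T": {"A": -1.169, "C": 0.573, "G": 0.393, "T": -0.679},
}

def log_odds(seq):
    """S(x): sum beta over consecutive pairs (initial-state term ignored)."""
    return sum(beta[a][b] for a, b in zip(seq, seq[1:]))

s = log_odds("CGC")  # beta[C][G] + beta[G][C] = 1.812 + 0.461 = 2.273
print(s)
print(2 ** s)        # relative probability, assuming base-2 logs: about 4.8
```

So, under the base-2 assumption, the CGC segment is roughly 2^2.273 ≈ 4.8 times more likely under the + (CpG island) model than under the - model.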