Computational Biology, Part 5 Hidden Markov ModelsMarkov chainsFormalism for Markov chainsGenerating Markov chainsInteractive DemonstrationMatlab code for generating Markov chainsSlide 7Slide 8Discriminating between two states with Markov chainsState probablities for + and - modelsTransition probabilities converted to log likelihood ratiosExampleSlide 13Hidden Markov modelsMore definitionsGenerating sequences (see previous example code)DecodingSlide 18Determining the optimal path: the Viterbi algorithmSlide 20Block Diagram for Viterbi AlgorithmMultiple paths can give the same sequenceProbability of a sequenceSlide 24Forward algorithmBackward algorithmSlide 27Estimating probability of state at particular positionParameter estimation for HMMsEstimation when state sequence knownBaum-WelchEstimating transition frequenciesEstimating emission frequenciesUpdating model parametersTest for terminationComputational Biology, Part 5Hidden Markov ModelsComputational Biology, Part 5Hidden Markov ModelsRobert F. MurphyRobert F. MurphyCopyright Copyright 2005-2009. 2005-2009.All rights reserved.All rights reserved.Markov chainsMarkov chainsIf we can predict all of the properties of a If we can predict all of the properties of a sequence knowing only the conditional sequence knowing only the conditional dinucleotide probabilities, then that dinucleotide probabilities, then that sequence is an example of a sequence is an example of a Markov chainMarkov chainA A Markov chainMarkov chain is defined as a sequence of is defined as a sequence of states in which each state depends only on states in which each state depends only on the previous statethe previous stateFormalism for Markov chainsFormalism for Markov chainsMM=(=(Q,π,PQ,π,P) is a Markov chain, where) is a Markov chain, whereQQ = vector (1,.., = vector (1,..,nn) is the list of states ) is the list of states QQ(1)=A, (1)=A, QQ(2)=C, (2)=C, QQ(3)=G, (3)=G, QQ(4)=T for DNA(4)=T for DNAππ = vector (p = vector (p11,..,p,..,pnn) is the initial probability of each state) is the initial probability of each stateππ((ii)=pQ)=pQ((ii) ) (e,g., π(1)=p (e,g., π(1)=pA A for DNA)for DNA)PP= = nn x x nn matrix where the entry in row matrix where the entry in row i i and column and column jj is is the probability of observing state the probability of observing state jj if the previous state is if the previous state is i i and the sum of entries in each row is 1 (and the sum of entries in each row is 1 ( dinucleotide dinucleotide probabilities) probabilities) PP(i,j)=p*(i,j)=p*Q(i)Q(i) Q(i)Q(i) (e.g., (e.g., PP(1,2)=p*(1,2)=p*ACAC for DNA) for DNA)Generating Markov chainsGenerating Markov chainsGiven Given Q,π,PQ,π,P (and a random number generator), we (and a random number generator), we can generate sequences that are members of the can generate sequences that are members of the Markov chain MMarkov chain MIf If π,Pπ,P are derived from a single sequence, the are derived from a single sequence, the family of sequences generated by family of sequences generated by MM will include will include that sequence as well as many othersthat sequence as well as many othersIf If π,Pπ,P are derived from a sampled set of are derived from a sampled set of sequences, the family of sequences generated by sequences, the family of sequences generated by MM will be the population from which that set has will be the population from which that set has been sampledbeen sampledInteractive DemonstrationInteractive Demonstration(A11 Markov chains)(A11 Markov chains)Matlab code for generating Markov chainsMatlab code for generating Markov chainschars = ['a' 'c' 'g' 't'];chars = ['a' 'c' 'g' 't']; % the dinucs array shows the frequency of observing the character in the % the dinucs array shows the frequency of observing the character in the % row followed by the character in the column% row followed by the character in the column% these values show strong preference for c-c% these values show strong preference for c-cdinucs = [2, 1, 2, 0; 0, 8, 0, 1; 2, 0, 2, 0; 1, 0, 0, 1];dinucs = [2, 1, 2, 0; 0, 8, 0, 1; 2, 0, 2, 0; 1, 0, 0, 1];% these values restrict transitions more% these values restrict transitions more%dinucs = [2, 0, 2, 0; 0, 8, 0, 0; 2, 0, 2, 0; 1, 1, 0, 1];%dinucs = [2, 0, 2, 0; 0, 8, 0, 0; 2, 0, 2, 0; 1, 1, 0, 1]; % calculate mononucleotide frequencies only as the probability of% calculate mononucleotide frequencies only as the probability of% starting with each nucleotide% starting with each nucleotidemonocounts = sum(dinucs,2);monocounts = sum(dinucs,2);monofreqs = monocounts/sum(monocounts);monofreqs = monocounts/sum(monocounts);cmonofreqs = cumsum(monofreqs);cmonofreqs = cumsum(monofreqs);Matlab code for generating Markov chainsMatlab code for generating Markov chains% calculate dinucleotide frequencies and cumulative dinuc freqs% calculate dinucleotide frequencies and cumulative dinuc freqsfreqs = dinucs./repmat(monocounts,1,4);freqs = dinucs./repmat(monocounts,1,4);cfreqs = cumsum(freqs,2);cfreqs = cumsum(freqs,2); disp('Dinucleotide frequencies (transition probabilities)');disp('Dinucleotide frequencies (transition probabilities)');fprintf(' %c %c %c %c\n',chars)fprintf(' %c %c %c %c\n',chars)for i=1:4for i=1:4 fprintf('%c %f %f %f %f\n',chars(i),freqs(i,:))fprintf('%c %f %f %f %f\n',chars(i),freqs(i,:))endendMatlab code for generating Markov chainsMatlab code for generating Markov chainsnseq = 10;nseq = 10;for ntries=1:20for ntries=1:20 rnums = rand(nseq,1);rnums = rand(nseq,1); % start sequence using mononucleotide frequencies% start sequence using mononucleotide frequencies seq(1) = min(find(cmonofreqs>=rnums(1)));seq(1) = min(find(cmonofreqs>=rnums(1))); for i=2:nseqfor i=2:nseq % extend it using the appropriate row from the dinuc freqs% extend it using the appropriate row from the dinuc freqs seq(i) = min(find(cfreqs(seq(i-1),:)>=rnums(i)));seq(i) = min(find(cfreqs(seq(i-1),:)>=rnums(i))); endend output=chars(seq);output=chars(seq); disp(strvcat(output));disp(strvcat(output));endendDiscriminating between two states with Markov chainsDiscriminating between two states with Markov chainsTo determine which of two states a sequence To determine which of two states a sequence is more likely to have resulted from, we is more likely to have resulted from, we calculatecalculate€ S(x) = logP(x | model+)P(x | model-)= logaxi
View Full Document