CMU BSC 03510 - Lecture - D2687904

Home> Schools> Carnegie Mellon University> Biological Sciences (BSC) > BSC 03510> Lecture

DOC PREVIEW

CMU BSC 03510 - Lecture

School name Carnegie Mellon University

Course Bsc 03510-

Pages 35

This preview shows page 1-2-16-17-18-34-35 out of 35 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 35 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 35 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 35 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 35 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 35 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 35 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 35 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 35 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Computational Biology, Part 5 Hidden Markov ModelsMarkov chainsFormalism for Markov chainsGenerating Markov chainsInteractive DemonstrationMatlab code for generating Markov chainsSlide 7Slide 8Discriminating between two states with Markov chainsState probablities for + and - modelsTransition probabilities converted to log likelihood ratiosExampleSlide 13Hidden Markov modelsMore definitionsGenerating sequences (see previous example code)DecodingSlide 18Determining the optimal path: the Viterbi algorithmSlide 20Block Diagram for Viterbi AlgorithmMultiple paths can give the same sequenceProbability of a sequenceSlide 24Forward algorithmBackward algorithmSlide 27Estimating probability of state at particular positionParameter estimation for HMMsEstimation when state sequence knownBaum-WelchEstimating transition frequenciesEstimating emission frequenciesUpdating model parametersTest for terminationComputational Biology, Part 5Hidden Markov ModelsComputational Biology, Part 5Hidden Markov ModelsRobert F. MurphyRobert F. MurphyCopyright Copyright  2005-2009. 2005-2009.All rights reserved.All rights reserved.Markov chainsMarkov chainsIf we can predict all of the properties of a If we can predict all of the properties of a sequence knowing only the conditional sequence knowing only the conditional dinucleotide probabilities, then that dinucleotide probabilities, then that sequence is an example of a sequence is an example of a Markov chainMarkov chainA A Markov chainMarkov chain is defined as a sequence of is defined as a sequence of states in which each state depends only on states in which each state depends only on the previous statethe previous stateFormalism for Markov chainsFormalism for Markov chainsMM=(=(Q,π,PQ,π,P) is a Markov chain, where) is a Markov chain, whereQQ = vector (1,.., = vector (1,..,nn) is the list of states ) is the list of states QQ(1)=A, (1)=A, QQ(2)=C, (2)=C, QQ(3)=G, (3)=G, QQ(4)=T for DNA(4)=T for DNAππ = vector (p = vector (p11,..,p,..,pnn) is the initial probability of each state) is the initial probability of each stateππ((ii)=pQ)=pQ((ii) ) (e,g., π(1)=p (e,g., π(1)=pA A for DNA)for DNA)PP= = nn x x nn matrix where the entry in row matrix where the entry in row i i and column and column jj is is the probability of observing state the probability of observing state jj if the previous state is if the previous state is i i and the sum of entries in each row is 1 (and the sum of entries in each row is 1 ( dinucleotide dinucleotide probabilities) probabilities) PP(i,j)=p*(i,j)=p*Q(i)Q(i) Q(i)Q(i) (e.g., (e.g., PP(1,2)=p*(1,2)=p*ACAC for DNA) for DNA)Generating Markov chainsGenerating Markov chainsGiven Given Q,π,PQ,π,P (and a random number generator), we (and a random number generator), we can generate sequences that are members of the can generate sequences that are members of the Markov chain MMarkov chain MIf If π,Pπ,P are derived from a single sequence, the are derived from a single sequence, the family of sequences generated by family of sequences generated by MM will include will include that sequence as well as many othersthat sequence as well as many othersIf If π,Pπ,P are derived from a sampled set of are derived from a sampled set of sequences, the family of sequences generated by sequences, the family of sequences generated by MM will be the population from which that set has will be the population from which that set has been sampledbeen sampledInteractive DemonstrationInteractive Demonstration(A11 Markov chains)(A11 Markov chains)Matlab code for generating Markov chainsMatlab code for generating Markov chainschars = ['a' 'c' 'g' 't'];chars = ['a' 'c' 'g' 't']; % the dinucs array shows the frequency of observing the character in the % the dinucs array shows the frequency of observing the character in the % row followed by the character in the column% row followed by the character in the column% these values show strong preference for c-c% these values show strong preference for c-cdinucs = [2, 1, 2, 0; 0, 8, 0, 1; 2, 0, 2, 0; 1, 0, 0, 1];dinucs = [2, 1, 2, 0; 0, 8, 0, 1; 2, 0, 2, 0; 1, 0, 0, 1];% these values restrict transitions more% these values restrict transitions more%dinucs = [2, 0, 2, 0; 0, 8, 0, 0; 2, 0, 2, 0; 1, 1, 0, 1];%dinucs = [2, 0, 2, 0; 0, 8, 0, 0; 2, 0, 2, 0; 1, 1, 0, 1]; % calculate mononucleotide frequencies only as the probability of% calculate mononucleotide frequencies only as the probability of% starting with each nucleotide% starting with each nucleotidemonocounts = sum(dinucs,2);monocounts = sum(dinucs,2);monofreqs = monocounts/sum(monocounts);monofreqs = monocounts/sum(monocounts);cmonofreqs = cumsum(monofreqs);cmonofreqs = cumsum(monofreqs);Matlab code for generating Markov chainsMatlab code for generating Markov chains% calculate dinucleotide frequencies and cumulative dinuc freqs% calculate dinucleotide frequencies and cumulative dinuc freqsfreqs = dinucs./repmat(monocounts,1,4);freqs = dinucs./repmat(monocounts,1,4);cfreqs = cumsum(freqs,2);cfreqs = cumsum(freqs,2); disp('Dinucleotide frequencies (transition probabilities)');disp('Dinucleotide frequencies (transition probabilities)');fprintf(' %c %c %c %c\n',chars)fprintf(' %c %c %c %c\n',chars)for i=1:4for i=1:4 fprintf('%c %f %f %f %f\n',chars(i),freqs(i,:))fprintf('%c %f %f %f %f\n',chars(i),freqs(i,:))endendMatlab code for generating Markov chainsMatlab code for generating Markov chainsnseq = 10;nseq = 10;for ntries=1:20for ntries=1:20 rnums = rand(nseq,1);rnums = rand(nseq,1); % start sequence using mononucleotide frequencies% start sequence using mononucleotide frequencies seq(1) = min(find(cmonofreqs>=rnums(1)));seq(1) = min(find(cmonofreqs>=rnums(1))); for i=2:nseqfor i=2:nseq % extend it using the appropriate row from the dinuc freqs% extend it using the appropriate row from the dinuc freqs seq(i) = min(find(cfreqs(seq(i-1),:)>=rnums(i)));seq(i) = min(find(cfreqs(seq(i-1),:)>=rnums(i))); endend output=chars(seq);output=chars(seq); disp(strvcat(output));disp(strvcat(output));endendDiscriminating between two states with Markov chainsDiscriminating between two states with Markov chainsTo determine which of two states a sequence To determine which of two states a sequence is more likely to have resulted from, we is more likely to have resulted from, we calculatecalculate€ S(x) = logP(x | model+)P(x | model-)= logaxi

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-16-17-18-34-35 out of 35 pages.

CMU BSC 03510 - Lecture

Sign up for free to view:

Please select your school