CMU BSC 03510 - LecturesPart05
Contents
  Markov chains
  Formalism for Markov chains
  Generating Markov chains
  Interactive Demonstration
  Discriminating between two states with Markov chains
  State probabilities for + and - models
  Transition probabilities converted to log likelihood ratios
  Example
  Hidden Markov models
  More definitions
  Decoding
  Determining the optimal path: the Viterbi algorithm
  Block Diagram for Viterbi Algorithm
  Multiple paths can give the same sequence
  Probability of a sequence
  Forward algorithm
  Backward algorithm
  Estimating probability of state at particular position
  Parameter estimation for HMMs
  Estimation when state sequence known
  Baum-Welch
  Estimating transition frequencies
  Estimating emission frequencies
  Updating model parameters
  Test for termination

Computational Biology, Part 5: Hidden Markov Models
Robert F. Murphy
Copyright © 2005-2008. All rights reserved.

Markov chains

If we can predict all of the properties of a sequence knowing only the conditional dinucleotide probabilities, then that sequence is an example of a Markov chain. A Markov chain is defined as a sequence of states in which each state depends only on the previous state.

Formalism for Markov chains

M = (Q, π, P) is a Markov chain, where:
  Q = vector (1,..,n) is the list of states; Q(1)=A, Q(2)=C, Q(3)=G, Q(4)=T for DNA.
  π = vector (p1,..,pn) is the initial probability of each state; π(i) = p_Q(i) (e.g., π(1) = p_A for DNA).
  P = n x n matrix whose entry in row i and column j is the probability of observing state j if the previous state is i, with the entries of each row summing to 1 (the conditional dinucleotide probabilities); P(i,j) = p*_Q(i)Q(j) (e.g., P(1,2) = p*_AC for DNA).

Generating Markov chains

Given Q, π, P (and a random number generator), we can generate sequences that are members of the Markov chain M. If π and P are derived from a single sequence, the family of sequences generated by M will include that sequence as well as many others. If π and P are derived from a sampled set of sequences, the family of sequences generated by M will be the population from which that set has been sampled.
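As a concrete illustration of the generation procedure just described, the following is a minimal Python sketch (not from the lecture): it draws a sequence from a Markov chain M = (Q, π, P). The particular π and P values and the function name generate_chain are placeholder assumptions, not the lecture's numbers.

```python
import random

# Minimal sketch of generating a DNA sequence from a Markov chain M = (Q, pi, P).
# The states, initial probabilities, and transition matrix below are illustrative
# placeholders, not values from the lecture.
Q = ["A", "C", "G", "T"]                      # list of states
pi = [0.25, 0.25, 0.25, 0.25]                 # initial probability of each state
P = [                                         # P[i][j] = prob. of state j given previous state i
    [0.30, 0.20, 0.30, 0.20],                 # row A (sums to 1)
    [0.25, 0.30, 0.25, 0.20],                 # row C
    [0.20, 0.30, 0.30, 0.20],                 # row G
    [0.25, 0.25, 0.25, 0.25],                 # row T
]

def generate_chain(length, rng=random):
    """Generate one sequence of the given length from the Markov chain (Q, pi, P)."""
    # Draw the first state from the initial distribution pi.
    state = rng.choices(range(len(Q)), weights=pi)[0]
    seq = [Q[state]]
    # Each subsequent state depends only on the previous state (one row of P).
    for _ in range(length - 1):
        state = rng.choices(range(len(Q)), weights=P[state])[0]
        seq.append(Q[state])
    return "".join(seq)

print(generate_chain(20))   # e.g. "GACGTGCAAGTC..." (varies run to run)
```

Because each draw depends only on the previous state, sequences generated this way reproduce the dinucleotide frequencies encoded in P in the limit of large length, which is exactly what the interactive demonstration below shows with a small spreadsheet.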
Interactive Demonstration (A11 Markov chains)

Demonstration: Generating Markov chains. Author: R.F. Murphy, Jan. 14, 2004; revised Feb. 1, 2007. This demonstration illustrates how mononucleotide and dinucleotide frequencies can be used to predict a sequence that, in the limit of large length, has the same frequencies. Instructions: enter the numbers of dinucleotides in the shaded cells and observe the sequences generated below.

Mononucleotide calculations for the sequence ttaaggaaggacccccccccc (assumed circular):

  nucleotide    a     c     g     t     sum
  count         5     9     4     2     20
  frequency     0.25  0.45  0.20  0.10
  cumulative    0     0.25  0.70  0.90  1.00

Dinucleotide calculations (first nucleotide in row, second nucleotide in column):

  counts        a     c     g     t     sum
  a             2     1     2     0     5
  c             0     8     0     1     9
  g             2     0     2     0     4
  t             1     0     0     1     2
  totals        5     9     4     2

  frequency     a     c     g     t
  a             0.40  0.20  0.40  0.00
  c             0.00  0.89  0.00  0.11
  g             0.50  0.00  0.50  0.00
  t             0.50  0.00  0.00  0.50

  cumulative (per-row boundaries used to convert a random number into the next nucleotide)
  a             0     0.40  0.60  1.00  1.00
  c             0     0.00  0.89  0.89  1.00
  g             0     0.50  0.50  1.00  1.00
  t             0     0.50  0.50  0.50  1.00

Markov chain generation:

  random number     0.90  0.67  0.87  0.85  0.31  0.09  0.67  0.53  0.13  0.80  0.35  0.00  0.01  0.02  0.57  0.16
  nucleotide index  3     3     3     3     1     1     3     3     1     3     1     1     1     1     2     2
  output sequence   g     g     g     g     a     a     g     g     a     g     a     a     a     a     c     c

Discriminating between two states with Markov chains

To determine which of two states (models) a sequence is more likely to have resulted from, we calculate

  S(x) = \log \frac{P(x \mid \text{model}^{+})}{P(x \mid \text{model}^{-})}
       = \sum_{i=1}^{L} \log \frac{a^{+}_{x_{i-1} x_i}}{a^{-}_{x_{i-1} x_i}}
       = \sum_{i=1}^{L} \beta_{x_{i-1} x_i}

State probabilities for + and - models

Given example sequences that come from either the + model (CpG island) or the - model (not a CpG island), we can calculate the probability that each nucleotide will occur, given the previous nucleotide, under each model (the a values for each model):

  + model   A      C      G      T
  A         0.180  0.274  0.426  0.120
  C         0.171  0.368  0.274  0.188
  G         0.161  0.339  0.375  0.125
  T         0.079  0.355  0.384  0.182

  - model   A      C      G      T
  A         0.300  0.205  0.285  0.210
  C         0.322  0.298  0.078  0.302
  G         0.248  0.246  0.298  0.208
  T         0.177  0.239  0.292  0.292

Transition probabilities converted to log likelihood ratios

  β         A       C       G       T
  A        -0.740   0.419   0.580  -0.803
  C        -0.913   0.302   1.812  -0.685
  G        -0.624   0.461   0.331  -0.730
  T        -1.169   0.573   0.393  -0.679

Example

What is the relative probability of C+G+C+ compared with C-G-C-?
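To make the score S(x) concrete, here is a small Python sketch (not part of the lecture) that recomputes β from the + and - transition tables above and sums it over the dinucleotides of a sequence. The function names are mine, and the use of log base 2 is an inference: it reproduces the tabulated β values, e.g. log2(0.274/0.078) ≈ 1.81.

```python
import math

# Transition probabilities a+ (CpG island model) and a- (background model),
# copied from the tables on the preceding slides.
A_PLUS = {
    "A": {"A": 0.180, "C": 0.274, "G": 0.426, "T": 0.120},
    "C": {"A": 0.171, "C": 0.368, "G": 0.274, "T": 0.188},
    "G": {"A": 0.161, "C": 0.339, "G": 0.375, "T": 0.125},
    "T": {"A": 0.079, "C": 0.355, "G": 0.384, "T": 0.182},
}
A_MINUS = {
    "A": {"A": 0.300, "C": 0.205, "G": 0.285, "T": 0.210},
    "C": {"A": 0.322, "C": 0.298, "G": 0.078, "T": 0.302},
    "G": {"A": 0.248, "C": 0.246, "G": 0.298, "T": 0.208},
    "T": {"A": 0.177, "C": 0.239, "G": 0.292, "T": 0.292},
}

def beta(prev, cur):
    """Log-likelihood ratio for one transition, assuming log base 2."""
    return math.log2(A_PLUS[prev][cur] / A_MINUS[prev][cur])

def score(x):
    """S(x): sum of beta over the consecutive dinucleotides of x.
    The first symbol contributes no transition term in this sketch."""
    return sum(beta(a, b) for a, b in zip(x, x[1:]))

s = score("CGC")          # beta(C,G) + beta(G,C), approx. 1.81 + 0.46
print(round(s, 3))        # approx. 2.275
print(round(2 ** s, 2))   # relative probability, approx. 4.84
```

Under these assumptions the two transitions in CGC give S(CGC) ≈ 1.81 + 0.46 ≈ 2.27, so the + model is roughly 2^2.27 ≈ 4.8 times more likely than the - model to have produced them (any contribution from the first symbol's initial probability is ignored here).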

