U of I CS 498 - Evolutionary Models - D2200238


School name: University of Illinois

Course: CS 498 - Special Topics

Pages: 25

This preview shows page 1-2-24-25 out of 25 pages.


Evolutionary Models
CS 498 SS
Saurabh Sinha

Models of nucleotide substitution
• The DNA that we study in bioinformatics is the end(??)-product of evolution
• Evolution is a very complicated process
• Very simplified models of this process can be studied within a probabilistic framework
• This allows testing of various hypotheses about the evolutionary process, from multi-species data
Source: Ewens and Grant, Chapter 14.

Diversity in a population
• There IS genetic variation between individuals in a population
• But there is relatively little variation at the nucleotide level
• E.g., two humans differ at the nucleotide level at one in 500 to 1000 nucleotides
• Roughly speaking, a single nucleotide dominates the population at a particular position in the genome

Substitution
• Over long time periods, the nucleotide at a given position remains the same
• But periodically, this nucleotide changes (over the entire population)
• This is called "substitution", i.e., replacement of the predominant nucleotide for that position with another predominant nucleotide

Markov Chain to model substitution
• A Markov chain describes the substitution process at a position
• States are "a", "c", "g", "t"
• The chain "runs" in certain units of time, i.e., the state may change from one time point to the next time point
• The unit of time (difference between successive time points) may be arbitrary, e.g., 20000 generations

Markov Chain to model substitution
• A symbol such as $p_{ag}$ is the probability of a change from "a" to "g" in one unit of time
• When studying two extant species, the evolutionary model has to provide the joint probability of the two species' data
• Sometimes, this is done by computing the probability of the ancestor, starting from one extant species, and then the probability of the other extant species, starting from the ancestor
• If we want to do this, the evolutionary process (model) must be "time reversible": $P(x)\,P(x \to y) = P(y)\,P(y \to x)$

Jukes Cantor Model
• Markov chain with four states: a, c, g, t
• Transition matrix P (row = current nucleotide, column = next nucleotide, both in the order a, g, c, t) given by:

$$P = \begin{pmatrix} 1-3\alpha & \alpha & \alpha & \alpha \\ \alpha & 1-3\alpha & \alpha & \alpha \\ \alpha & \alpha & 1-3\alpha & \alpha \\ \alpha & \alpha & \alpha & 1-3\alpha \end{pmatrix}$$

Jukes Cantor Model
• $\alpha$ is a parameter depending on what a "time unit" means. If the time unit represents more generations, $\alpha$ will be larger
• $\alpha$ must be less than 1/3 though

Jukes Cantor Model
• Whatever the current nucleotide is, each of the other three nucleotides is equally likely to substitute for it

Understanding the J-C Model
• Consider a transition matrix P, and a probability vector v (a row vector)
• What does $w = vP$ represent?
• If v is the probability distribution of the 4 nucleotides (at a position) now, w is the probability distribution at the next time step

Understanding the J-C model
• Suppose we can find a vector $\varphi$ such that $\varphi P = \varphi$
• If the probability distribution is $\varphi$, it will continue to remain $\varphi$ at future times
• This is called the stationary distribution of the Markov chain

Understanding the J-C model
• Check that $\varphi = (0.25, 0.25, 0.25, 0.25)$ satisfies $\varphi P = \varphi$
• Therefore, if a position evolves as per this model for long enough, it will be equally likely to have any of the 4 nucleotides!
• This is the very long term prediction, but can we write down the probability distribution at the position as a function of time (steps)?

Spectral Decomposition
• Recall that we found a $\varphi$ such that $\varphi P = \varphi$
• Such a vector is called an "eigenvector" of P, and the corresponding "eigenvalue" is 1
• In general, if $vP = \lambda v$ (for scalar $\lambda$), $\lambda$ is called an eigenvalue, and v is a left eigenvector of P

Spectral decomposition
• Similarly, if $P u^T = \lambda u^T$, then u is called a right eigenvector
• In general, there may be multiple eigenvalues $\lambda_j$ and their corresponding left and right eigenvectors $v_j$ and $u_j$
• We can write P (with the eigenvectors suitably normalized) as

$$P = \sum_j \lambda_j\, u_j^T v_j$$

Spectral decomposition
• Then, for any positive integer n, it is true that

$$P^n = \sum_j \lambda_j^n\, u_j^T v_j$$

• Why is $P^n$ interesting to us?
• Because it tells us what the probability distribution will be after n time steps
• If we started with the row vector v, then $v P^n$ will be the probability distribution after n steps

Back to the J-C model
• We reasoned that $\varphi = (.25, .25, .25, .25)$ is a left eigenvector for the eigenvalue 1
• Actually, the J-C transition matrix has this eigenvalue and the eigenvalue $(1-4\alpha)$, and if we do the math we get the spectral decomposition of $P^n$ as:

$$P^n = \begin{pmatrix} .25 & .25 & .25 & .25 \\ .25 & .25 & .25 & .25 \\ .25 & .25 & .25 & .25 \\ .25 & .25 & .25 & .25 \end{pmatrix} + (1-4\alpha)^n \begin{pmatrix} .75 & -.25 & -.25 & -.25 \\ -.25 & .75 & -.25 & -.25 \\ -.25 & -.25 & .75 & -.25 \\ -.25 & -.25 & -.25 & .75 \end{pmatrix}$$

Back to the J-C model
• So, if we started with (1, 0, 0, 0), i.e., an "a", the probability that we'll see an "a" at that position after n time steps is: $0.25 + 0.75\,(1-4\alpha)^n$
• And the probability that the "a" would have mutated to, say, "c" is: $0.25 - 0.25\,(1-4\alpha)^n$

Substitution probability
• As a function of time n, we therefore get

$$\Pr(x \to y) = \begin{cases} 0.25 + 0.75\,(1-4\alpha)^n & \text{if } x = y \\ 0.25 - 0.25\,(1-4\alpha)^n & \text{otherwise} \end{cases}$$

• If $n \to \infty$, we get back our (0.25, 0.25, 0.25, 0.25) calculation

More advanced models
• The J-C model made highly "symmetric" assumptions in its formulation of the transition matrix P
• In reality, for example, "transitions" are more common than "transversions"
– What are these? Purine = A or G. Pyrimidine = C or T. A transition is a substitution within the same category; a transversion is a substitution across categories
– Purines are similarly sized, and pyrimidines are similarly sized. A nucleotide is more likely to be replaced by a similarly sized nucleotide
• The "Kimura" model captures this transition/transversion bias

Kimura model

$$P = \begin{pmatrix} 1-\alpha-2\beta & \alpha & \beta & \beta \\ \alpha & 1-\alpha-2\beta & \beta & \beta \\ \beta & \beta & 1-\alpha-2\beta & \alpha \\ \beta & \beta & \alpha & 1-\alpha-2\beta \end{pmatrix}$$

(rows and columns in the order a, g, c, t)
• This of course is the transition probability matrix P of the Markov chain
• Two parameters now, instead of one: $\alpha$ for transitions and $\beta$ for each transversion

Kimura model
• Again, one of the eigenvalues is 1, and the left eigenvector corresponding to it is $\varphi = (.25, .25, .25, .25)$
• So again, the stationary distribution is uniform
• $P(x \to x) = .25 + .25\,(1-4\beta)^n + .5\,(1-2(\alpha+\beta))^n$
• $P(x \to y) = .25 + .25\,(1-4\beta)^n - .5\,(1-2(\alpha+\beta))^n$ if x is a purine and y is the other purine

Even more advanced models
• Get to greater levels of realism
• The Kimura model still has a uniform stationary distribution, which is not true of real data
• One extension: the purine to pyrimidine substitution probability is different from the pyrimidine to purine substitution probability
– This leads to a non-uniform stationary probability

Felsenstein models

$$P = \begin{pmatrix} 1-u+u\varphi_a & u\varphi_g & u\varphi_c & u\varphi_t \\ u\varphi_a & 1-u+u\varphi_g & u\varphi_c & u\varphi_t \\ u\varphi_a & u\varphi_g & 1-u+u\varphi_c & u\varphi_t \\ u\varphi_a & u\varphi_g & u\varphi_c & 1-u+u\varphi_t \end{pmatrix}$$

(rows and columns in the order a, g, c, t)
• Transition probability is proportional to the stationary probability of the target
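
The Jukes-Cantor closed form can be checked numerically. The following is a minimal sketch, not part of the original slides: it builds the one-step matrix, raises it to the n-th power by repeated multiplication, and compares the result with 0.25 + 0.75(1-4α)^n and 0.25 - 0.25(1-4α)^n. The helper names (mat_mul, mat_pow, jukes_cantor, jc_closed_form) and the values of alpha and n are illustrative assumptions.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(p, n):
    """Return the n-th power of p (n >= 0) by repeated multiplication."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


def jukes_cantor(alpha):
    """One-step matrix: 1 - 3*alpha on the diagonal, alpha everywhere else."""
    return [[1 - 3 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


def jc_closed_form(alpha, n, same):
    """n-step probability read off the spectral decomposition."""
    decay = (1 - 4 * alpha) ** n
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


alpha, n = 0.01, 50  # illustrative values, not taken from the slides
pn = mat_pow(jukes_cantor(alpha), n)

# Entry [0][0] is "a stays a"; entry [0][1] is "a becomes another nucleotide".
print(pn[0][0], jc_closed_form(alpha, n, same=True))
print(pn[0][1], jc_closed_form(alpha, n, same=False))
```

The two numbers on each printed line should agree up to floating-point rounding, and both drift toward 0.25 as n grows.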
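
The Kimura n-step probabilities can be checked the same way. This sketch (an illustration, not from the slides) starts from the row vector (1, 0, 0, 0), i.e., an "a", applies the one-step matrix n times, and compares with closed forms. Note that the transition case (purine to the other purine) carries a minus sign in front of the last term. The transversion formula 0.25 - 0.25(1-4β)^n is not stated in the slides; it is included here as an assumption that the check confirms. The function names and parameter values are hypothetical.

```python
def step(v, p):
    """One time step: row vector v times transition matrix p."""
    return [sum(v[i] * p[i][j] for i in range(len(v))) for j in range(len(p[0]))]


def kimura(alpha, beta):
    """Kimura one-step matrix in the state order a, g, c, t."""
    d = 1 - alpha - 2 * beta
    return [[d, alpha, beta, beta],
            [alpha, d, beta, beta],
            [beta, beta, d, alpha],
            [beta, beta, alpha, d]]


alpha, beta, n = 0.02, 0.005, 40  # illustrative values, not from the slides
p = kimura(alpha, beta)

v = [1.0, 0.0, 0.0, 0.0]  # the position currently holds an "a"
for _ in range(n):
    v = step(v, p)

e1 = (1 - 4 * beta) ** n             # eigenvalue 1 - 4*beta, raised to n
e2 = (1 - 2 * (alpha + beta)) ** n   # eigenvalue 1 - 2*(alpha + beta), raised to n
same = 0.25 + 0.25 * e1 + 0.5 * e2        # a stays a
transition = 0.25 + 0.25 * e1 - 0.5 * e2  # a becomes g (other purine)
transversion = 0.25 - 0.25 * e1           # a becomes c or t (assumed formula)

print(v)
print([same, transition, transversion, transversion])
```

The two printed vectors should match up to rounding, and the four closed-form values sum to 1, as a probability distribution must.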
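
For the Felsenstein matrix, two properties mentioned earlier in the notes can be verified directly: the vector of target frequencies is stationary (φP = φ), and the chain is time reversible (P(x)P(x->y) = P(y)P(y->x)). The sketch below is an illustration under assumed values; the frequencies in phi and the rate u are hypothetical, chosen only so that the stationary distribution is non-uniform.

```python
def felsenstein(u, phi):
    """One-step matrix: off-diagonal entry u*phi[y], diagonal 1 - u + u*phi[x]."""
    size = len(phi)
    return [[1 - u + u * phi[j] if i == j else u * phi[j] for j in range(size)]
            for i in range(size)]


phi = [0.3, 0.2, 0.2, 0.3]  # hypothetical frequencies in the order a, g, c, t
u = 0.05                    # hypothetical substitution rate per time unit
p = felsenstein(u, phi)

# Each row must sum to 1 for p to be a transition matrix.
row_sums = [sum(row) for row in p]

# Stationarity: one step of the chain started from phi returns phi.
phi_next = [sum(phi[i] * p[i][j] for i in range(4)) for j in range(4)]

# Time reversibility (detailed balance): phi[x]*p[x][y] == phi[y]*p[y][x].
reversible = all(abs(phi[i] * p[i][j] - phi[j] * p[j][i]) < 1e-12
                 for i in range(4) for j in range(4))

print(row_sums)
print(phi_next)
print(reversible)
```

Both checks follow from the matrix's form: the off-diagonal flow from x to y is φ_x·u·φ_y, which is symmetric in x and y.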
