U of I CS 498 - Evolutionary Models

Evolutionary Models
CS 498 SS
Saurabh Sinha

Models of nucleotide substitution
• The DNA that we study in bioinformatics is the end(??)-product of evolution
• Evolution is a very complicated process
• Very simplified models of this process can be studied within a probabilistic framework
• This allows testing of various hypotheses about the evolutionary process, from multi-species data
Source: Ewens and Grant, Chapter 14.

Diversity in a population
• There IS genetic variation between individuals in a population
• But relatively little variation at the nucleotide level
• E.g., two humans differ at the nucleotide level at one in 500 to 1000 nucleotides
• Roughly speaking, a single nucleotide dominates the population at a particular position in the genome

Substitution
• Over long time periods, the nucleotide at a given position remains the same
• But occasionally, this nucleotide changes (over the entire population)
• This is called “substitution”, i.e., replacement of the predominant nucleotide for that position with another predominant nucleotide

Markov Chain to model substitution
• Markov chain to describe the substitution process at a position
• States are “a”, “c”, “g”, “t”
• The chain “runs” in certain units of time, i.e., the state may change from one time point to the next time point
• The unit of time (difference between successive time points) may be arbitrary, e.g., 20000 generations

Markov Chain to model substitution
• A symbol such as $p_{ag}$ is the probability of a change from “a” to “g” in one unit of time
• When studying two extant species, the evolutionary model has to provide the joint probability of the two species’ data
• Sometimes, this is done by computing the probability of the ancestor, starting from one extant species, and then the probability of the other extant species, starting from the ancestor
• If we want to do this, the evolutionary process (model) must be “time reversible”: P(x) P(x->y) = P(y) P(y->x)

Jukes Cantor Model
• Markov chain with four states: a, c, g, t
• Transition matrix P given by:

$$
P = \begin{array}{c|cccc}
  & a & g & c & t \\ \hline
a & 1-3\alpha & \alpha & \alpha & \alpha \\
g & \alpha & 1-3\alpha & \alpha & \alpha \\
c & \alpha & \alpha & 1-3\alpha & \alpha \\
t & \alpha & \alpha & \alpha & 1-3\alpha
\end{array}
$$

Jukes Cantor Model
• α is a parameter depending on what a “time unit” means. If the time unit represents more generations, α will be larger
• α must be less than 1/3 though

Jukes Cantor Model
• Whatever the current nucleotide is, each of the other three nucleotides is equally likely to substitute for it

Understanding the J-C Model
• Consider a transition matrix P, and a probability vector v (a row vector)
• What does w = v P represent?
• If v is the probability distribution of the 4 nucleotides (at a position) now, w is the probability distribution at the next time step

Understanding the J-C model
• Suppose we can find a vector ϕ such that ϕ P = ϕ
• If the probability distribution is ϕ, it will continue to remain ϕ at future times
• This is called the stationary distribution of the Markov chain

Understanding the J-C model
• Check that ϕ = (0.25, 0.25, 0.25, 0.25) satisfies ϕ P = ϕ
• Therefore, if a position evolves as per this model, for long enough, it will be equally likely to have any of the 4 nucleotides!
• This is the very long term prediction, but can we write down what the position will be as a function of time (steps)?

Spectral Decomposition
• Recall that we found a ϕ such that ϕ P = ϕ
• Such a vector is called an “eigenvector” of P, and the corresponding “eigenvalue” is 1
• In general, if v P = λ v (for scalar λ), λ is called an eigenvalue, and v is a left eigenvector of P

Spectral decomposition
• Similarly, if $P u^T = \lambda u^T$, then u is called a right eigenvector
• In general, there may be multiple eigenvalues $\lambda_j$ and their corresponding left and right eigenvectors $v_j$ and $u_j$
• We can write P as

$$
P = \sum_j \lambda_j \, u_j^T v_j
$$

Spectral decomposition
• Then, for any positive integer n, it is true that

$$
P^n = \sum_j \lambda_j^n \, u_j^T v_j
$$

• Why is $P^n$ interesting to us?
• Because it tells us what the probability distribution will be after n time steps
• If we started with v (a row vector), then $v P^n$ will be the probability distribution after n steps

Back to the J-C model
• We reasoned that ϕ = (.25, .25, .25, .25) is a left eigenvector for the eigenvalue 1
• Actually, the J-C transition matrix has this eigenvalue and the eigenvalue (1-4α), and if we do the math we get the spectral decomposition of $P^n$ as:

$$
P^n =
\begin{pmatrix}
.25 & .25 & .25 & .25 \\
.25 & .25 & .25 & .25 \\
.25 & .25 & .25 & .25 \\
.25 & .25 & .25 & .25
\end{pmatrix}
+ (1-4\alpha)^n
\begin{pmatrix}
.75 & -.25 & -.25 & -.25 \\
-.25 & .75 & -.25 & -.25 \\
-.25 & -.25 & .75 & -.25 \\
-.25 & -.25 & -.25 & .75
\end{pmatrix}
$$

Back to the J-C model
• So, if we started with (1,0,0,0), i.e., an “a”, the probability that we’ll see an “a” at that position after n time steps is: $0.25 + 0.75\,(1-4\alpha)^n$
• And the probability that the “a” would have mutated to, say, “c” is: $0.25 - 0.25\,(1-4\alpha)^n$

Substitution probability
• As a function of time n, we therefore get
• Pr(x -> y) $= 0.25 + 0.75\,(1-4\alpha)^n$ if x = y
• and $= 0.25 - 0.25\,(1-4\alpha)^n$ otherwise
• If n -> ∞, we get back our (0.25, 0.25, 0.25, 0.25) calculation

More advanced models
• The J-C model made highly “symmetric” assumptions in its formulation of the transition matrix P
• In reality, for example, “transitions” are more common than “transversions”
– What are these? Purine = A or G. Pyrimidine = C or T. A transition is a substitution within the same category; a transversion is a substitution across categories
– Purines are similarly sized, and pyrimidines are similarly sized. A nucleotide is more likely to be replaced by a similarly sized nucleotide
• The “Kimura” model captures this transition/transversion bias

Kimura model

$$
P = \begin{array}{c|cccc}
  & a & g & c & t \\ \hline
a & 1-\alpha-2\beta & \alpha & \beta & \beta \\
g & \alpha & 1-\alpha-2\beta & \beta & \beta \\
c & \beta & \beta & 1-\alpha-2\beta & \alpha \\
t & \beta & \beta & \alpha & 1-\alpha-2\beta
\end{array}
$$

• This of course is the transition probability matrix P of the Markov chain
• Two parameters now, instead of one: α for transitions, β for transversions

Kimura model
• Again, one of the eigenvalues is 1, and the left eigenvector corresponding to it is ϕ = (.25, .25, .25, .25)
• So again, the stationary distribution is uniform
• P(x -> x) $= .25 + .25\,(1-4\beta)^n + .5\,(1-2(\alpha+\beta))^n$
• P(x -> y) $= .25 + .25\,(1-4\beta)^n - .5\,(1-2(\alpha+\beta))^n$ if x is a purine and y is the other purine (and likewise for the two pyrimidines); at n = 1 this reduces to α
• P(x -> y) $= .25 - .25\,(1-4\beta)^n$ if the substitution is a transversion; at n = 1 this reduces to β

Even more advanced models
• Get to greater levels of realism
• The Kimura model still has a uniform stationary distribution, which is not true of real data
• One extension: the purine to pyrimidine substitution probability is different from the pyrimidine to purine substitution probability
– This leads to a non-uniform stationary probability

Felsenstein models

$$
P = \begin{array}{c|cccc}
  & a & g & c & t \\ \hline
a & 1-u+u\phi_a & u\phi_g & u\phi_c & u\phi_t \\
g & u\phi_a & 1-u+u\phi_g & u\phi_c & u\phi_t \\
c & u\phi_a & u\phi_g & 1-u+u\phi_c & u\phi_t \\
t & u\phi_a & u\phi_g & u\phi_c & 1-u+u\phi_t
\end{array}
$$

• Transition probability is proportional to the stationary probability of the target nucleotide
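
As a quick numerical check of the Jukes-Cantor slides, the sketch below builds the one-step matrix, raises it to the n-th power by repeated multiplication, and compares the entries with the closed forms 0.25 + 0.75(1-4α)^n and 0.25 - 0.25(1-4α)^n. The function names (jc_matrix, mat_pow, jc_closed_form) and the values α = 0.01, n = 50 are illustrative assumptions, not part of the lecture.

```python
def jc_matrix(alpha):
    """One-step Jukes-Cantor transition matrix (hypothetical state order a, g, c, t)."""
    return [[1 - 3 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


def mat_mul(A, B):
    """Plain-Python product of two square matrices."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(P, n):
    """P raised to the n-th power by repeated multiplication (n >= 0)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def jc_closed_form(alpha, n, same):
    """Closed-form n-step probability from the slides."""
    decay = (1 - 4 * alpha) ** n
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


alpha, n = 0.01, 50  # assumed example values; alpha must stay below 1/3
Pn = mat_pow(jc_matrix(alpha), n)
print("a -> a:", Pn[0][0], "closed form:", jc_closed_form(alpha, n, True))
print("a -> c:", Pn[0][2], "closed form:", jc_closed_form(alpha, n, False))
```

Increasing n in this sketch drives every entry of Pn toward 0.25, which is the uniform stationary distribution discussed above.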

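
The Kimura formulas can be checked the same way, this time by propagating a row vector with w = v P as in the "Understanding the J-C model" slides. The sketch assumes the state order a, g, c, t and takes the other-purine (transition) case as 0.25 + 0.25(1-4β)^n - 0.5(1-2(α+β))^n, which is the form that reduces to α at n = 1. All names and parameter values here are hypothetical.

```python
def kimura_matrix(alpha, beta):
    """One-step Kimura matrix, assumed state order a, g, c, t.

    a<->g and c<->t are transitions (probability alpha);
    all other changes are transversions (probability beta).
    """
    d = 1 - alpha - 2 * beta
    return [[d, alpha, beta, beta],
            [alpha, d, beta, beta],
            [beta, beta, d, alpha],
            [beta, beta, alpha, d]]


def step(v, P):
    """One time step for a row vector: w = v P."""
    return [sum(v[i] * P[i][j] for i in range(len(v))) for j in range(len(v))]


def distribution_after(v, P, n):
    """Distribution after n steps, starting from row vector v."""
    for _ in range(n):
        v = step(v, P)
    return v


def kimura_closed_form(alpha, beta, n):
    """Closed-form n-step probabilities: (same, transition, transversion)."""
    tv = (1 - 4 * beta) ** n
    ti = (1 - 2 * (alpha + beta)) ** n
    same = 0.25 + 0.25 * tv + 0.5 * ti
    transition = 0.25 + 0.25 * tv - 0.5 * ti
    transversion = 0.25 - 0.25 * tv
    return same, transition, transversion


alpha, beta, n = 0.02, 0.005, 30  # assumed example values
start = [1.0, 0.0, 0.0, 0.0]      # the position currently holds an "a"
dist = distribution_after(start, kimura_matrix(alpha, beta), n)
print("simulated  :", dist)
print("closed form:", kimura_closed_form(alpha, beta, n))
```

Starting from "a", the simulated vector holds the a->a, a->g (transition), a->c and a->t (transversion) probabilities, in that order.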

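
For the Felsenstein matrix, two properties from earlier slides can be verified numerically: that ϕ is stationary (ϕ P = ϕ) and that the chain is time reversible (ϕ_x P(x->y) = ϕ_y P(y->x)). The sketch below assumes an arbitrary non-uniform ϕ and u = 0.05 purely for illustration; the function names are hypothetical.

```python
def felsenstein_matrix(u, phi):
    """One-step Felsenstein matrix: off-diagonal entry x->y is u * phi[y],
    diagonal entry x->x is 1 - u + u * phi[x]."""
    size = len(phi)
    return [[1 - u + u * phi[j] if i == j else u * phi[j] for j in range(size)]
            for i in range(size)]


def left_multiply(v, P):
    """Row vector times matrix: w = v P."""
    size = len(v)
    return [sum(v[i] * P[i][j] for i in range(size)) for j in range(size)]


def is_time_reversible(phi, P, tol=1e-12):
    """Check phi[x] * P[x][y] == phi[y] * P[y][x] for every pair of states."""
    size = len(phi)
    return all(abs(phi[x] * P[x][y] - phi[y] * P[y][x]) < tol
               for x in range(size) for y in range(size))


phi = [0.1, 0.2, 0.3, 0.4]  # assumed stationary probabilities for a, g, c, t
u = 0.05                    # assumed substitution-rate parameter
P = felsenstein_matrix(u, phi)

print("phi P      :", left_multiply(phi, P))
print("reversible :", is_time_reversible(phi, P))
```

The same two helper checks can be pointed at the Jukes-Cantor or Kimura matrices with a uniform ϕ, since both are special cases with a uniform stationary distribution.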