U of I CS 498 - Evolutionary Models

Evolutionary Models
CS 498 SS
Saurabh Sinha

Models of nucleotide substitution
• The DNA that we study in bioinformatics is the end(??)-product of evolution
• Evolution is a very complicated process
• Very simplified models of this process can be studied within a probabilistic framework
• This allows testing of various hypotheses about the evolutionary process from multi-species data
Source: Ewens and Grant, Chapter 14.

Diversity in a population
• There IS genetic variation between individuals in a population
• But there is relatively little variation at the nucleotide level
• E.g., two humans differ at roughly one in every 500 to 1000 nucleotides
• Roughly speaking, a single nucleotide dominates the population at a particular position in the genome

Substitution
• Over long time periods, the nucleotide at a given position remains the same
• But occasionally, this nucleotide changes (over the entire population)
• This is called "substitution", i.e., replacement of the predominant nucleotide for that position with another predominant nucleotide

Markov Chain to model substitution
• A Markov chain describes the substitution process at a position
• States are "a", "c", "g", "t"
• The chain "runs" in certain units of time, i.e., the state may change from one time point to the next
• The unit of time (the difference between successive time points) may be arbitrary, e.g., 20000 generations
• A symbol such as $p_{ag}$ is the probability of a change from "a" to "g" in one unit of time
• When studying two extant species, the evolutionary model has to provide the joint probability of the two species' data
• Sometimes this is done by computing the probability of the ancestor, starting from one extant species, and then the probability of the other extant species, starting from the ancestor
• If we want to do this, the evolutionary process (model) must be "time reversible": $P(x)\,P(x \to y) = P(y)\,P(y \to x)$

Jukes Cantor Model
• Markov chain with four states: a, c, g, t
• Transition matrix P given by:

$$
P = \begin{array}{c|cccc}
  & a & g & c & t \\ \hline
a & 1-3\alpha & \alpha & \alpha & \alpha \\
g & \alpha & 1-3\alpha & \alpha & \alpha \\
c & \alpha & \alpha & 1-3\alpha & \alpha \\
t & \alpha & \alpha & \alpha & 1-3\alpha
\end{array}
$$

• $\alpha$ is a parameter depending on what a "time unit" means. If the time unit represents more generations, $\alpha$ will be larger
• $\alpha$ must be less than 1/3, though
• Whatever the current nucleotide is, each of the other three nucleotides is equally likely to substitute for it

Understanding the J-C Model
• Consider a transition matrix P and a probability vector v (a row vector)
• What does $w = vP$ represent?
• If v is the probability distribution of the 4 nucleotides (at a position) now, w is the probability distribution at the next time step
• Suppose we can find a vector $\pi$ such that $\pi P = \pi$
• If the probability distribution is $\pi$, it will remain $\pi$ at all future times
• This is called the stationary distribution of the Markov chain
• Check that $\pi = (0.25, 0.25, 0.25, 0.25)$ satisfies $\pi P = \pi$
• Therefore, if a position evolves as per this model for long enough, it will be equally likely to have any of the 4 nucleotides!
• This is the very long-term prediction, but can we write down the distribution at the position as a function of time (steps)?

Spectral Decomposition
• Recall that we found a $\pi$ such that $\pi P = \pi$
• Such a vector is called an "eigenvector" of P, and the corresponding "eigenvalue" is 1
• In general, if $vP = \lambda v$ (for a scalar $\lambda$), $\lambda$ is called an eigenvalue and v is a left eigenvector of P
• Similarly, if $P u^T = \lambda u^T$, then u is called a right eigenvector
• In general, there may be multiple eigenvalues $\lambda_j$ and their corresponding left and right eigenvectors $v_j$ and $u_j$
• We can write P as (with the eigenvectors scaled so that $v_j u_j^T = 1$ and $v_j u_k^T = 0$ for $j \neq k$)

$$
P = \sum_j \lambda_j \, u_j^T v_j
$$

• Then, for any positive integer n, it is true that

$$
P^n = \sum_j \lambda_j^n \, u_j^T v_j
$$

• Why is $P^n$ interesting to us?
• Because it tells us what the probability distribution will be after n time steps
• If we started with v, then $v P^n$ will be the probability distribution after n steps

Back to the J-C model
• We reasoned that $\pi = (.25, .25, .25, .25)$ is a left eigenvector for the eigenvalue 1
• Actually, the J-C transition matrix has this eigenvalue and the eigenvalue $(1-4\alpha)$, and if we do the math we get the spectral decomposition of $P^n$ as:

$$
P^n =
\begin{bmatrix}
.25 & .25 & .25 & .25 \\
.25 & .25 & .25 & .25 \\
.25 & .25 & .25 & .25 \\
.25 & .25 & .25 & .25
\end{bmatrix}
+ (1-4\alpha)^n
\begin{bmatrix}
.75 & -.25 & -.25 & -.25 \\
-.25 & .75 & -.25 & -.25 \\
-.25 & -.25 & .75 & -.25 \\
-.25 & -.25 & -.25 & .75
\end{bmatrix}
$$

• So, if we started with (1,0,0,0), i.e., an "a", the probability that we'll see an "a" at that position after n time steps is $0.25 + 0.75\,(1-4\alpha)^n$
• And the probability that the "a" would have mutated to, say, "c" is $0.25 - 0.25\,(1-4\alpha)^n$

Substitution probability
• As a function of time n, we therefore get

$$
\Pr(x \to y) =
\begin{cases}
0.25 + 0.75\,(1-4\alpha)^n & \text{if } x = y \\
0.25 - 0.25\,(1-4\alpha)^n & \text{otherwise}
\end{cases}
$$

• If $n \to \infty$, we get back our (0.25, 0.25, 0.25, 0.25) calculation

More advanced models
• The J-C model made highly "symmetric" assumptions in its formulation of the transition matrix P
• In reality, for example, "transitions" are more common than "transversions"
– What are these? Purine = A or G. Pyrimidine = C or T. A transition is a substitution within the same category; a transversion is a substitution across categories
– Purines are similarly sized, and pyrimidines are similarly sized. A nucleotide is more likely to be replaced by a similarly sized nucleotide
• The "Kimura" model captures this transition/transversion bias

Kimura model

$$
P = \begin{array}{c|cccc}
  & a & g & c & t \\ \hline
a & 1-\alpha-2\beta & \alpha & \beta & \beta \\
g & \alpha & 1-\alpha-2\beta & \beta & \beta \\
c & \beta & \beta & 1-\alpha-2\beta & \alpha \\
t & \beta & \beta & \alpha & 1-\alpha-2\beta
\end{array}
$$

• This of course is the transition probability matrix P of the Markov chain
• Two parameters now, instead of one
• Again, one of the eigenvalues is 1, and the left eigenvector corresponding to it is $\pi = (.25, .25, .25, .25)$
• So again, the stationary distribution is $(.25, .25, .25, .25)$
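
The J-C and Kimura results above can be checked numerically. The following is a minimal sketch in plain Python, not code from the lecture: the function names and the example values of alpha, beta, and n are illustrative assumptions. It builds both one-step matrices in the a, g, c, t order used on the slides, raises the J-C matrix to the n-th power by repeated multiplication, and compares the result with the closed form 0.25 + 0.75(1-4α)^n and 0.25 - 0.25(1-4α)^n.

```python
# Sketch only: all names and the example parameter values below are
# illustrative assumptions, not taken from the lecture.

NUCS = "agct"  # row/column order used in the matrices on the slides


def jukes_cantor(alpha):
    """One-step J-C matrix: 1 - 3*alpha on the diagonal, alpha elsewhere."""
    return [[1 - 3 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


def kimura(alpha, beta):
    """One-step Kimura matrix: alpha for transitions (a<->g, c<->t),
    beta for transversions, 1 - alpha - 2*beta on the diagonal."""
    purines = {"a", "g"}
    rows = []
    for x in NUCS:
        row = []
        for y in NUCS:
            if x == y:
                row.append(1 - alpha - 2 * beta)
            elif (x in purines) == (y in purines):
                row.append(alpha)  # same category: transition
            else:
                row.append(beta)   # across categories: transversion
        rows.append(row)
    return rows


def mat_mul(A, B):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_pow(P, n):
    """P raised to the n-th power (n >= 0) by repeated multiplication."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def vec_mat(v, P):
    """Row vector times matrix: w = v P."""
    return [sum(v[i] * P[i][j] for i in range(4)) for j in range(4)]


def jc_closed_form(alpha, n, same):
    """n-step J-C probability from the spectral decomposition."""
    decay = (1 - 4 * alpha) ** n
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


# Example values (assumed for illustration only).
alpha, beta, n = 0.01, 0.005, 50
Pn = mat_pow(jukes_cantor(alpha), n)
print("a -> a:", Pn[0][0], "closed form:", jc_closed_form(alpha, n, True))
print("a -> c:", Pn[0][2], "closed form:", jc_closed_form(alpha, n, False))
print("uniform vector times Kimura P:", vec_mat([0.25] * 4, kimura(alpha, beta)))
```

Repeated multiplication is used here only because it mirrors the definition of $P^n$ directly; for large n the closed form is much cheaper to evaluate.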

