Statistical Alignment: Computational Properties, Homology Testing and Goodness-of-Fit

J. Hein, C. Wiuf, B. Knudsen, M.B. Moller and G. Wibling

Main Objective of the paper

• To show how to accelerate the statistical alignment algorithms by several orders of magnitude, using the model of insertions and deletions proposed by Thorne, Kishino, and Felsenstein in 1991 (TKF91 model).
• To propose a new homology test based on the model.
• To describe a goodness-of-fit test for the insertion-deletion process inherent to the model.

Why isn’t statistical alignment popular?

• Computationally VERY SLOW
  – The authors of the paper accelerated the statistical alignment algorithms by several orders of magnitude compared with the TKF91 algorithm.
• Lack of user-friendly software?
  – Programs are usually written in Fortran or C, or the compiled program only runs in a UNIX environment, which most biologists are not familiar with.
  – The authors of the paper have provided a web interface to the program.

Parsimony and similarity alignments

• Parsimony strategy: minimizing the distance. For example:
• Similarity strategy: maximizing the similarity score. For example: BLAST

TKF91 model of substitutions

• Continuous-time Markov model on the state space of nucleotides or amino acids
• Rate matrix Q is specified
  – It describes the intensity of the different substitution events over an infinitesimal time period.
  – The probability that i has changed to j after time t is $P_{ij}(t) = (e^{tQ})_{ij}$
• The process is assumed to be time reversible: $\pi_i P_{ij}(t) = \pi_j P_{ji}(t)$, where $\pi$ is the equilibrium distribution.

TKF91 model of the indel process

• Can be viewed as a Markov model with all sequences as possible states
• Indel part of the model
  – links connect the letters of the sequences
  – each letter has a mortal link on its right
  – the left end has an immortal link
  – For example: - A  G  G
• If the type of the nucleotide is ignored, the sequence can be represented as -

TKF91 model

  – A mortal link can give birth to a new mortal link or die out.
  – The immortal link can also give birth but does not die.
  – Therefore, the rates can be written as:

      - A  G  G
      I0  S1  I1/D1  S2  I2/D2  S3  I3/D3

    where
      I is the birth rate
      D is the death rate, D > I
      S is the substitution rate

To calculate the probability of a particular alignment:

      s(1): - A  T  -
      s(2): - C  T  G

      $P(s^{(1)}, s^{(2)}, \text{alignment}) = p''_1 \, (\pi_A \, p_1 \, P_{AC}) \, (\pi_T \, p_2 \, P_{TT} \, \pi_G)$

Calculating the probability of two sequences

• Without conditioning on the alignment, it is necessary to sum over all alignments, weighted with their probabilities according to the TKF91 process.
• Confining the likelihood calculations to a band close to the similarity-based alignment allows an efficient numerical optimization algorithm for finding the maximum likelihood estimate.
• The recursions originally presented by Thorne, Kishino and Felsenstein can be


CORNELL CS 726 - Statistical Alignment
