Phylogenies
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin-Madison
February 8, 2011

What is a Phylogeny?

Part of a statistical model for comparative studies that explains covariance among measurements of traits due to common ancestry. (Tony Ives)

A connected, acyclic, edge-weighted, semi-labeled graph where tip nodes are labeled to represent taxa and edge weights usually represent the expected number of nucleotide substitutions per site. (Cécile Ané)

Truth. (Ken Sytsma)

Phylogenies for Comparative Methods

How do we estimate a phylogeny?

There are a multitude of methods to estimate phylogenies from various sorts of data.

Presently, the most common approaches use multiple alignments of DNA sequence data, but this is not always the case, especially when some taxa are represented by fossils.

There are methods for trait data, amino acid sequences, AFLP markers, restriction sites, and others.

When selecting a method to construct a phylogeny, it is important to understand the underlying assumptions.

Methods of Phylogenetic Reconstruction

Primary methods of phylogenetic reconstruction include these:

    Parsimony
    UPGMA
    Neighbor-joining (and variants)
    Maximum likelihood
    Bayesian approaches

Here are important considerations for each with regard to finding a tree for a comparative analysis.

Parsimony

Parsimony seeks the tree topology that requires the fewest total changes, summed over all edges of the tree.

Parsimony does not directly estimate edge lengths. For a given site, there can be multiple equally parsimonious ways to map the minimum number of changes onto a tree.

If a tree topology is selected by parsimony, additional methods are needed to find branch lengths.

There are conditions, especially long-branch attraction, where the parsimony method is likely to select the incorrect tree topology.

Evaluating the parsimony score on a single tree is computationally fast, but searching for the single most parsimonious tree when there are many taxa requires heuristic methods that may not find the true optimal tree.

UPGMA

UPGMA acts directly on a pairwise distance matrix among taxa; it is an algorithm for producing a tree from such a distance matrix, not a model.

UPGMA produces rooted ultrametric trees: trees where all tips are equidistant from the root. Such trees are consistent with a molecular clock hypothesis, in which the expected rate of nucleotide substitution is constant across all lineages.

To use UPGMA, one needs to specify how distances between taxa are calculated. A common choice is the maximum likelihood distance between the sequences, but this also requires selecting a likelihood model.

UPGMA and other distance methods are often used when the data are something other than DNA sequences.

UPGMA (cont.)

When the true underlying rates of nucleotide substitution are not equal across lineages, UPGMA can be biased against finding the correct tree topology.

In formal likelihood-based tests, it is exceedingly rare with real sequence data to find examples where the molecular clock hypothesis is not strongly rejected.

Neighbor-joining

Like UPGMA, neighbor-joining is an algorithm for producing trees directly from pairwise distances.

Unlike UPGMA, neighbor-joining produces an unrooted tree topology with branch lengths. For comparative methods purposes, a root needs to be selected, often by using outgroups, but the resulting tree will not be ultrametric.

Just as with UPGMA, neighbor-joining is an algorithm that makes trees rapidly from even large pairwise distance matrices, but to be a complete method it requires a specification of how the distances are calculated.

Both UPGMA and neighbor-joining lose information when reducing aligned DNA sequences to distances, and in many settings are less accurate in reconstructing the phylogeny than methods that work with sequence data directly.

Maximum Likelihood

Maximum likelihood depends on an explicit continuous-time Markov chain model for how DNA sequence (or other) data changes along a tree.

There are many variants among likelihood models that place fewer or greater restrictions on the parameters.

Similar to parsimony, maximum likelihood requires a heuristic search across tree space.

Calculating the likelihood score for a given tree with branch lengths is about as computationally difficult as finding a parsimony score, but the need to optimize all parameter values in addition to branch lengths makes maximum likelihood more computationally intensive, especially for larger trees and large data sets.

The resulting tree has branch lengths.

The search for the best tree can be restricted to ultrametric trees. Variations include ultrametric trees with rates that can change along the tree, often in a penalized way.

Bayesian Methods

The Bayesian paradigm differs from the other methods in that the end result is a probability distribution on tree space, not a single best tree.

This distribution is typically represented by a large random (but not independent) sample of trees selected by Markov chain Monte Carlo (MCMC).

It is common for people to compute a consensus tree from the Bayesian sample as a single representative of the distribution.

Bayesian methods can use the same likelihood models as used in maximum likelihood, and in practice often use models richer in parameters than is feasible with maximum likelihood, for example by partitioning data into parts, each with separate sets of parameters.

Bayesian methods are computationally intensive for large data sets and trees. They compare favorably in computational cost with maximum likelihood plus bootstrapping, but not with finding a single maximum likelihood tree.

Bayesian methods can be restricted to ultrametric trees or not.

Bayesian Methods (cont.)

To account for phylogenetic uncertainty in a comparative analysis, one approach is to use MCMC to select some trees, carry out the comparative analysis on each, and average the results.

A Famous Quote About
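As a rough illustration of the UPGMA slides above, the following is a minimal sketch of the clustering step in Python. It is not taken from the lecture: the function name upgma, the dictionary-of-pairs distance format, and the four-taxon distance matrix are all assumptions made for the example, and the distances are invented to be exactly clock-like. The sketch repeatedly joins the two closest clusters, places the new node at half their distance, and averages distances weighted by cluster size.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a rooted ultrametric tree by UPGMA (illustrative sketch).

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance for every pair of labels.
    Returns the tree as a Newick string.
    """
    # Active clusters: key -> (Newick subtree, node height, number of tips).
    clusters = {n: (n, 0.0, 1) for n in names}
    d = dict(dist)  # working copy; distances to merged clusters are added here
    while len(clusters) > 1:
        # Find the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        sub_a, h_a, n_a = clusters.pop(a)
        sub_b, h_b, n_b = clusters.pop(b)
        # The new internal node sits at half the distance between the pair.
        height = d[frozenset((a, b))] / 2.0
        newick = f"({sub_a}:{height - h_a:.3f},{sub_b}:{height - h_b:.3f})"
        key = f"({a},{b})"
        # Distance to every remaining cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((key, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[key] = (newick, height, n_a + n_b)
    (newick, _, _), = clusters.values()
    return newick + ";"


# Hypothetical, exactly clock-like distances among four taxa.
names = ["A", "B", "C", "D"]
pairs = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
    ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}
dist = {frozenset(p): v for p, v in pairs.items()}
tree = upgma(names, dist)
print(tree)
```

In the printed tree every tip is the same total distance from the root, which is the ultrametric property the slides describe. With distances that are not clock-like, the same procedure still returns an ultrametric tree, which is one way to see why UPGMA can recover the wrong topology when rates differ among lineages.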


UW-Madison BOTANY 940 - Phylogeny
