Unformatted text preview:

PhylogeneticsCOS551, Fall 2003Mona SinghPhylogenetics• Phylogenetic trees illustrate the evolutionary relationships among groups of organisms, or among a family of related nucleic acid or protein sequences• E.g., how might have this family been derived during evolutionHypothetical Tree Relating OrganismsPhylogenetic Relationships Among Organisms• Entrez: www.ncbi.nlm.nih.gov/Taxonomy• Ribosomal database project: rdp.cme.msu.edu/html/• Tree of Life: phylogeny.arizona.edu/tree/phylogeny.htmlGlobin SequencesPhylogeny Applications• Tree of life: Analyzing changes that have occurred in evolution of different organisms• Phylogenetic relationships among genes can help predict which ones might have similar functions (e.g., ortholog detection)• Follow changes occuring in rapidly changing species (e.g., HIV virus)Phylogeny Packages• PHYLIP, Phylogenetic inference package – evolution.genetics.washington.edu/phylip.html– Felsenstein– Free!• PAUP, phylogenetic analysis using parsimony– paup.csit.fsu.edu– SwoffordWhat data is used to build trees?• Traditionally: morphological features (e.g., number of legs, beak shape, etc.)• Today: Mostly molecular data (e.g., DNA and protein sequences)Data for Phylogeny• Can be classified into two categories:– Numerical data • Distance between objectse.g., distance(man, mouse)=500,distance(man, chimp)=100Usually derived from sequence data– Discrete characters• Each character has finite number of statese.g., number of legs = 1, 2, 4DNA = {A, C, T, G}Rooted vs Unrooted TreesRooted treeUnrooted treeInternal nodeExternal nodeRootNote: Here, each node has three neighboring nodesTerminology• External nodes: things under comparison; operational taxonomic units (OTUs)• Internal nodes: ancestral units; hypothetical; goal is to group current day units• Root: common ancestor of all OTUs under study. Path from root to node defines evolutionary path• Unrooted: specify relationship but not evolutionary path– If have an outgroup (external reason to believe certain OTU branched off first), then can root• Topology: branching pattern of a tree• Branch length: amount of difference that occurred along a branchHow to reconstruct trees• Distance methods: evolutionary distances are computed for all OTUs and build tree where distance between OTUs “matches” these distances• Maximum parsimony (MP): choose tree that minimizes number of changes required to explain data• Maximum likelihood (ML): under a model of sequence evolution, find the tree which gives the highest likelihood of the observed dataNumber of possible treesGiven n OTUs, there are unrooted trees2,027,025101553413unrooted treesOTUsNumber of possible treesGiven n OTUs, there are rooted trees34,459,42510105515433Rooted treesOTUsBottom Line: an enumeration strategyover all possible trees to find the best one under some criteria is not feasible!ParsimonyFind tree which minimizes number of changes needed to explain dataEx: 123456A GTCGTAB GTCACTC GCGGTA D ACGACAE ACGGAAParsimony • For given example tree and alignment, can do this for all sites, and get away with as few as 9 changes• Changing the tree (either the topology or labeling of leaves) changes the minimum number of changes need• Two computational problems– (Easy) Given a particular tree, how do you find minimum number of changes need to explain data? (Fitch)– (Hard) How do you search through all trees?Parsimony: Fitch’s algorithmIdea: construct set of possible nucleotides for internal nodes,based on possible assignments of childrenParsimony: Fitch’s algorithm• For each site:– Each leaf is labeled with set containing observed nucleotide at that position– For each internal node i with children j and k with labels Sjand Sk• Total # changes necessary for a site is # of union operationsParsimony• How do you search through all trees? – Enumerate all trees (too many…)– Can use techniques to try to limit the search space (e.g., branch and bound)– or use heuristics (many possibilities)• E.g., nearest neighbor interchange. Start with a tree and consider neighboring trees. If any neighboring tree has fewer changes, take it as current tree. Stop when no improvementsacbdadbcabcdParsimony weaknessesParsimony analysis implicitly assumes that rate of changealong branches are similarGGAAReal tree: two long brancheswhere G has turned to A independentlyGGAAInferred treeDistance Methods• Input: given an n x n matrix M where Mij>=0 and Mijis the distance between objects i and j• Goal: Build an edge-weighted tree where each leaf (external node) corresponds to one object of M and so that distances measured on the tree between leaves i and jcorrespond to MijDistance Methods0371315E061214D01214C012B0AEDCBAA tree exactly fitting the matrix does not always exist.Distance Method Criteria• Try to find the tree with distances dijwhich “best fits” the distance data Mij• Different possibilities for “best”– Cavalli-Sforza criterion: minimize – Fitch-Margoliash criterion: minimize• Unfortunately, both lead to computationally intractable problems (e.g., enumerating)Distance Method Heuristic: UPGMA• UPGMA (Unweighted group method with arithmetic mean)– Sequential clustering algorithm– Start with things most similar• Build a composite OTU– Distances to this OTU are computed as arithmetic means– From new group of OTUs, pick pair with highest similarity etc.• Average-linkage clusteringUPGMA: Visually123541 2 3 5 4UPGMA Example0111412D097C08B0ADCBAM B(AC)= (MBA+ MBC)/2 = (8+9)/2=8.5M D(AC)= (MDA+ MDC)/2= (12+11)/2=11.5UPGMA Example01411.5D08.5B0ACDBACM (ABC)D= (MAD+ MBD + MCD)/3 = (12+14+11)/3UPGMA: Example012.33D0ABCDABCUPGMA weaknesses0111412D097C08B0ADCBAIn fact, exact fitting tree exists !UPGMA weaknesses• UPGMA assumes that the rates of evolution are the same among different lineages• In general, should not use this method for phylogenetic tree reconstruction (unless believe assumption)• Produces a rooted tree• As a general clustering method (as we discussed in an earlier lecture), it is better…Distance Method: Neighbor Joining• Most widely-used distance based method for phylogenetic reconstruction• UPGMA illustrated that it is not enough to just pick closest neighbors• Idea here: take into account averaged distances to other leaves as well• Produces an unrooted treeNeighbor Joining (NJ)Start off with star tree; pull out


View Full Document

Princeton COS 551 - Phylogenetics

Documents in this Course
Load more
Download Phylogenetics
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Phylogenetics and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Phylogenetics 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?