Unformatted text preview:

IntroductionPrevious Work on ReconciliationA Taxonomy of ModelsFormal Macroevolutionary Reconstruction ProblemsReconstruction AlgorithmsMacrophylogenyHybrid Micro-MacrophylogenyExperimental ResultsA Hybrid Micro-Macroevolutionary Approach toGene Tree ReconstructionDannie Durand1, Bjarni V. Halld´orsson2, and Benjamin Vernot31Departments of Biological Sciences and Computer Science,Carnegie Mellon [email protected] of Mathematical Sciences, Carnegie Mellon University,Current address: deCode Genetics, Sturlugata 8, 101 Reykjavik, [email protected] of Biological Sciences, Carnegie Mellon [email protected]. Gene family evolution is determined by microevolutionaryprocesses (e.g., point mutations) and macroevolutionary processes (e.g.,gene duplication and loss), yet macroevolutionary considerations arerarely incorporated into gene phylogeny reconstruction methods. Wepresent a dynamic program to find the most parsimonious gene fam-ily tree with respect to a macroevolutionary optimization criterion, theweighted sum of the number of gene duplications and losses. The ex-istence of a polynomial time algorithm for duplication/loss phylogenyreconstruction stands in contrast to most formulations of phylogeny re-construction, which are NP-complete.We next extend this result to obtain a two-phase method for gene treereconstruction that takes both micro- and macroevolution into account.In the first phase, a gene tree is constructed from sequence data, usingany of the previously known algorithms for gene phylogeny construction.In the second phase, the tree is refined by rearranging regions of the treethat do not have strong support in the sequence data to minimize the du-plication/lost cost. Components of the tree with strong support are leftintact. This hybrid approach incorporates both micro- and macroevolu-tionary considerations, yet its computational requirements are modest inpractice because the two phase approach constrains the search space.We have implemented these algorithms in a software tool, Notung2.0, that can be used as a unified framework for gene tree reconstructionor as an exploratory analysis tool that can be applied post hoc to anyrooted tree with bootstrap values. Notung 2.0 also has a new graphicaluser interface and can be used to visualize alternate duplication/loss his-tories, root trees according to duplication and loss parsimony, manipulateand annotate gene trees and estimate gene duplication times.1 IntroductionThe evolutionary history of a gene family is determined by a combination of mi-croevolutionary events (sequence evolution) and macroevolutionary events. TheS. Miyano et al. (Eds.): RECOMB 2005, LNBI 3500, pp. 250–264, 2005.c Springer-Verlag Berlin Heidelberg 2005A Hybrid Micro-Macroevolutionary Approach to Gene Tree Reconstruction 251macroevolutionary events considered here are speciation, gene duplication andgene loss. Gene tree reconstruction should be based on a model that incorporatesboth micro- and macroevolutionary events [12], yet few phylogeny reconstructiontools based on such a unified model are available. Furthermore, reconciliation ofa gene tree and a species tree can be used to investigate a variety of questionsrelated to macroevolution, such as inferring gene duplications and losses, esti-mating upper and lower bounds on times these events occurred and determiningwhether a given pair of homologs is orthologous or paralogous. Algorithmic andsoftware support for both these tasks is needed.In the current work, we present a dynamic programming algorithm to findall most parsimonious phylogenies with respect to a macroevolutionary modelof gene duplication and loss. Given the number of gene family members foundin each species as input, our algorithm will construct a tree with the fewestduplications and losses required to explain the data. We show that this problemcan be solved in polynomial time, in contrast to most phylogeny reconstructionproblems, which are NP-complete [4, 6, 7].Using this result, we develop a two-phase approach to gene tree reconstruc-tion that incorporates sequence evolution, gene duplication and gene loss in theevaluation of alternate phylogenies. In phase one, a tree is constructed based ona microevolutionary model only. In phase two, regions of the tree that are notstrongly supported by the sequence data are refined with respect to a macroevo-lutionary parsimony model, while regions with strong support are left intact. Byreserving consideration of macroevolutionary events until phase two and focus-ing only on those areas where the sequence data cannot resolve the topology, thishybrid approach reduces the search space, leading to a method that incorporatesboth types of events, yet has modest computational requirements.We have implemented these algorithms in a software tool called Notung 2.0,which can be used as a unified framework for gene tree reconstruction or as anexploratory analysis tool that can be applied post hoc to any rooted tree withbootstrap values. This extension of an earlier program [3] has the following newfeatures:– a polynomial time algorithm for finding all most parsimonious trees withrespect to a weighted gene duplication and loss cost.– a new graphical interface for tree manipulation and visualization, with fea-tures especially designed for analysis of gene duplications, including identi-fication of duplicated nodes, lost leaves and/or subtrees, annotation of sub-families and color highlighting of rootable edges.In addition, Notung 2.0 retains all features of the original program includinghigh throughput identification of gene duplications with time estimates and root-ing unrooted trees based on duplication/loss score. Notung 2.0, described onthe supplementary web site (http://www.cs.cmu.edu/ durand/Lab/Notung/), isavailable for free, public distribution.In Section 2, we discuss previous work and review reconciliation of a genetree with a species tree. A taxonomy of macro- and microevolutionary models ispresented in Section 3. We state our gene tree reconstruction problems formally252 D. Durand, B.V. Halld´orsson, and B. Vernotin Section 4 and present algorithms to solve these problems in Section 5. InSection 6, we summarize the capabilities of our software tool, Notung 2.0,anddiscuss the application of Notung 2.0 to two large data sets to investigate therole of gene duplication in genetic adaptation to environmental change.2


View Full Document

Stanford CS 374 - Lecture Notes

Documents in this Course
Probcons

Probcons

42 pages

ProtoMap

ProtoMap

19 pages

Lecture 3

Lecture 3

16 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?