MIT 6.047 - Lecture 10: Molecular Evolution and Phylogenetics

MIT OpenCourseWare http://ocw.mit.edu
6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution
Fall 2008
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

6.047/6.878 Lecture 10: Molecular Evolution and Phylogenetics
October 8, 2008

Notes from Lecture
• Error in slide 13: Nsubstitutions < Nmutations (the number of observed substitutions is smaller than the number of mutations that actually occurred).
• The next 2 lectures will be given by guest speakers.
• The section on Maximum Likelihood Methods is left for recitation.

Motivations

Evolution by natural selection is responsible for the divergence of species populations through three primary mechanisms: populations being altered over evolutionary time and speciating into separate branches, hybridization of two previously distinct species into one, or termination by extinction. Given the vastness of time elapsed since life first emerged on this planet, many distinct species have evolved, all of which are related to one another; phylogenetics is the study of evolutionary relatedness among species and populations. Traditional phylogeny asks how species evolve and, before the advent of genomic data, relied mostly on physiological data (bone structure from fossils, etc.). We are interested in tackling phylogenetics from a different perspective: analyzing DNA sequence data in order to determine relationships between and among species. At the core, we would like to detect evidence of natural selection in populations. This is an increasingly important area of research in computational biology and is starting to find commercial applications in the realm of personal genomics: it was recently announced that a joint MIT & Harvard affiliated company was established to sequence individual genomes for $5000 (other private companies, including “23andMe” and “deCODEme,” are already doing this). We will formulate this biological problem in computational terms by studying two probabilistic models of divergence: Jukes-Cantor & Kimura.
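As a preview of the two models named above, here is a minimal Python sketch (not the course's own code) of their standard closed-form distance corrections. The Jukes-Cantor distance is d = -(3/4) ln(1 - (4/3)p), where p is the fraction of aligned sites that differ; the Kimura two-parameter distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the fractions of sites differing by a transition and by a transversion. The sketch assumes the two sequences are already aligned, gap-free, and written in uppercase A/C/G/T; the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def _check_aligned(seq1: str, seq2: str) -> int:
    """Return the alignment length, or raise if the inputs are not usable."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal (aligned) length")
    return len(seq1)


def p_distance(seq1: str, seq2: str) -> float:
    """Observed fraction of aligned sites that differ."""
    n = _check_aligned(seq1, seq2)
    return sum(a != b for a, b in zip(seq1, seq2)) / n


def jukes_cantor_distance(seq1: str, seq2: str) -> float:
    """Estimated substitutions per site, assuming all base changes are equally likely."""
    p = p_distance(seq1, seq2)
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return math.inf  # saturated: no more similar than two random sequences
    return -0.75 * math.log(arg)


def kimura_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance: transitions and transversions counted separately."""
    n = _check_aligned(seq1, seq2)
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    P, Q = transitions / n, transversions / n
    arg1, arg2 = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if arg1 <= 0.0 or arg2 <= 0.0:
        return math.inf
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)
```

Both corrected distances exceed the raw fraction of differing sites (for p = 0.25, Jukes-Cantor gives about 0.304), because one observed difference can hide several mutations at the same site; this is the Nsubstitutions < Nmutations point from the lecture notes. When transitions make up one third of the differences, as Jukes-Cantor assumes, the Kimura formula reduces to the Jukes-Cantor one.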
Two purely algorithmic approaches (UPGMA & Neighbor-Joining) will be introduced to build species or gene trees from these relatedness data (the distinction between species and gene trees is explained below). Among the many open problems in phylogenetics that we can currently address with genomics are how similar two species are, what migration paths early humans took when they first left the African continent (studied through variations in identical genes across a number of local tribes around the world; National Geographic's Genographic Project is one such example), and which species are our closest living cousins (chimpanzees or gorillas?), among many others. Many open questions in evolutionary biology have already been answered by genomic phylogenetics (a major recent one being the revelation that the closest living relative of the whale is the hippopotamus).

Information in phylogenetics is best represented using trees that succinctly show relationships among species or genes. There are a number of important issues regarding inferring phylogenies when modeling evolution with trees that one should be aware of:
• nodes linking branches (exact type of common ancestors)
• meaning of branch lengths (time-scaled or not)
• type of splitting event at branches (usually binary)

As an aside relating to the last bulleted point, Professor Pavel Pevzner of UCSD (author of our class textbook on bioinformatics algorithms) mentioned in his recent talk at MIT that resolving the order of divergence (are humans closer to dogs or to mice?) may require a trifurcation model (a split three ways) instead.

It is important to note that gene & species divergence are two distinct events. The same gene (or slight variations thereof) can be found in different species (organisms that cannot interbreed). To think about it another way, a species tree is a special case of a gene tree, consisting of an orthologous sequence of the same common gene.
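UPGMA, one of the two algorithmic approaches mentioned above, can be sketched briefly. This minimal Python version assumes a symmetric matrix of pairwise distances (for example, Jukes-Cantor distances) as input. It repeatedly merges the two closest clusters, places the new node at half their distance, and averages distances to the merged cluster weighted by cluster size. The nested-tuple tree format `(left, right, height)`, the function name, and the toy distances in the usage lines are illustrative assumptions, not part of the lecture; Neighbor-Joining is not shown.

```python
from typing import Dict, List, Tuple, Union

# A tree is a leaf name or (left_subtree, right_subtree, node_height).
Tree = Union[str, Tuple["Tree", "Tree", float]]


def upgma(names: List[str], dist: List[List[float]]) -> Tree:
    """Build a rooted, ultrametric tree from a symmetric distance matrix (UPGMA)."""
    if not names or len(dist) != len(names):
        raise ValueError("need one distance-matrix row per name")
    # Active clusters: id -> (subtree, number of leaves it contains).
    clusters: Dict[int, Tuple[Tree, int]] = {i: (n, 1) for i, n in enumerate(names)}
    # Distances between active clusters, keyed by (smaller_id, larger_id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of clusters (the first one found wins on ties).
        (i, j), d_ij = min(d.items(), key=lambda item: item[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        del d[(i, j)]
        # Distance from each remaining cluster k to the merged cluster is the
        # size-weighted average of its distances to the two merged clusters.
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        # The new node sits at half the merging distance, so all leaves below it
        # are equidistant from it (a molecular-clock assumption).
        clusters[next_id] = ((tree_i, tree_j, d_ij / 2), size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    # Made-up toy distances, for illustration only.
    names = ["A", "B", "C", "D"]
    dist = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
    print(upgma(names, dist))  # ('D', ('C', ('A', 'B', 1.0), 3.0), 5.0)
```

Because every merge is placed at half the merging distance, UPGMA always yields a rooted, binary, time-scaled (ultrametric) tree. That is reasonable only when substitution rates are roughly constant across lineages; Neighbor-Joining drops that requirement.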
Moreover, in a species tree, we can have gene flow between the different branches of the tree. If every “leaf” is an organism, it is a species tree. A gene tree captures both speciation & duplication events, with the length between the root and the various leaves of the tree being a measure of the number of mutations between the two. The level of tree complexity (branch lengths & number of branches) dictates what types of algorithms to use. In this class we focus on sequence comparison to develop gene & species trees.

There are many advantages to using genomes in phylogenetics. A major one is the vast amount of information we have access to: consider for a moment that every position in the genome, and in particular every individual amino acid position within a protein sequence, corresponds to a trait. By contrast, traditional phylogenetics (before the advent of genomic data) usually assembled species trees from a small number of traits; for instance, one could compare the panda skeletal structure with that of bears & raccoons. The underlying premise in creating trees from traits is the parsimony principle: find the tree that explains the set of traits with the minimum number of changes.

Unfortunately, there are also a number of complications, having to do with the fact that traits are typically ill-behaved: back mutations are frequent (nails becoming short, then long, and finally back to short), ancestral sequences are inaccessible, and correlating substitution rate with time is difficult. With regard to back mutations, even though evolution is traditionally thought of as divergent, always acting to increase entropy, there are rare instances of convergent evolution (or homoplasy). Homoplasy is the phenomenon where two separate lineages, completely independent of one another, undergo the same changes that lead to their convergence. It is a strictly random process that has not infrequently been observed in nature.
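The parsimony principle above can be made concrete with Fitch's algorithm, a standard method (not described in these notes) for the "small parsimony" problem: given a fixed rooted binary tree and one trait value per leaf, it computes the minimum number of changes needed to explain that trait. The Python sketch below sums this count over every aligned column, treating each column as one trait and assuming equal-length sequences. The species names and toy sequences in the usage lines are made up for illustration and are not real data. Finding the best tree itself (the "large parsimony" problem) would require searching over topologies, which is not shown.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a (left_subtree, right_subtree) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for one trait.

    Returns the set of candidate states at the root of `tree` and the minimum
    number of changes needed to explain the leaf states in `states`.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children can agree on a state: no extra change is needed here.
        return shared, left_cost + right_cost
    # The children disagree: one change is needed on one of the two child branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total minimum changes over all aligned columns (each column is one trait)."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")
    return sum(
        fitch(tree, {name: seq[col] for name, seq in alignment.items()})[1]
        for col in range(length)
    )


if __name__ == "__main__":
    # Made-up toy alignment, for illustration only.
    toy = {"human": "AAG", "chimp": "AAG", "mouse": "GTG", "rat": "GTA"}
    print(parsimony_score((("human", "chimp"), ("mouse", "rat")), toy))  # 3
    print(parsimony_score((("human", "mouse"), ("chimp", "rat")), toy))  # 5
```

Under parsimony the first topology is preferred because it needs fewer changes. Back mutations and homoplasy, discussed above, are exactly the cases where the true number of changes exceeds this minimum, so a parsimony count can undercount what actually happened.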
To take the human population as an example, the mutations accumulated within the human genome since our species left Africa are comparable to a phylogenetic event. The mutations are infrequent: roughly one single nucleotide polymorphism (SNP) per 1000 nucleotides of the 3-billion-nucleotide genome. This low density of differences is why producing a map of the human family tree is a manageable undertaking at all. How are genes produced? There are two main mechanisms: 1)

