MIT 6 047 - Lecture 10 Molecular Evolution and Phylogenetics - D325812



MIT OpenCourseWare
http://ocw.mit.edu

6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution
Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

6.047/6.878 Lecture 10
Molecular Evolution and Phylogenetics
October 8, 2008

Notes from Lecture

• Error in slide 13: Nsubstitutions < Nmutations
• Next 2 lectures given by guest speakers
• The section on Maximum Likelihood Methods is left for recitation

Motivations

Evolution by natural selection is responsible for the divergence of species populations through three primary mechanisms: populations being altered over evolutionary time and speciating into separate branches, hybridization of two previously distinct species into one, or termination by extinction. Given the vastness of time elapsed since life first emerged on this planet, many distinct species have evolved, all of which are related to one another; phylogenetics is the study of evolutionary relatedness among species and populations.

Traditional phylogenetics asks how species evolve and, before the advent of genomic data, relied mostly on physiological data (bone structure from fossils, etc.). We are interested in tackling phylogenetics from a different perspective: analyzing DNA sequence data in order to determine relationships between and among species. At the core, we would like to detect evidence of natural selection in populations. This is an increasingly important area of research in computational biology and is starting to find commercial applications in the realm of personal genomics: it was recently announced that a joint MIT & Harvard affiliated company was established to sequence individual genomes for $5000 (other private companies, including “23andMe” and “deCODEme,” are already doing this).

We will formulate this biological problem in computational terms by studying two probabilistic models of divergence: Jukes-Cantor and Kimura.
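As a rough illustration of why a probabilistic model of divergence is needed, the sketch below applies the standard Jukes-Cantor correction, d = -(3/4) ln(1 - (4/3) p), to the observed fraction of differing sites p between two aligned sequences. Because repeated changes at the same site hide one another, the observed count underestimates the number of mutations that actually occurred, and the corrected distance d comes out larger than p. The function names and the two toy sequences are hypothetical and are not taken from the lecture.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def jukes_cantor_distance(p: float) -> float:
    """Expected substitutions per site under Jukes-Cantor for observed difference p."""
    # The correction is only defined below the saturation point p = 3/4.
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Hypothetical toy alignment: the sequences differ at 2 of 10 sites.
seq_a = "ACGTACGTAC"
seq_b = "ACGTTCGTAA"

p = p_distance(seq_a, seq_b)
d = jukes_cantor_distance(p)
print(f"observed difference p = {p:.3f}, corrected distance d = {d:.3f}")
```

Pairwise distances of this kind are the usual input to distance-based tree-building methods.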
Two purely algorithmic approaches (UPGMA & Neighbor-Joining) will be introduced to build species or gene trees from these relatedness data (the distinction between species & gene trees is explained below). Among the many open problems in phylogenetics that we can currently address with genomics are: how similar two species are; what migration paths early humans took when they first left the African continent, which can be studied through variations in identical genes across a number of local tribes around the world (National Geographic's Genographic Project is one such example); and which species are our closest living cousins (chimpanzees or gorillas?). Many open questions in evolutionary biology have already been answered by genomic phylogenetics (a major recent one being the revelation that the closest living relative of the whale is the hippopotamus).

Information in phylogenetics is best represented using trees that succinctly show relationships among species or genes. There are a number of important issues to be aware of when inferring phylogenies and modeling evolution with trees:

• nodes linking branches (exact type of common ancestors)
• meaning of branch lengths (time scaled or not)
• type of splitting event at branches (usually binary)

As an aside relating to the last bulleted point, Professor Pavel Pevzner of UCSD (author of our class textbook on bioinformatics algorithms) mentioned in his recent talk at MIT that resolving the branching order (are humans closer to dogs or to mice?) may require a trifurcation model (a split in three ways) instead.

It is important to note that gene & species divergence are two distinct events. The same gene (or slight variations thereof) can be found in different species (organisms that cannot interbreed). To think about it another way, a species tree is a special case of a gene tree, consisting of orthologous sequences of the same common gene.
Moreover, in a species tree, we can have gene flow between the different branches of the tree. If every “leaf” is an organism, it is a species tree. A gene tree captures both speciation & duplication events, with the length between the root and each leaf of the tree being a measure of the number of mutations between the two. The level of tree complexity (branch lengths & number of branches) dictates what types of algorithms to use. In this class we focus on sequence comparison to develop gene & species trees.

There are many advantages to using genomes in phylogenetics. A major one is the vast amount of information we have access to: every position in the genome, and in particular every amino acid position within a protein sequence, can be treated as a trait. By contrast, only a small number of traits is usually used to assemble species trees; for instance, in traditional phylogenetics (before the advent of genomic data), one could compare the panda skeletal structure with that of bears & raccoons. The underlying premise in creating trees from traits is the parsimony principle: find the tree that explains the set of traits with the minimum number of changes.

Unfortunately, there are also a number of complications, which stem from the fact that traits are typically ill behaved: back mutations are frequent (nails becoming short, then long and finally back to short), ancestral sequences are inaccessible, and it is difficult to correlate substitution rate with time. With regard to back mutations, even though evolution is traditionally thought of as divergent, always acting to increase entropy, there are instances of convergent evolution (or homoplasy). Homoplasy is the phenomenon where two separate lineages, completely independent of one another, undergo the same changes, which leads to their convergence. It is a strictly random process that has been observed in nature not infrequently.
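To make the parsimony principle concrete, here is a minimal sketch that counts the minimum number of changes needed to explain a single trait on a given rooted binary tree. It uses Fitch's small-parsimony algorithm, which is one standard way to do this counting and is swapped in here for illustration; the lecture does not name a specific method. The leaf names A to D and the trait values (borrowing the short/long nails example) are hypothetical.

```python
def fitch_score(tree, traits):
    """Return (possible_states, min_changes) for a rooted binary tree.

    tree   : a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    traits : dict mapping each leaf name to its observed trait state
    """
    if isinstance(tree, str):
        # A leaf carries exactly its observed state and needs no change.
        return {traits[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch_score(left, traits)
    right_states, right_changes = fitch_score(right, traits)
    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no extra change here.
        return shared, left_changes + right_changes
    # The children disagree: one change is required on a branch below this node.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical trait observed in four hypothetical species.
traits = {"A": "long", "B": "long", "C": "short", "D": "short"}

tree_grouped = (("A", "B"), ("C", "D"))  # groups species with the same state
tree_mixed = (("A", "C"), ("B", "D"))    # splits them across both subtrees

score_grouped = fitch_score(tree_grouped, traits)[1]
score_mixed = fitch_score(tree_mixed, traits)[1]
print(score_grouped, score_mixed)
```

Under parsimony, the tree with the lower score is preferred. A back mutation or convergent change in the real history would be invisible to this count, which is one reason ill-behaved traits can mislead the method.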
To take the human population as an example, the accumulation of mutations within the human genome since our species left Africa is comparable to a phylogenetic event. The mutations are infrequent: roughly 1000 mutations (single nucleotide polymorphisms, or SNPs) in the total 3 billion nucleotide genome. This manageable level of complexity is what makes it feasible to produce a map of the human family tree at all.

How are genes produced? There are two main mechanisms: 1)
