CHAPTER 14 Molecular Evolution and Population Genetics

The comparative study of macromolecules within and among species constitutes the field of molecular evolution. The study of molecular markers revealed that natural populations contain abundant genetic variation at the molecular level. The application of genetic principles to entire populations of organisms constitutes the subject of population genetics.

14.1 DNA and protein sequences contain information about the evolutionary relationships among species

A gene tree is a diagram of the inferred ancestral history of a group of sequences, that is, the pattern of evolutionary relationships among them. The accumulation of differences is the basis of molecular phylogenetics, which is the analysis of molecular sequences in order to infer their evolutionary relationships. Changes in sequence are a matter of chance, depending on which mutations take place and on their likelihood of being fixed in the population. An allele is fixed if it replaces all other alleles in the population. Some mutations may increase the ability of the organism to survive and reproduce, and such mutations have a greater chance of becoming fixed.

Methods for estimating a gene tree:
- Distance methods use the number of differences between each pair of sequences.
- Parsimony methods examine all possible gene trees to find those that minimize the number of fixed mutations needed to account for the data.
- Maximum likelihood methods assume a model of how nucleotide or amino acid substitutions occur, and then identify the gene tree that maximizes the probability of observing the actual data under this model.
- Bayesian methods infer the relative probability of any gene tree based on prior assumptions about the distribution of possible trees.

One way to estimate a gene tree from a distance matrix is known as neighbor joining. Bootstrapping is a method of assigning a level of confidence to each node in a gene tree. It resamples the aligned sites with replacement many times and re-estimates the tree from each resampled data set. It then reports the percentage of replicate trees that contain each node.

A gene tree does not necessarily coincide with a species tree, because polymorphisms present in an ancestral species can persist through one or more speciation events. On the other hand, for genes with polymorphisms that persist for relatively short times, or for species that are sufficiently
old, gene trees often do coincide with species trees.

Rates of evolution can differ dramatically from one protein to another. The rate of sequence evolution of a molecule is the fraction of sites that undergo a change in some designated interval of time. A molecular clock is a constant rate of amino acid replacement over long periods of evolutionary time; such constancy provides a time scale for the branching of species. Selectively neutral mutations are those that have no effect on the ability of the organism to survive and reproduce. The rate of neutral evolution equals μ, the rate of neutral mutation.

Rates of evolution at different sites differ according to their function. A synonymous substitution does not result in an amino acid replacement, whereas a nonsynonymous substitution does. The fastest-evolving DNA sequences are those of pseudogenes, which are duplicate genes that have lost their function because of mutation. Because a pseudogene no longer has a function, nearly all substitutions in it are selectively neutral. In contrast, DNA sequences in which most nucleotide substitutions are deleterious are relatively intolerant of substitution, and their rate of nucleotide substitution is relatively low.

New genes usually evolve through duplication and divergence. Genes that are duplicated as an accompaniment to speciation and that retain the same function in the descendant species are known as orthologous genes. New gene functions can also arise from duplications that take place within the genome of a single species; such duplications result in paralogous genes. Specialization of paralogs accompanied by loss of some of their functional capabilities is known as subfunctionalization. Subfunctionalization can be advantageous because each specialized gene is free to evolve toward optimal function in its own domain of expression.

14.2 Genotypes may differ in frequency from one population to another

Allele frequencies are estimated from genotype frequencies. The frequency of an allele equals the frequency of its homozygotes plus half the frequency of the heterozygotes that carry it.

AIDS resistance: the CCR5 protein enables HIV to combine with the plasma membrane and infect the CD4 class of T cells of the immune system. An allele of the CCR5 gene carrying a 32-bp deletion is common in some human populations. People homozygous for the deletion are highly resistant to infection by HIV strains that use CCR5.

An allele with a
frequency of 1.0 is fixed, and an allele whose frequency has reached 0 is lost. The allele frequencies among gametes equal those among reproducing adults. This holds because Mendelian segregation ensures that each heterozygous genotype produces equal numbers of each type of gamete.

14.3 Random mating means that mates pair without regard to genotype

Random mating is by far the most prevalent mating system for most species of animals and plants. The main exception is plants that regularly reproduce through self-fertilization. Random mating of individuals is equivalent to the random union of gametes.

The Hardy-Weinberg principle has important implications for population genetics. Consider alleles A and a with frequencies p and q (p + q = 1). Random mating produces the genotype frequencies p² (AA), 2pq (Aa) and q² (aa), and the allele frequencies remain constant from generation to generation. The principle assumes that:
- mating is random;
- there are no subpopulations that differ in allele frequency;
- allele frequencies are the same in males and females;
- all genotypes are equal in survival and fertility;
- mutation does not occur;
- migration into the population is absent; and
- the population is sufficiently large.

One or more of these assumptions, including random mating, can be violated without producing deviations from the expected genotype frequencies large enough to be detected by the chi-square test.

When an allele is rare, it is found mostly in heterozygous genotypes. Because 2pq is much larger than q² when q is small, there are many more heterozygotes than homozygotes for the rare allele. Hardy-Weinberg frequencies can be extended to multiple alleles; with three alleles of frequencies p, q and r, for example, the genotype frequencies are given by the terms of (p + q + r)².

X-linked genes are a special case because males have only one X chromosome. For an X-linked recessive trait, the frequency of affected males provides an estimate of the frequency of the recessive allele, since each male expresses the single allele he carries.

14.4 Highly polymorphic sequences are used in DNA typing

Many genes in human populations are polymorphic, meaning that they have two or more alleles that are common in the population. As a result, it is virtually impossible for two human beings, other than identical twins, to be genetically identical. The use of polymorphisms
in DNA to link suspects with samples of human material is called DNA typing. In some polymorphisms, the restriction fragments corresponding to different alleles differ in length because they contain different numbers of copies of a short repeat unit. Such a polymorphism is called a simple sequence repeat (SSR). SSRs are abundant in the human genome. Genetic markers based on SSRs, particularly those based on repeats of 5'-CA-3', have been the workhorse of mapping human disease genes. The SSRs used in DNA typing usually have longer repeat units than those used in …
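The distance approach to gene trees from section 14.1 can be sketched in Python. The sketch builds a distance matrix, joins neighbors and then bootstraps. It assumes the sequences are already aligned and uses the plain proportion of differing sites (p-distance) as the distance; real analyses usually correct distances with a substitution model. All function names (`p_distance`, `distance_matrix`, `neighbor_joining`, `clades`, `bootstrap_support`) and the toy sequences are hypothetical. The tree comes back as nested tuples giving topology only, without branch lengths. Bootstrapping resamples alignment columns with replacement and reports the fraction of replicate trees that separate a chosen group of sequences from the rest.

```python
import random
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs: dict) -> dict:
    """Symmetric pairwise distances keyed by (name1, name2)."""
    d = {}
    for i, j in combinations(seqs, 2):
        d[(i, j)] = d[(j, i)] = p_distance(seqs[i], seqs[j])
    return d


def neighbor_joining(names: list, d: dict):
    """Neighbor-joining topology as nested tuples (no branch lengths).

    The outermost pair is only a display root; the tree itself is unrooted.
    """
    nodes = list(names)
    d = dict(d)                      # copy so the caller's matrix is untouched
    subtree = {n: n for n in nodes}  # node label -> nested-tuple subtree
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[(i, k)] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda pair: (n - 2) * d[pair] - r[pair[0]] - r[pair[1]])
        u = f"({i},{j})"
        for k in nodes:
            if k not in (i, j):
                # Distance from the new internal node u to each remaining node.
                d[(u, k)] = d[(k, u)] = 0.5 * (d[(i, k)] + d[(j, k)] - d[(i, j)])
        subtree[u] = (subtree[i], subtree[j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return (subtree[a], subtree[b])


def clades(tree) -> set:
    """Leaf sets below every node of a nested-tuple tree."""
    found = set()

    def walk(t):
        if isinstance(t, str):
            leaves = frozenset([t])
        else:
            leaves = frozenset().union(*map(walk, t))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def bootstrap_support(seqs: dict, group: set, reps: int = 100, seed: int = 0) -> float:
    """Fraction of bootstrap trees that separate `group` from the other sequences."""
    rng = random.Random(seed)
    names = list(seqs)
    length = len(next(iter(seqs.values())))
    target = frozenset(group)
    other = frozenset(names) - target  # the same split seen from the other side
    hits = 0
    for _ in range(reps):
        cols = [rng.randrange(length) for _ in range(length)]  # sites, with replacement
        resampled = {name: "".join(seqs[name][c] for c in cols) for name in names}
        found = clades(neighbor_joining(names, distance_matrix(resampled)))
        hits += target in found or other in found
    return hits / reps


seqs = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "TTTTAAAA", "D": "TTTTAAAT"}
print(neighbor_joining(list(seqs), distance_matrix(seqs)))  # (('A', 'B'), ('C', 'D'))
print(bootstrap_support(seqs, {"A", "B"}))
```

The sketch uses raw p-distances, so it suits only closely related sequences. The Q-criterion and the distance update are kept explicit so that each neighbor-joining step is visible.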
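Section 14.1 states that the rate of neutral evolution equals μ, and this follows from a short argument. The sketch below assumes a diploid population of constant size N, with μ the rate of selectively neutral mutation per gene copy per generation. It also uses the standard result that a new neutral mutation, present initially as one of 2N copies, is eventually fixed with probability equal to its initial frequency:

```latex
\begin{aligned}
\text{new neutral mutations per generation} &= 2N\mu \\
\Pr(\text{a given new neutral mutation is eventually fixed}) &= \frac{1}{2N} \\
k = \text{rate of neutral substitution} &= 2N\mu \cdot \frac{1}{2N} = \mu
\end{aligned}
```

Population size cancels out. This is one reason neutral theory predicts that neutral substitutions accumulate at a roughly steady, clocklike rate.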
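The Hardy-Weinberg calculations in sections 14.2 and 14.3 can be sketched in Python. The steps are estimating allele frequencies from genotype counts, computing expected counts p²N, 2pqN and q²N, and running a chi-square test. The function names are hypothetical, and the genotype counts in the usage lines are made-up illustration values rather than data from the text. With three genotype classes and one allele frequency estimated from the data, the test has 1 degree of freedom. A statistic above 3.841 therefore indicates a significant deviation at the 5% level.

```python
def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> tuple:
    """p = frequency of A = (2 * AA + Aa) / (2 * total); q = 1 - p."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    return p, 1 - p


def hardy_weinberg_test(n_AA: int, n_Aa: int, n_aa: int) -> tuple:
    """Expected Hardy-Weinberg counts and the chi-square statistic (1 df)."""
    n = n_AA + n_Aa + n_aa
    p, q = allele_frequencies(n_AA, n_Aa, n_aa)
    if p in (0, 1):
        # An absent allele gives an expected count of zero; the test is undefined.
        raise ValueError("both alleles must be present to test Hardy-Weinberg")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi_square


CHI_SQUARE_CRITICAL_1DF_5PCT = 3.841  # critical value for 1 degree of freedom, P = 0.05


def heterozygotes_per_rare_homozygote(q: float) -> float:
    """Ratio 2pq / q**2 = 2p / q, which grows large as the allele becomes rare."""
    return 2 * (1 - q) / q


# Made-up counts: no heterozygotes at all, a sharp departure from expectation.
expected, chi_square = hardy_weinberg_test(50, 0, 50)
print(expected, chi_square)                       # (25.0, 50.0, 25.0) 100.0
print(chi_square > CHI_SQUARE_CRITICAL_1DF_5PCT)  # True
```

For an allele with frequency q = 0.01, `heterozygotes_per_rare_homozygote(0.01)` gives about 198 heterozygous carriers for every homozygote. This is the arithmetic behind the statement that rare alleles are found mostly in heterozygotes.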
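For an X-linked recessive trait (section 14.3), a minimal sketch can estimate allele and female genotype frequencies. It assumes Hardy-Weinberg conditions and equal allele frequencies in the two sexes. The proportion of affected males estimates q. The expected female frequencies then follow as q² for affected homozygotes and 2pq for heterozygous carriers. The function name and the survey counts are hypothetical.

```python
def x_linked_recessive(affected_males: int, total_males: int) -> dict:
    """Estimate allele and female genotype frequencies for an X-linked recessive.

    Each male carries a single X, so the fraction of affected males estimates q.
    """
    q = affected_males / total_males
    p = 1 - q
    return {
        "q": q,
        "affected_females": q * q,     # homozygous recessive females
        "carrier_females": 2 * p * q,  # heterozygous females
    }


# Hypothetical survey: 80 affected among 1000 males.
est = x_linked_recessive(80, 1000)
print(est["q"], round(est["affected_females"], 4))  # 0.08 0.0064
```

Affected males occur at frequency q and affected females at q², so affected females are rarer by a factor of 1/q (12.5 in this made-up example).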
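Section 14.4 says highly polymorphic loci make DNA typing discriminating. To see why, a minimal sketch can compute how often a multilocus genotype would be expected by chance. It assumes Hardy-Weinberg proportions within each locus, using the multiple-allele extension: p² for a homozygote and 2pq for a heterozygote of any two alleles. It further assumes that loci are independent, so their genotype frequencies multiply. Both are assumptions of this sketch. The locus names, allele labels and frequencies are invented for illustration.

```python
from math import prod


def genotype_frequency(freqs: dict, genotype: tuple) -> float:
    """Hardy-Weinberg frequency of one genotype at a locus with any number of alleles."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def profile_frequency(loci: dict, profile: dict) -> float:
    """Multiply single-locus frequencies, assuming the loci are independent."""
    return prod(genotype_frequency(loci[name], profile[name]) for name in profile)


# Invented allele frequencies; allele labels stand for repeat counts at hypothetical SSR loci.
loci = {
    "SSR_1": {"12": 0.10, "13": 0.20, "14": 0.05},
    "SSR_2": {"7": 0.10, "8": 0.30},
}
profile = {"SSR_1": ("12", "13"), "SSR_2": ("7", "7")}
print(profile_frequency(loci, profile))  # about 0.0004 (2 * 0.1 * 0.2 times 0.1 ** 2)
```

Each additional independent polymorphic locus multiplies in another small factor. That is why profiles built from many SSR loci are expected to be extremely rare.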