The evolutionary demography of duplicate genes

Michael Lynch¹* & John S. Conery²

¹Dept. of Biology, Indiana University, Bloomington, Indiana 47405; ²Dept. of Computer and Information Science, University of Oregon, Eugene, Oregon 97403

Received 21.05.2002; accepted in final form 29.08.2002

Key words: gene duplication, genome evolution, genome size

Abstract

Although gene duplication has generally been viewed as a necessary source of material for the origin of evolutionary novelties, the rates of origin, loss, and preservation of gene duplicates are not well understood. Applying steady-state demographic techniques to the age distributions of duplicate genes censused in seven completely sequenced genomes, we estimate the average rate of duplication of a eukaryotic gene to be on the order of 0.01/gene/million years, which is of the same order of magnitude as the mutation rate per nucleotide site. However, the average half-life of duplicate genes is relatively small, on the order of 4.0 million years. Significant interspecific variation in these rates appears to be responsible for differences in species-specific genome sizes that arise as a consequence of a quasi-equilibrium birth-death process. Most duplicated genes experience a brief period of relaxed selection early in their history and a minority exhibit the signature of directional selection, but those that survive more than a few million years eventually experience strong purifying selection. Thus, although most theoretical work on the gene-duplication process has focused on issues related to adaptive evolution, the origin of a new function appears to be a very rare fate for a duplicate gene.
A more significant role of the duplication process may be the generation of microchromosomal rearrangements through reciprocal silencing of alternative copies, which can lead to the passive origin of post-zygotic reproductive barriers in descendant lineages of incipient species.

For practical reasons, much of the past focus on genome evolution has been on divergence at the nucleotide level in specific genes. But with the growing proliferation of whole-genome sequences, a more global view of genomic evolution is beginning to emerge. Just as nucleotide changes continuously arise within populations via mutation, accidents at the level of chromosomal regions regularly give rise to losses and duplications of entire genes. Such genomic turnover is ultimately responsible for interspecific divergence in gene content, which may be exploited for adaptive reasons, and for modifications of gene location, which may passively give rise to post-zygotic reproductive isolating barriers (for review, see Lynch, 2002). Thus, it is of some interest to determine the rate at which new genes arise via duplication events and the frequency and mechanisms by which they are preserved.

Because of the difficulties with quantifying low probability events at the molecular level, we are almost completely lacking in direct estimates of the rate of gene duplication, although rates as high as 10⁻⁶ to 10⁻⁴ per gene per generation have been reported for Drosophila (Shapira and Finnerty 1986). We recently obtained indirect estimates of the rates of birth and loss of new genes through censuses of the contents of the then largely sequenced nuclear genomes of several eukaryotes (Lynch and Conery 2000), and additional estimates using somewhat different criteria have been published by Gu et al. (2002). Since these studies were performed, nearly complete genomic sequences have emerged for several species and all of the pre-existing databases have been refined considerably.
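
As a rough illustration of the summary figures quoted in the abstract (a duplication rate on the order of 0.01 per gene per million years and a half-life on the order of 4.0 million years), the sketch below converts a half-life into a per-million-year loss rate and computes the standing number of duplicates per gene expected under a simple birth-death balance. It assumes constant rates and exponential loss, which is a simplification and not necessarily the estimator used in the paper; all function names are hypothetical.

```python
import math


def loss_rate_from_half_life(half_life_my):
    """Per-million-year loss rate d implied by a half-life, assuming exponential decay."""
    return math.log(2) / half_life_my


def surviving_fraction(age_my, half_life_my):
    """Fraction of duplicates still present at a given age (in million years)."""
    return 0.5 ** (age_my / half_life_my)


def equilibrium_duplicates_per_gene(birth_rate, half_life_my):
    """Steady state of dN/dt = B - d*N, i.e. N* = B / d (assumed simple birth-death model)."""
    return birth_rate / loss_rate_from_half_life(half_life_my)


if __name__ == "__main__":
    # Order-of-magnitude values taken from the abstract.
    birth_rate = 0.01   # duplications per gene per million years
    half_life = 4.0     # million years
    print(loss_rate_from_half_life(half_life))
    print(equilibrium_duplicates_per_gene(birth_rate, half_life))
```

Under these assumptions the loss rate is roughly 0.17 per million years, and the balance between birth and loss leaves roughly 0.06 young duplicates per gene at any one time.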
We, therefore, take this opportunity to update and expand our previous results.

Journal of Structural and Functional Genomics 3: 35–44, 2003. © 2003 Kluwer Academic Publishers. Printed in the Netherlands.

Sources of data and methods of analysis

For each of the fully sequenced eukaryotic genomes, we downloaded all coding sequences and their corresponding amino-acid sequences from the most recently curated database (as of 1 April 2001), removing all suspected pseudogenes, transposable elements, and overlapping genes prior to subsequent analyses: Schizosaccharomyces pombe – The Sanger Centre (ftp://ftp.sanger.ac.uk/pub/yeast/sequences/pombe); Saccharomyces cerevisiae – National Center for Biotechnology Information (ftp://ftp.ncbi.nih.gov/genbank/genomes/S_cerevisiae); Arabidopsis thaliana – The Institute for Genomic Research (ftp://ftp.tigr.org/pub/data/athaliana/ath1); Caenorhabditis elegans – WormBase (http://www.wormbase.org); Drosophila melanogaster – Berkeley Drosophila Genome Project (http://www.fruitfly.org/sequence/download.html); and Homo sapiens – The Ensembl Project (ftp://ftp.ensembl.org/current/data/).

To identify duplicate genes, we used BLAST (Altschul et al. 1997) to compare all pairs of amino-acid sequences within each genome, retaining only those pairs for which the alignment score was below 10⁻¹⁰. To minimize the inclusion of members of large multigene families, we excluded all genes that identified more than five matching sequences. Using each protein alignment generated by BLAST as a guide, we aligned the nucleotide sequences, and then prior to sequence analysis, we used a gap-expansion algorithm to remove ambiguous portions of the alignments (Conery and Lynch 2001).

The numbers of nucleotide substitutions per silent and replacement sites (S and R, respectively) were then estimated for each pair by using the maximum-likelihood procedure in the PAML software package (version 2.0k) (Yang 1997).
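
The duplicate-identification filter described above (a 10⁻¹⁰ threshold on BLAST matches and a cutoff of five matching sequences per gene) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the threshold applies to the BLAST expectation value, that hits arrive as hypothetical `(query_id, subject_id, e_value)` tuples, and that a match counts for both genes in the pair regardless of search direction.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-10  # threshold quoted in the text (assumed to be an E-value)
MAX_MATCHES = 5         # genes with more matches are treated as large-family members


def select_duplicate_pairs(hits, e_cutoff=E_VALUE_CUTOFF, max_matches=MAX_MATCHES):
    """Return sorted unordered gene pairs that pass both filters.

    hits: iterable of (query_id, subject_id, e_value) from an all-vs-all search.
    """
    partners = defaultdict(set)
    for query, subject, e_value in hits:
        if query == subject or e_value >= e_cutoff:
            continue  # skip self-hits and matches that are not strong enough
        # Assumption: treat matches symmetrically.
        partners[query].add(subject)
        partners[subject].add(query)

    # Exclude genes that match more than max_matches other sequences.
    kept = {gene for gene, found in partners.items() if len(found) <= max_matches}

    pairs = set()
    for gene in kept:
        for other in partners[gene]:
            if other in kept:
                pairs.add(tuple(sorted((gene, other))))
    return sorted(pairs)


if __name__ == "__main__":
    example_hits = [
        ("geneA", "geneB", 1e-50),
        ("geneB", "geneA", 1e-48),
        ("geneC", "geneD", 1e-5),  # too weak, dropped
    ]
    print(select_duplicate_pairs(example_hits))
```

Storing each pair as a sorted tuple means reciprocal hits (A against B and B against A) are counted once, which matters when the pairs are later used to build an age distribution.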
Estimated rates of nucleotide substitution are sensitive to the relative rates of occurrence of transitions and transversions, which cannot be estimated accurately when the amount of sequence divergence is high. Therefore, to obtain precise estimates of the transition/transversion bias among newly arisen mutations, prior to the analyses of sequence divergences for each species, we tallied the observed substitutions at all four-fold redundant sites in all pairs of duplicate sequences that were similar enough that multiple substitutions per site were unlikely (by confining these computations to loci for which the divergence at such sites was ≤ 15%, after verifying that the transition/transversion ratio is essentially constant below this point). Each species-specific estimate of the transition/transversion ratio was then treated as a constant in the maximum-likelihood analyses. In genome-wide surveys,

