Copyright 2006 by the Genetics Society of America
DOI: 10.1534/genetics.105.053314

A Bayesian Heterogeneous Analysis of Variance Approach to Inferring Recent Selective Sweeps

John M. Marshall¹ and Robert E. Weiss

Department of Biomathematics, UCLA School of Medicine, Los Angeles, California 90095-1766 and Department of Biostatistics, UCLA School of Public Health, Los Angeles, California 90095-1772

Manuscript received November 9, 2005
Accepted for publication May 31, 2006

ABSTRACT

The distribution of microsatellite allele sizes in populations aids in understanding the genetic diversity of species and the evolutionary history of recent selective sweeps. We propose a heterogeneous Bayesian analysis of variance model for inferring loci involved in recent selective sweeps by analyzing the distribution of allele sizes at multiple loci in multiple populations. Our model is shown to be consistent with a multilocus test statistic, ln RV, proposed for identifying microsatellite loci involved in recent selective sweeps. Our methodology differs in that it accepts original allele size data rather than summary statistics and allows the incorporation of prior knowledge about allele frequencies using a hierarchical prior distribution consisting of log-normal and gamma probability distributions. Interesting features of the model are its ability to simultaneously analyze allele size data for any number of populations and to cope with the presence of any number of selected loci. The utility of the method is illustrated by application to two sets of microsatellite allele size data for a group of West African Anopheles gambiae populations. The results are consistent with the suppressed-recombination model of speciation, and additional candidate loci on chromosomes 2 (079 and 175) and 3 (088) are discovered that escaped former analysis.

Understanding which regions of the genome have been acted on by selection facilitates our understanding of the genetic basis of species-specific differences and allows us to
identify genomic regions of functional and medical importance. Over the last few decades, various approaches for identifying genes as targets of selection have been proposed. Some of these approaches require prior knowledge of the location and function of candidate genes, while other methods, such as QTL mapping, require prior knowledge of the phenotypic trait of adaptive relevance and its pattern of heredity (Lange 1997). Through the availability of completely sequenced genomes and the advent of genomewide scanning, it has become unnecessary to have prior knowledge of a genomic region to infer whether or not it has been the target of selection (Luikart 2003).

A number of tests of neutrality have been proposed that are based purely on allelic distributions and levels of variability (Nielsen 2001). These are based on variability at a single locus (Ewens 1972; Tajima 1989), allelic variability at multiple loci (Lewontin and Krakauer 1973; Hudson et al. 1987; Schlötterer 2001), and comparisons of variability or divergence between different classes of mutations within a locus (McDonald and Kreitman 1991; Goldman and Yang 1994).

¹ Corresponding author: Department of Biomathematics, UCLA School of Medicine, Box 951766, Los Angeles, CA 90095-1766. E-mail: johnmm@ucla.edu

Genetics 173: 2357–2370 (August 2006)

Tests of neutrality based on a single locus, such as Tajima's D (Tajima 1989), run into difficulties because it is difficult to distinguish between a reduction of variance in allele size due to selection and a reduction due to a population bottleneck (Simonsen et al. 1995). Such tests run the risk of becoming tests of the equilibrium neutral population model rather than tests of selective neutrality. Tests of neutrality based on multiple loci, such as the HKA test (Hudson et al. 1987) and the ln RV test (Schlötterer 2001), avoid these concerns. This is because, while neutral loci are similarly affected by demography and evolutionary history, the distribution of alleles in selected loci is affected differently from neutral
loci and hence displays outlier patterns.

Hunting for selected loci can be done using a variety of natural genetic markers. Two common families of markers used for detecting selective sweeps are microsatellites and SNPs. Most research to date has been conducted using microsatellites, which, while less prolific than SNPs, have the benefit of being multiallelic markers and hence are highly informative (Schlötterer and Wiehe 1999).

Microsatellites are tandem repeats of short DNA segments that are typically between 1 and 5 bp in length, and their alleles are defined by the number of DNA segment repeats that are present at a particular locus. The number of tandem repeats in a microsatellite allele at a specific locus is highly variable due to a number of factors, but primarily due to slippage during DNA replication (Slatkin 1995). Slippage rates vary from locus to locus, and hence locus-specific mutation rates determine the characteristic variance in allele size at a given microsatellite locus in a given population (Schlötterer et al. 1997).

Another process affecting the number of tandem repeats at a given locus is the hitchhiking of a microsatellite allele to a selected gene (Maynard Smith and Haigh 1974). Even though microsatellites are unlikely to be the target of selection themselves, a microsatellite locus closely linked to a beneficial mutation will be selected for along with the beneficial mutation, decreasing the variance in allele size at the microsatellite locus adjacent to the site of the selected gene (Wiehe 1998). Thus, looking for loci in populations with less variance in allele size than expected can be used as a method for identifying chromosomal regions that have been the target of selection. If all loci in a given population show less allele size variance than expected, this implies that a population bottleneck could have occurred.

One method that has recently been proposed for identifying chromosomal regions that have been acted on by selection is
the ln RV statistic (Schlötterer 2001). The ln RV statistic is equal to the natural logarithm of the ratio of observed variances in repeat number at an individual microsatellite locus in two populations. Denoting the locus by $j$ and the populations by $i_1$ and $i_2$, the ln RV statistic may be represented mathematically as

$$\ln RV_{i_1 i_2 j} = \log\left(\frac{s^2_{i_1 j}}{s^2_{i_2 j}}\right). \tag{1}$$

Assuming the stepwise mutation model (Ohta and Kimura 1973), neutrality, and mutation-drift equilibrium, then from standard population genetics the variance in
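
As a rough illustration of the ln RV definition given above, the following Python sketch computes the statistic for one locus from raw repeat counts in two populations. The function name `ln_rv` and the sample data are hypothetical, and the use of the unbiased sample variance (divisor n - 1) is an assumption; the text specifies only "observed variances in repeat number".

```python
import math
from statistics import variance


def ln_rv(repeats_pop1, repeats_pop2):
    """Return ln RV for one locus: the log ratio of repeat-number variances.

    Each argument is a sequence of allele sizes (repeat counts) sampled at the
    same microsatellite locus in one population. The sample variance with
    divisor n - 1 is an assumption of this sketch, not taken from the source.
    """
    var1 = variance(repeats_pop1)
    var2 = variance(repeats_pop2)
    if var1 <= 0 or var2 <= 0:
        # The log ratio is undefined when either population is monomorphic.
        raise ValueError("both populations need nonzero variance in repeat number")
    return math.log(var1 / var2)


# Hypothetical repeat counts at a single locus (illustrative data only).
population_1 = [10, 11, 12, 13, 14]  # sample variance 2.5
population_2 = [11, 12, 12, 12, 13]  # sample variance 0.5

statistic = ln_rv(population_1, population_2)
print(f"ln RV = {statistic:.3f}")
```

Swapping the two populations flips the sign of the statistic, so a strongly negative value points to reduced variability in the first population and a strongly positive value to reduced variability in the second.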
