Using the variance of pairwise differences to estimate the recombination rate

Genet. Res., Camb. (1997), 69, pp. 45–48. With 1 text-figure. Copyright © 1997 Cambridge University Press

JOHN WAKELEY*
Department of Biological Sciences, Rutgers University, New Jersey, USA

(Received 12 August 1996 and in revised form 14 October 1996)

Summary

A new estimator is proposed for the parameter $C = 4Nc$, where $N$ is the population size and $c$ is the recombination rate in a finite population model without selection. The estimator is an improved version of Hudson's (1987) estimator, which takes advantage of some recent theoretical developments. The improvement is slight, but the smaller bias and standard error of the new estimator support its use. The variance of the average number of pairwise differences is also derived, and is important in the formulation of the new estimator.

1. Introduction

Under the neutral theory of molecular evolution, the average number of pairwise nucleotide differences among sequences in a random sample is independent of the recombination rate. If only non-identical pairs are considered, the expectation of this average is equal to $\theta = 4Nu$, where $N$ is the effective population size and $u$ is the neutral mutation rate in an infinite-sites model. The variance of pairwise differences, however, does depend on the recombination parameter, $C = 4Nc$, where $c$ is the recombination rate. Nearly a decade ago, Hudson (1987) made use of this fact and introduced an estimator of $C$ based on the sample distribution of pairwise differences. Since that time, Hudson's estimator has become the most frequently used of the available estimators. Some recent, related theoretical results now suggest improvements to Hudson's original work.

Here, a new estimator of $C$ is proposed which differs from Hudson's in that only non-identical pairs of sequences are considered and because an unbiased estimator of $\theta^2$ is employed in its calculation.
The statistical properties of the new estimator are investigated using computer simulations, and are compared with those of Hudson's estimator. The new estimator is less biased and has a smaller standard error.

* Correspondence to: J. Wakeley, Nelson Biological Labs, PO Box 1059, Busch Campus, Piscataway, NJ 08855-1059, USA. Fax: 1-908-445-5870. e-mail: jwakeley@rci.rutgers.edu.

2. The estimators

From a sample of $n$ sequences we can calculate two different averages of the numbers of pairwise differences, which differ according to how many pairwise comparisons are considered. If $k_{ij}$ is the number of differences between two sequences, $i$ and $j$, these are

\[ \pi = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} k_{ij} \tag{1} \]

and

\[ k_a = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} k_{ij}. \tag{2} \]

Thus, $\pi$ is computed using only non-identical pairs, whereas $k_a$ counts each of these twice and includes the $n$ zero values obtained when each sequence is compared with itself. In a population of constant size with neutral, infinite-sites mutation, the expectation of $\pi$ is $\theta$ (Watterson, 1975; Tajima, 1983). Since (2) can be rewritten as $\pi(n-1)/n$, the expectation of $k_a$ is equal to $\theta(n-1)/n$.

Corresponding to (1) and (2), two variances can also be calculated:

\[ S_\pi^2 = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (k_{ij} - \pi)^2, \tag{3} \]

\[ S_k^2 = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} (k_{ij} - k_a)^2. \tag{4} \]

When there is no recombination, the expectation of $S_\pi^2$ is given by

\[ E(S_\pi^2) = \left[ \frac{2(n-2)}{3(n-1)} \right] \theta + \left[ \frac{(7n+3)(n-2)}{9n(n-1)} \right] \theta^2 \tag{5} \]

(Wakeley, 1996). Since (4) can be rewritten as

\[ S_k^2 = \left( \frac{n-1}{n} \right) S_\pi^2 + \left( \frac{n-1}{n^2} \right) \pi^2, \tag{6} \]

the expectation of $S_k^2$ becomes

\[ E(S_k^2) = \left[ \frac{(n-1)(2n-1)}{3n^2} \right] \theta + \left[ \frac{(n-1)(7n^2+7n-6)}{9n^3} \right] \theta^2. \tag{7} \]

Equation (7) follows from the substitution into (6) of expression (5) and the expression for $E(\pi^2)$ employed by Tajima (1993) to develop an unbiased estimator of the variance of $\pi$ when there is no recombination.

Hudson (1987) derived the expectation of $S_k^2$ when there is recombination. His expression can be written

\[ E(S_k^2) = \left[ \frac{(n-1)(2n-1)}{3n^2} \right] \theta + g_k(C, n)\, \theta^2. \tag{8} \]

The expression for $g_k(C, n)$ is reproduced in the Appendix in a different format from that of Hudson (1987).
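The sample quantities in (1)-(4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sequences are taken to be aligned strings of equal length, and the function names `pairwise_differences` and `pairwise_stats` are hypothetical, not from the paper.

```python
from itertools import combinations


def pairwise_differences(seqs):
    """Return the symmetric n x n matrix of pairwise difference counts k_ij.

    Assumes (for illustration) aligned sequences of equal length.
    """
    n = len(seqs)
    k = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = sum(x != y for x, y in zip(seqs[i], seqs[j]))
        k[i][j] = k[j][i] = d
    return k


def pairwise_stats(k):
    """Compute pi (1), k_a (2), S^2_pi (3) and S^2_k (4) from the matrix k."""
    n = len(k)
    # The n(n-1)/2 non-identical pairs used by pi and S^2_pi.
    upper = [k[i][j] for i in range(n - 1) for j in range(i + 1, n)]
    # All n^2 ordered pairs, including the n self-comparisons (zeros).
    full = [k[i][j] for i in range(n) for j in range(n)]
    pi = sum(upper) / len(upper)
    k_a = sum(full) / n**2
    s2_pi = sum((x - pi) ** 2 for x in upper) / len(upper)
    s2_k = sum((x - k_a) ** 2 for x in full) / n**2
    return pi, k_a, s2_pi, s2_k


# Toy data, invented purely for illustration.
toy_seqs = ["AAAA", "AAAT", "AATT", "TTTT"]
pi, k_a, s2_pi, s2_k = pairwise_stats(pairwise_differences(toy_seqs))
```

On any input, the returned values should satisfy $k_a = \pi(n-1)/n$ and identity (6) up to floating-point rounding, which gives a quick check on an implementation.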
The limit of $g_k(C, n)$ as $C$ approaches zero is, then, equal to the term multiplying $\theta^2$ in (7). Hudson (1987) proposed the estimator, here called $\hat{C}_k$, that solves

\[ S_k^2 = \sum_j h_j - \sum_j h_j^2 + g_k(C, n) \left( \frac{n}{n-1} \sum_j h_j \right)^2, \tag{9} \]

where $h_j$ is the heterozygosity at site $j$ in a sample of DNA sequences. Thus, Hudson's estimator involves using $\sum_j h_j - \sum_j h_j^2$ to estimate the first term on the right-hand side of (7) and $[\sum_j h_j\, n/(n-1)]^2$ to estimate $\theta^2$, then solving for the value of $C$ that equates the expectation of $S_k^2$ most closely to its observed value.

The variance of $\pi$ with recombination can also be obtained. From (6),

\[ E(S_k^2) = \left( \frac{n-1}{n} \right) E(S_\pi^2) + \left( \frac{n-1}{n^2} \right) E(\pi^2), \tag{10} \]

and since $E(S_\pi^2) = \mathrm{Var}(k_{ij}) - \mathrm{Var}(\pi)$ (Wakeley, 1996),

\[ \mathrm{Var}(\pi) = \left( \frac{n}{n-1} \right) \mathrm{Var}(k_{ij}) - \left( \frac{n}{n-1} \right)^2 E(S_k^2) + \left( \frac{1}{n-1} \right) E(\pi)^2. \tag{11} \]

Hudson (1983) derived an expression for $\mathrm{Var}(k_{ij})$, given explicitly by Hudson (1990); Hudson (1987) developed $E(S_k^2)$, reproduced here as (8); and Watterson (1975) gave the familiar result that $E(\pi) = \theta$. Then, $\mathrm{Var}(\pi)$ with recombination becomes

\[ \mathrm{Var}(\pi) = \left[ \frac{n+1}{3(n-1)} \right] \theta + f(C, n)\, \theta^2, \tag{12} \]

where $f(C, n)$ is given in the Appendix. As $C$ decreases to zero, (12) approaches Tajima's (1983) result. Expression (12) was also recently derived by Pluzhnikov & Donnelly (1996), but for other purposes. It follows, after some simplification, that

\[ E(S_\pi^2) = \left[ \frac{2(n-2)}{3(n-1)} \right] \theta + g_\pi(C, n)\, \theta^2, \tag{13} \]

where $g_\pi(C, n)$ is given in the Appendix, is the expectation of (3) when there is recombination. Accordingly, as $C$ approaches zero, (13) approaches (5).

Tajima (1993) noted that $\pi^2$ is a biased estimator of $\theta^2$. Of course, this is true also of Hudson's (1987) estimator, $[\sum_j h_j\, n/(n-1)]^2$, since expression (1) is identical to $\sum_j h_j\, n/(n-1)$. Expression (12) can be used to obtain an unbiased estimator of $\theta^2$:

\[ \widehat{\theta^2} = \frac{\pi^2 - [(n+1)/3(n-1)]\, \pi}{f(C, n) + 1}. \tag{14} \]

Thus, the new estimator of $C$ proposed here solves

\[ S_\pi^2 = \left[ \frac{2(n-2)}{3(n-1)} \right] \pi + g_\pi(C, n) \left[ \frac{\pi^2 - [(n+1)/3(n-1)]\, \pi}{f(C, n) + 1} \right], \tag{15} \]

where $\pi$ and $S_\pi^2$ are observed values, calculated from a sample of DNA sequences using (1) and (3).
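Solving (15) for $C$ is a one-dimensional root-finding problem, which might be sketched as follows. The functions $f(C, n)$ and $g_\pi(C, n)$ are given only in the Appendix and are not reproduced in this text, so the sketch takes them as caller-supplied callables. The name `solve_for_C`, the bisection method, the search bound `c_max` and the two toy functions are all assumptions made for illustration; in particular, `toy_f` and `toy_g` are NOT the Appendix expressions.

```python
def solve_for_C(pi, s2_pi, n, f, g_pi, c_max=1e4, tol=1e-9):
    """Bisection sketch for equation (15).

    f and g_pi are callables (C, n) -> float standing in for the Appendix
    functions f(C, n) and g_pi(C, n). Returns None when the residual does
    not change sign on [0, c_max], i.e. no root is bracketed.
    """
    a = 2 * (n - 2) / (3 * (n - 1))  # coefficient of pi in (15)
    b = (n + 1) / (3 * (n - 1))      # coefficient of pi in (14)

    def residual(C):
        theta_sq_hat = (pi * pi - b * pi) / (f(C, n) + 1.0)  # equation (14)
        return a * pi + g_pi(C, n) * theta_sq_hat - s2_pi

    lo, hi = 0.0, c_max
    r_lo, r_hi = residual(lo), residual(hi)
    if r_lo == 0:
        return lo
    if r_hi == 0:
        return hi
    if r_lo * r_hi > 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        r_mid = residual(mid)
        if r_mid == 0:
            return mid
        if r_lo * r_mid < 0:
            hi = mid
        else:
            lo, r_lo = mid, r_mid
    return 0.5 * (lo + hi)


# Hypothetical placeholders that merely decrease with C. They are NOT the
# Appendix expressions and exist only so that the sketch can be run.
def toy_f(C, n):
    return 0.3 / (1.0 + C)


def toy_g(C, n):
    return 0.7 / (1.0 + C)


# Invented observed values, for illustration only.
c_hat = solve_for_C(pi=4.0, s2_pi=4.0, n=10, f=toy_f, g_pi=toy_g)
```

Bisection is used here only because it needs nothing beyond a sign change; any bracketing root finder would serve. Returning `None` when no root is bracketed leaves the handling of such samples to the caller.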
The estimator defined by (15), called $\hat{C}_\pi$, differs from Hudson's (1987) estimator, $\hat{C}_k$, in two main respects: only the $n(n-1)/2$ unique pairwise comparisons among the $n$ sequences are made, and an unbiased estimate of $\theta^2$ is employed.

3. Performance in simulations

Monte Carlo simulations, using the method of Hudson (1983), were done to assess the statistical properties of $\hat{C}_\pi$, and to compare its performance with that of $\hat{C}_k$. Figure 1 compares estimates of the distributions of $\hat{C}_\pi / C$ and $\hat{C}_k / C$, where $C$ is the true value of the recombination parameter, for the same

