Berkeley STAT 157 - From neutral alleles to diversity statistics

Chapter 11
From neutral alleles to diversity statistics

The two topics in the lecture title are rather different, but we will see they have a particular “sideways” connection. Let me jump into the first topic.

11.1 What maintains genetic diversity? The neutral model

We have an everyday notion of species (rabbits and robins and roses) because we can recognize different individuals as similar. After learning about genetics and evolution by natural selection, a rather subtle question arises: why is there any genetic difference at all between different individuals in the same species? That is, if “evolution by natural selection” worked according to the simple “selective sweeps by more fit alleles” story in Lecture 10, then why hasn’t it already happened, so that all the less fit alleles have been replaced by the most fit allele, leading to individuals genetically identical except for sex-related traits?

Many different answers have been proposed, and undoubtedly many are valid in different contexts. A textbook answer is heterozygote advantage (W), illustrated by sickle cell anemia in humans. (Given that most phenotype variations presumably arise from complicated interactions between genes, there is much scope for this kind of effect.) Another answer is frequency dependent selection (W), where it is advantageous to be different from others, in contexts of predation or competition. Another answer is that we may just be seeing a selective sweep in progress, though the toy model prediction (10.4) for sweep duration suggests this is unlikely.

We will consider the neutral theory [1], which asserts that much of the variation we see in a species at a particular time is “non-selective”; different alleles have arisen by chance mutations some time in the past but have almost zero difference in fitness, implying that the frequencies of alleles in successive generations change only in some “random” way rather than being pushed in one direction by selection.

Rather obviously this appeals to mathematical probabilists, so let me show some predictions within this theory.

11.2 The Wright-Fisher model

    Consider a gene with several alleles A, B, C, . . . . In diploid populations consisting of N individuals in each generation there are 2N copies of each gene. An individual can have two copies of the same allele or two different alleles. Assume generations do not overlap. For example, annual plants have exactly one generation per year. In the model, each copy of the gene found in the new generation is drawn independently at random from all copies of the gene in the old generation. (Edited from genetic drift (W).)

This is “the Wright-Fisher model without mutation or selection”. The model looks strange as biology, but turns out to be mathematically tractable, and behaves similarly to more plausible models in which parents have offspring independently while some external mechanism keeps the population size roughly stable.

We now introduce mutation by supposing that, each time a gene is copied, there is a small chance p of a mutation, and that each such mutation produces a brand new allele. The process of “numbers of alleles of different types” is abstractly a certain complicated finite-state Markov chain, and from the theory of Markov chains [2] there must be a stationary distribution for the proportions $X_1 \ge X_2 \ge X_3 \ge \dots$ of different alleles, listed in decreasing order for definiteness.
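The model with mutation is easy to simulate directly, which may help make the description concrete. The following is a minimal sketch in Python, not part of the original notes: the function names (`wright_fisher_step`, `simulate`, `allele_proportions`), the integer labels used for alleles, the choice to start from a single allele, and the parameter values are all assumptions made for illustration.

```python
import random
from collections import Counter


def wright_fisher_step(genes, p, next_label, rng):
    """Produce the next generation of gene copies.

    Each new copy picks its parent uniformly at random from the old
    generation; with probability p it mutates to a brand-new allele.
    Alleles are represented by integer labels (an illustrative choice).
    """
    new_genes = []
    for _ in range(len(genes)):
        allele = rng.choice(genes)
        if rng.random() < p:
            allele = next_label      # a never-before-seen allele
            next_label += 1
        new_genes.append(allele)
    return new_genes, next_label


def simulate(N, p, generations, seed=0):
    """Run the model for a number of generations with 2N gene copies."""
    rng = random.Random(seed)
    genes = [0] * (2 * N)            # assumed start: everyone carries allele 0
    next_label = 1
    for _ in range(generations):
        genes, next_label = wright_fisher_step(genes, p, next_label, rng)
    return genes


def allele_proportions(genes):
    """Proportions of the alleles present, listed in decreasing order."""
    counts = Counter(genes)
    total = len(genes)
    return sorted((c / total for c in counts.values()), reverse=True)


genes = simulate(N=50, p=0.02, generations=500)
X = allele_proportions(genes)
print(X[:5])   # the largest few allele proportions in the final generation
```

A single run gives one sample of the proportions after 500 generations; getting at the stationary distribution would require a long run or many independent runs, and much larger N for realistic parameter values.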
[1] neutral theory of molecular evolution (W)
[2] e.g. Pinsky-Karlin, An Introduction to Stochastic Modeling, Chap. 4.

The remarkable Ewens’s sampling formula (W) gives the exact distribution of the $(X_i)$, but instead let me derive a simpler statistical measure of diversity. What we do (and as the second topic of this lecture, section 11.7, we explain the idea in detail with data examples) is first consider

$$S := \sum_i X_i^2$$

and note that $ES$ is the chance that two randomly-picked genes are the same allele; then we can view

$$n_{\mathrm{eff}} := 1/ES$$

as “effective number of different co-existing alleles of the gene” in the population.

We shall derive the formula

$$n_{\mathrm{eff}} \approx 1 + 4Np. \qquad (11.1)$$

The approximation holds in the (realistic) case where N is large and p is small, and we think of 4Np as a number (maybe 0.2, maybe 10) that is neither very large nor very small.

Discussion of formula (11.1). This is one of my favorite formulas to discuss. It confirms and quantifies the idea that pure randomness (mutations without selective advantage) can maintain a fixed level of diversity as time goes by; so the neutral theory is at least a possible explanation of diversity. But how realistic is the model?

What you don’t see in the model description or the concluding formula, but is buried in the algebra derivation, is the requirement that the model must have been realistic over the last (order) N generations (a related result is given later as formula (11.3)). Thinking of total species population N as in the millions, this is hardly plausible, since the time involved would become larger than species lifetime. The model implicitly ignores geographic location of individuals (any pair can breed) and it is often argued that what is relevant is a much smaller “effective population size” of interbreeding subpopulations.
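Both formula (11.1) and the “order N generations” requirement can be checked numerically. The sketch below is not taken from the notes. It uses a standard one-generation recursion for the chance F that two distinct genes are the same allele: they share a parent with chance 1/(2N), otherwise their parents match with chance F, and in either case neither copy may mutate. Following two distinct genes is an assumption; ES in the text allows the same gene to be picked twice, which differs by a term of order 1/(2N). The function names, the tolerance and the values of N and p are illustrative.

```python
def homozygosity_step(F, N, p):
    """One generation of the recursion for F, the chance that two
    distinct genes are the same allele."""
    same_parent = 1.0 / (2 * N)
    no_mutation = (1.0 - p) ** 2          # neither of the two copies mutates
    return no_mutation * (same_parent + (1.0 - same_parent) * F)


def stationary_n_eff(N, p):
    """1 / F*, where F* is the fixed point of homozygosity_step.

    Solving F* = (1-p)^2 * (1/(2N) + (1 - 1/(2N)) * F*) gives
    1 / F* = 1 + 2N * ((1-p)^(-2) - 1), which is close to 1 + 4Np
    when p is small.
    """
    return 1.0 + 2 * N * ((1.0 - p) ** -2 - 1.0)


def generations_to_settle(N, p, tol=0.01):
    """Generations needed, starting from a single-allele population
    (F = 1), for F to come within relative tolerance tol of F*."""
    F_star = 1.0 / stationary_n_eff(N, p)
    F, t = 1.0, 0
    while abs(F - F_star) > tol * F_star:
        F = homozygosity_step(F, N, p)
        t += 1
    return t


N, p = 1000, 0.001                        # illustrative values with 4Np = 4
n_eff = stationary_n_eff(N, p)
t = generations_to_settle(N, p)
print(n_eff, 1 + 4 * N * p, t)
```

With these values the stationary 1/F agrees with 1 + 4Np to within a fraction of a percent, and the number of generations needed to settle is of the same order as N, in line with the remark above that the model must have held over roughly the last N generations.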
Another point, emphasized in different contexts by Richard Dawkins (in The Selfish Gene), is that one should think of genes as existing separately from species (mice and men have much of the genome in common) so that species lifetime is less of an issue.

Mathematical derivation of formula (11.1). The key feature that makes the model mathematically tractable is that we can easily study genealogy. Ignoring mutations for the moment, a gene in the present generation is a copy of a gene in the previous generation, which is a copy of a gene in the previous generation, and so on: there is a “line of descent”. Now consider two randomly-picked genes in the present generation, and trace back the two lines of descent until they meet, some random number G of generations back, at their “most recent common ancestor”. From the definition of the model, at each stage there is chance 1/(2N) that the lines merge, and so G has the Geometric(1/(2N)) distribution. Introducing mutations (without selection) doesn’t change the

