CSE280b Population Genetics
Vineet Bafna, Pavel Pevzner
March 2006
www.cse.ucsd.edu/classes/sp05/cse291

Simulating population data
Generate a coalescent: topology and branch lengths.
For each branch, drop mutations as a Poisson process with rate $\theta/2$ per unit of branch length (with $\theta = 4N\mu$ and time in units of $2N$ generations).
Generate sequence data.
Note that the resulting sequences form a perfect phylogeny.
Given such sequence data, can you reconstruct the coalescent tree? Only the topology, not the branch lengths.
Also note that all pairs of positions are correlated (they should show high LD).

Coalescent with Recombination
An individual may have one parent or two parents.

ARG: Coalescent with recombination
Given a mutation rate $\theta$, a recombination rate $\rho$, a population size of $2N$ (diploid), and a sample size $n$:
How can you generate the ARG (topology and branch lengths) efficiently?
How will you generate sequences for the $n$ individuals?
Given sequence data, can you reconstruct the ARG topology?

Recombination
Define $r$ as the probability of recombining per individual per generation, and assume $k$ individuals in a generation. Going back one generation, the following might happen:
1. An individual arises from a recombination event between two individuals; it will have two parents.
2. Two individuals coalesce.
3. Neither: each individual has a distinct parent.
4. Multiple events (low probability).

Recombination (continued)
We ignore the case of more than one event in a generation. Then
$\Pr[\text{no recombination}] \approx 1 - kr$ and $\Pr[\text{no coalescence}] \approx 1 - \binom{k}{2}\frac{1}{2N}$.
Consider scaled time in units of $2N$ generations. Thus the number of individuals increases with rate $2Nkr$ and decreases with rate $\binom{k}{2}$. The value $2rN$ is usually small; moreover, the coalescence rate $\binom{k}{2}$ grows quadratically in $k$ while the recombination rate $2Nkr$ grows only linearly, so coalescence dominates once $k - 1 > 4rN$. Therefore the process will ultimately coalesce to a single individual (the MRCA).

ARG
Let $k = n$ and define $\rho = 4rN$. Iterate until $k = 1$:
Choose a time from an exponential distribution with rate $\frac{k\rho}{2} + \binom{k}{2} = \frac{k(k - 1 + \rho)}{2}$.
Pick the event as a recombination with probability $\frac{\rho}{k - 1 + \rho}$.
If the event is a recombination, choose an individual to recombine and a position ($k \leftarrow k + 1$); else choose a pair to coalesce ($k \leftarrow k - 1$).
Update $k$ and continue.
What is the flaw in this procedure?

Ancestral Recombination Graph
[Figure: an ancestral recombination graph]

Simulating sequences on the ARG
Generate the topology and branch lengths as before. For each recombination, generate a position.
Next, generate mutations at random on the branches, in proportion to branch length; for each mutation, select a position as well.
Generate sequence data.
The program ms (Hudson) is a commonly used coalescent simulator.

Coalescent theory: applications
Coalescent simulations allow us to test various hypotheses.
The coalescent (or ARG) itself is usually not inferred, unlike the tree in phylogenetics.

Coalescent theory: example
Example: 1,400 bp at the Sod locus in Drosophila, 10 taxa. Five were identical; the other five had 55 mutations.
Q: Is this a chance event, or is there selection for this haplotype?

Coalescent application
10,000 coalescent simulations were performed on 10 taxa, with 55 mutations on the coalescent branches.
Count the number of times 5 lineages are identical. The event happened in 1.1% of the cases.
Conclusion: selection, or some other mechanism, explains this data.

Coalescent example: Out of Africa hypothesis
Looking at lineage-specific mutations might help discard the candelabra model. How?
How do we decide between the multi-regional and the Out-of-Africa model?
How do we decide if the ancestor was African?

Human samples
We look at data from human samples (Gabriel et al., Science 2002). Three populations were sampled at multiple regions spanning the genome:
54 regions, average size 250 kb, SNP density 1 per 2 kb.
90 individuals from Nigeria (Yoruban), 93 Europeans, 42 Asians, 50 African Americans.

Population-specific recombination (Gabriel et al., Science 2002)
$D'$ was used as the LD measure between SNP pairs. Each SNP pair was classified as one of the following: strong LD; strong evidence for recombination; others (13% of cases).
This roughly favors Out of Africa. A coalescent simulation can help give confidence values on this.

Haplotype Blocks
A haplotype block is a region of low recombination. Define a region as a block if less than 5% of the pairs show strong evidence for recombination.
Much of the genome is in blocks. The distribution of block sizes varies across populations.

Testing Out of Africa
Generate simulations with and without migration, and check the size of haplotype blocks. Does it vary when migrations are allowed? When the new population has a bottleneck?
If there was a bottleneck that created the European and Asian populations, can we say anything about the frequency of alleles that are African-specific? Should they be high-frequency or low-frequency in African populations?

Haplotype block implications
The genome is mostly partitioned into haplotype blocks. Within a block there is extensive LD. Is this good or bad for association mapping?

Coalescent reconstruction
Reconstructing likely coalescents.

Reconstructing history in the absence of recombination

An algorithm for constructing a perfect phylogeny
We will consider the case where 0 is the ancestral state and 1 is the mutated state (this restriction will be lifted later).
In any tree, each node except the root has a single parent, so it is sufficient to construct a parent for every node.
In each step we add a column and refine some of the nodes containing multiple children. Stop when all columns have been considered.

Inclusion property
For a column $i$, let $i_1$ be the set of rows with a 1 in column $i$. For any pair of columns $i, j$: $i \le j$ if and only if $i_1 \supseteq j_1$.
Note that if $i \le j$, then the edge containing $i$ is an ancestor of the edge containing $j$.

Example
Rows A-E, columns 1-5:
       1  2  3  4  5
   A   1  1  0  0  0
   B   0  0  1  0  0
   C   1  1  0  1  0
   D   0  0  1  0  1
   E   1  0  0  0  0
Initially there is a single clade r, and each node (A-E) has r as its parent.

Sort columns
Sort the columns according to the inclusion property (note that the columns are already sorted here). This can be achieved by treating each column as the binary representation of a number, with the most significant bit in row 1 (A), and sorting in decreasing order. For example, column 1 reads 10101 (21) and column 2 reads 10100 (20), so column 1 comes first.

Add first column
Columns as rows (transposed view of the example):
       A  B  C  D  E
   1   1  0  1  0  1
   2   1  0  1  0  0
   3   0  1  0  1  0
   4   0  0  1  0  0
   5   0  0  0  1  0
In adding column i, check each edge and decide which side of it you belong on. Finally, add a node if you can resolve a clade.
[Figure: tree after adding column 1, with root r and a new node u; leaves A, C, E, B, D]

Adding other columns
Add the other columns on edges, using the ordering property.
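The "Simulating population data" recipe and the Sod-locus test described earlier can be sketched together in Python. This is a minimal sketch under stated assumptions: time is in units of 2N generations, mutations follow the infinite-sites model, and, as an assumption about how the 55 mutations were handled, each simulation places exactly 55 mutations on branches chosen with probability proportional to branch length. The function names (`simulate_coalescent`, `simulate_haplotypes`, `fraction_identical`) are hypothetical and are not taken from ms or any other tool. Each simulated column is the set of leaves below one branch, so the output is always a perfect phylogeny, as the slide notes.

```python
import random
from collections import Counter

def simulate_coalescent(n, rng):
    """Random coalescent tree on leaves 0..n-1, time in units of 2N generations.

    With k lineages the waiting time to the next coalescence is exponential
    with rate k(k-1)/2; a uniformly random pair then merges into a new node.
    Returns (parent, branch_len), both keyed by child node id.
    """
    lineages = list(range(n))
    node_time = {leaf: 0.0 for leaf in lineages}
    parent = {}
    next_id, t = n, 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)
        for child in rng.sample(lineages, 2):
            parent[child] = next_id
            lineages.remove(child)
        node_time[next_id] = t
        lineages.append(next_id)
        next_id += 1
    branch_len = {c: node_time[p] - node_time[c] for c, p in parent.items()}
    return parent, branch_len

def leaves_below(parent, n):
    """Map every node to the set of leaves beneath it."""
    below = {leaf: {leaf} for leaf in range(n)}
    # Node ids grow with time, so every child is processed before its parent.
    for child in sorted(parent):
        below.setdefault(parent[child], set()).update(below[child])
    return below

def simulate_haplotypes(n, num_mutations, rng):
    """Assumed infinite-sites placement: each mutation lands on a branch chosen
    with probability proportional to its length and becomes a new 0/1 column."""
    parent, branch_len = simulate_coalescent(n, rng)
    below = leaves_below(parent, n)
    branches = list(branch_len)
    hits = rng.choices(branches, weights=[branch_len[b] for b in branches], k=num_mutations)
    return [tuple(int(leaf in below[b]) for b in hits) for leaf in range(n)]

def fraction_identical(n=10, num_mutations=55, min_identical=5, trials=10_000, seed=0):
    """Fraction of simulations in which at least `min_identical` sequences are identical."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        haplotypes = simulate_haplotypes(n, num_mutations, rng)
        if max(Counter(haplotypes).values()) >= min_identical:
            count += 1
    return count / trials
```

With the defaults (10 taxa, 55 mutations, 10,000 trials), `fraction_identical()` estimates how often at least five sequences are identical by chance. Whether it lands near the 1.1% on the slide depends on the conditioning assumed here.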
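The ARG procedure described earlier (exponential waiting time with rate $k(k-1+\rho)/2$, recombination with probability $\rho/(k-1+\rho)$) can be sketched as follows. It mirrors the steps as written on the slide, with two labeled assumptions: breakpoint positions are drawn uniformly on a sequence scaled to [0, 1), and each recombination replaces one lineage with two new parent lineages (called `left` and `right` here). The `Event` record and `simulate_arg` name are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Event:
    kind: str                          # "recombination" or "coalescence"
    time: float                        # in units of 2N generations
    children: Tuple[int, ...]          # lineage(s) ending at this event
    parents: Tuple[int, ...]           # lineage(s) created by this event
    position: Optional[float] = None   # breakpoint, recombinations only

def simulate_arg(n, rho, rng):
    """Simulate ARG events for n sampled lineages, with rho = 4rN."""
    lineages = list(range(n))
    next_id, t, events = n, 0.0, []
    while len(lineages) > 1:
        k = len(lineages)
        # Recombination rate k*rho/2 plus coalescence rate k(k-1)/2.
        t += rng.expovariate(k * (k - 1 + rho) / 2)
        if rng.random() < rho / (k - 1 + rho):
            child = rng.choice(lineages)
            # Assumed: two new parents carrying material left/right of the breakpoint.
            left, right = next_id, next_id + 1
            next_id += 2
            lineages.remove(child)
            lineages.extend((left, right))
            events.append(Event("recombination", t, (child,), (left, right), rng.random()))
        else:
            a, b = rng.sample(lineages, 2)
            lineages.remove(a)
            lineages.remove(b)
            lineages.append(next_id)
            events.append(Event("coalescence", t, (a, b), (next_id,)))
            next_id += 1
    return events
```

Because every recombination adds a lineage and every coalescence removes one, a run always has exactly n - 1 more coalescences than recombinations; with `rho = 0` the procedure reduces to the plain coalescent.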
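The $D'$ measure and the block rule from the Gabriel et al. slides can be illustrated with a short sketch. It computes the point estimate of $D'$ from phased 0/1 haplotypes using the standard normalization $D / D_{max}$. It does not reproduce the confidence-bound classification Gabriel et al. used to label pairs, so the pair labels passed to the hypothetical `is_block` function (`"strong_ld"`, `"recombination"`, `"other"`) are assumed to come from elsewhere.

```python
def d_prime(haplotypes, i, j):
    """Point estimate of D' between SNPs i and j from phased 0/1 haplotypes."""
    n = len(haplotypes)
    p = sum(h[i] for h in haplotypes) / n            # frequency of allele 1 at SNP i
    q = sum(h[j] for h in haplotypes) / n            # frequency of allele 1 at SNP j
    p11 = sum(h[i] * h[j] for h in haplotypes) / n   # frequency of the 1-1 haplotype
    d = p11 - p * q
    if d >= 0:
        d_max = min(p * (1 - q), (1 - p) * q)
    else:
        d_max = min(p * q, (1 - p) * (1 - q))
    if d_max == 0:
        raise ValueError("D' is undefined when either SNP is monomorphic")
    return d / d_max

def is_block(pair_labels, threshold=0.05):
    """Slide's rule: a region is a block if fewer than 5% of its SNP pairs
    show strong evidence for recombination (labels are hypothetical strings)."""
    strong_recomb = sum(1 for label in pair_labels if label == "recombination")
    return strong_recomb < threshold * len(pair_labels)
```

With only three of the four possible two-SNP haplotypes present, $|D'| = 1$; a region whose pairs are mostly labeled strong LD passes the 5% rule.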
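The column-sorting trick and the inclusion property can be checked on the example matrix with a short sketch. Reading each column top to bottom (row A first) as a binary number gives 21, 20, 10, 4, 2, so the columns are already in decreasing order, as the slide says. The helper names are hypothetical; `is_perfect_phylogeny` uses the standard rooted criterion (0 ancestral) that the row sets of any two columns are either nested or disjoint.

```python
EXAMPLE = [  # rows A-E, columns 1-5, from the example slide
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
]

def column_value(matrix, j):
    """Column j read as a binary number, most significant bit in the first row."""
    return int("".join(str(row[j]) for row in matrix), 2)

def sort_columns(matrix):
    """Column indices in decreasing binary value; strict supersets come before subsets."""
    return sorted(range(len(matrix[0])), key=lambda j: column_value(matrix, j), reverse=True)

def is_perfect_phylogeny(matrix):
    """True if every pair of columns has nested or disjoint sets of 1-rows."""
    sets = [frozenset(i for i, row in enumerate(matrix) if row[j])
            for j in range(len(matrix[0]))]
    return all(a <= b or b <= a or not (a & b) for a in sets for b in sets)

print([column_value(EXAMPLE, j) for j in range(5)])  # [21, 20, 10, 4, 2]
print(sort_columns(EXAMPLE))                          # [0, 1, 2, 3, 4]
print(is_perfect_phylogeny(EXAMPLE))                  # True
```

The sort works because a strict superset of 1-rows always has a strictly larger binary value, so ancestral columns are placed before the columns nested inside them.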
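The column-by-column refinement described in the perfect-phylogeny slides can be sketched as follows, assuming the rooted case (0 ancestral) and a hypothetical naming scheme in which the node created for column j is called `u{j}` (the slide calls the first such node u). In a perfect phylogeny, processing columns in decreasing binary order means every earlier column that shares a row with the current one contains it, so all rows with a 1 share a single current parent; when they do not, no perfect phylogeny exists and the sketch raises an error.

```python
def build_perfect_phylogeny(matrix, labels):
    """Return {node: parent} for the rooted perfect phylogeny of a 0/1 matrix.

    Starts with every row's parent set to the root "r", then adds columns in
    decreasing binary order, inserting a new node between the rows that carry
    the mutation and their shared current parent.
    """
    # Comparing columns as lists (first row first) is the same as comparing
    # their binary values, so this sorts supersets before subsets.
    order = sorted(range(len(matrix[0])),
                   key=lambda j: [row[j] for row in matrix], reverse=True)
    parent = {label: "r" for label in labels}
    for j in order:
        leaves = [labels[i] for i, row in enumerate(matrix) if row[j]]
        if not leaves:
            continue  # an all-zero column adds no mutation
        current = {parent[leaf] for leaf in leaves}
        if len(current) != 1:
            raise ValueError(f"column {j + 1} conflicts with an earlier column")
        node = f"u{j + 1}"  # hypothetical name for the node added by column j+1
        parent[node] = current.pop()
        for leaf in leaves:
            parent[leaf] = node
    return parent

example = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 1, 0],
           [0, 0, 1, 0, 1], [1, 0, 0, 0, 0]]
tree = build_perfect_phylogeny(example, ["A", "B", "C", "D", "E"])
```

On the example, column 1 creates u1 above A, C, E (the node u in the figure); later columns refine u1 into u2 (A, C) and then u4 (C), while column 3 creates u3 above B, D and column 5 creates u5 above D.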