UCSD CSE 280B - Population Sub-structure

CSE 280B: Population Genetics
Vineet Bafna, Pavel Pevzner
March 2006
www.cse.ucsd.edu/classes/sp06/cse280b

Population Sub-structure

Population sub-structure can increase LD
- Consider two populations that were isolated and evolving independently.
- They might have different allele frequencies in some regions.
- Pick two regions that are far apart, so that LD within each population is very low (close to 0).

  Figure (haplotypes as extracted, one column per individual):
    Pop A   position 1:  0 0 0 1 0 0 0 0 0
            position 2:  1 1 0 1 1 1 1 1 1
    Pop B   position 1:  1 1 0 1 1 1 1 1 1
            position 2:  0 0 0 1 0 0 0 0 0

  Pop A: $p_1 = 0.1$, $q_1 = 0.9$, $P_{11} = 0.1$, $D = 0.01$
  Pop B: $p_1 = 0.9$, $q_1 = 0.1$, $P_{11} = 0.1$, $D = 0.01$

  Here $p_1$ and $q_1$ are the frequencies of allele 1 at the first and second position, $P_{11}$ is the frequency of the haplotype 1 1, and $D = P_{11} - p_1 q_1$.

Recent admixing of populations
- If the populations came together recently (e.g., African and European populations), artificial LD might be created.
- For the pooled sample (Pop A + B): $p_1 = 0.5$, $q_1 = 0.5$, $P_{11} = 0.1$, so $D = 0.1 - 0.25 = -0.15$.
- $|D| = 0.15$ instead of $0.01$: an increase of more than 10-fold.
- This spurious LD might lead to false associations.
- Other genetic events can cause LD to arise, and one needs to be careful.

Determining population sub-structure
- Given a mix of people, can you sub-divide them into ethnic populations?
- Turn the problem of spurious LD into a clue.
- Find markers that are too far apart to show LD.
- If they do show LD (correlation), that shows the existence of multiple populations.
- Sub-divide the individuals into populations so that LD disappears.

Determining population sub-structure (continued)
- Same example as before (same haplotype figure as above).
- The two markers are too far apart to show any LD, yet they do show LD.
- However, if you split the sample so that all 0 1 haplotypes are in one population and all 1 0 haplotypes are in another, LD disappears.

Iterative algorithm for population substructure
Define:
- $N$ = number of individuals (each has a single chromosome)
- $K$ = number of sub-populations
- $Z \in \{1, \ldots, K\}^N$ is a vector giving the subpopulation: $Z_i = k$ means individual $i$ is assigned to population $k$
- $X_{i,j}$ = allelic value for individual $i$ at position $j$
- $P_{k,j,l}$ = frequency of allele $l$ at position $j$ in population $k$

Example
Consider the following assignment:

    Z:           1 1 1 1 1 1 1 1 1 1
    position 1:  0 0 0 1 0 0 0 0 0 0
    position 2:  1 1 0 1 1 1 1 1 1 1

    Z:           2 2 2 2 2 2 2 2 2 2
    position 1:  1 1 0 1 1 1 1 1 1 1
    position 2:  0 0 0 1 0 0 0 0 0 0

For this assignment, $P_{1,1,0} = 0.9$ and $P_{2,1,0} = 0.1$.

Goal
- $X$ is known; $P$ and $Z$ are unknown.
- The goal is to estimate $\Pr(P, Z \mid X)$.
- Various learning techniques can be employed:
  - $\max_{P,Z} \Pr(X \mid P, Z)$ (maximum likelihood estimate)
  - $\max_{P,Z} \Pr(X \mid P, Z)\,\Pr(P, Z)$ (MAP)
  - Sample $P, Z$ from $\Pr(P, Z \mid X)$
- Here a Bayesian (MCMC) scheme is employed to sample from $\Pr(P, Z \mid X)$. We will only consider a simplified version.

Algorithm structure
- Iteratively estimate $(Z^{(0)}, P^{(0)}), (Z^{(1)}, P^{(1)}), \ldots, (Z^{(m)}, P^{(m)})$.
- After convergence, $Z^{(m)}$ is the answer.
- Iteration:
  - Guess $Z^{(0)}$.
  - For $m = 1, 2, \ldots$:
    - Sample $P^{(m)}$ from $\Pr(P \mid X, Z^{(m-1)})$.
    - Sample $Z^{(m)}$ from $\Pr(Z \mid X, P^{(m)})$.
- How is this sampling done?

Example: sampling P
- Choose $Z$ at random, so that each individual is assigned to one of 2 populations:

    Z:           1 2 2 1 1 2 1 2 1 2
    position 1:  0 0 0 1 0 0 0 0 0 0
    position 2:  1 1 0 1 1 1 1 1 1 1

    Z:           1 2 2 1 1 2 1 2 2 1
    position 1:  1 1 0 1 1 1 1 1 1 1
    position 2:  0 0 0 1 0 0 0 0 0 0

- Now we need to sample $P^{(1)}$ from $\Pr(P \mid X, Z^{(0)})$. Simply count:
  - $N_{k,j,l}$ = number of people in population $k$ who have allele $l$ at position $j$
  - $N_{k,j,\cdot} = \sum_l N_{k,j,l}$ = number of people in population $k$ observed at position $j$
  - $p_{k,j,l} = N_{k,j,l} / N_{k,j,\cdot}$
- For the assignment above: $N_{1,1,0} = 4$, $N_{1,1,1} = 6$, so $p_{1,1,0} = 4/10$; likewise $p_{1,2,0} = 4/10$.
- Thus we can sample $P^{(m)}$.

Sampling Z
- $\Pr[Z_1 = 1] = \Pr[\text{"01" belongs to population 1}]$.
- Assuming HWE and LE, each position is in linkage equilibrium and independent of the others, so the haplotype probability is a product over positions:
  - $\Pr[\text{"01"} \mid \text{population 1}] = p_{1,1,0} \cdot p_{1,2,1} = (4/10)(6/10) = 0.24$
  - $\Pr[\text{"01"} \mid \text{population 2}] = p_{2,1,0} \cdot p_{2,2,1} = (6/10)(4/10) = 0.24$
  - $\Pr[Z_1 = 1] = 0.24 / (0.24 + 0.24) = 0.5$

Sampling (continued)
- Suppose that during the iteration there is a bias, as in the following assignment:

    Z:           1 1 1 2 1 2 1 2 1 1
    position 1:  0 0 0 1 0 0 0 0 0 0
    position 2:  1 1 0 1 1 1 1 1 1 1

    Z:           2 2 2 1 2 2 1 2 2 1
    position 1:  1 1 0 1 1 1 1 1 1 1
    position 2:  0 0 0 1 0 0 0 0 0 0

- Then, in the next step of sampling $Z$, we will do the right thing:
  - $\Pr[\text{"01"} \mid \text{pop. 1}] = p_{1,1,0} \cdot p_{1,2,1} = 0.7 \cdot 0.7 = 0.49$
  - $\Pr[\text{"01"} \mid \text{pop. 2}] = p_{2,1,0} \cdot p_{2,2,1} = 0.3 \cdot 0.3 = 0.09$
  - $\Pr[Z_1 = 1] = 0.49 / (0.49 + 0.09) \approx 0.845$
  - $\Pr[Z_6 = 1] = 0.49 / (0.49 + 0.09) \approx 0.845$
- Eventually all "01" individuals will end up in one population and all "10" individuals in a second population.

Allowing for admixture
- Define $q_{i,k}$ as the fraction of individual $i$ that originated from population $k$.
- Iteration:
  - Guess $Z^{(0)}$.
  - For $m = 1, 2, \ldots$:
    - Sample $P^{(m)}, Q^{(m)}$ from $\Pr(P, Q \mid X, Z^{(m-1)})$.
    - Sample $Z^{(m)}$ from $\Pr(Z \mid X, P^{(m)}, Q^{(m)})$.

Estimating Z (admixture case)
- Instead of estimating $\Pr(Z(i) = k \mid X, \ldots$ …
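The simplified (no-admixture) iteration described in the slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the lecture's or any published tool's implementation: the function names (`ld_coefficient`, `estimate_frequencies`, `assignment_probs`, `run_sampler`), the zero-based population labels, the uniform prior over populations, and the `pseudocount` parameter (added so that no allele frequency is exactly zero) are all hypothetical choices. As in the slides' simplified version, the frequencies are obtained by plain counting rather than drawn from a posterior distribution.

```python
import random

# Haplotypes from the 20-individual slide example, as (position 1, position 2).
POP_A = [(0, 1), (0, 1), (0, 0), (1, 1)] + [(0, 1)] * 6
POP_B = [(1, 0), (1, 0), (0, 0), (1, 1)] + [(1, 0)] * 6
X = POP_A + POP_B


def ld_coefficient(haplotypes):
    """D = P11 - p1 * q1 for two biallelic positions."""
    n = len(haplotypes)
    p1 = sum(a for a, _ in haplotypes) / n
    q1 = sum(b for _, b in haplotypes) / n
    p11 = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    return p11 - p1 * q1


def estimate_frequencies(X, Z, K, pseudocount=0.0):
    """Return P with P[k][j][l] = N_{k,j,l} / N_{k,j,.} (plain counting).

    Populations are labeled 0..K-1 here (an assumption; the slides use 1..K).
    """
    n_pos = len(X[0])
    counts = [[[pseudocount, pseudocount] for _ in range(n_pos)] for _ in range(K)]
    for x, z in zip(X, Z):
        for j, allele in enumerate(x):
            counts[z][j][allele] += 1
    P = []
    for pop in counts:
        rows = []
        for c0, c1 in pop:
            total = c0 + c1
            # An empty population carries no information: fall back to 0.5 / 0.5.
            rows.append([c0 / total, c1 / total] if total > 0 else [0.5, 0.5])
        P.append(rows)
    return P


def assignment_probs(x, P):
    """Pr[Z_i = k | x, P], assuming independent positions and a uniform prior."""
    likelihoods = []
    for pop in P:
        like = 1.0
        for j, allele in enumerate(x):
            like *= pop[j][allele]
        likelihoods.append(like)
    total = sum(likelihoods)
    if total == 0:
        return [1.0 / len(P)] * len(P)
    return [like / total for like in likelihoods]


def run_sampler(X, K=2, iterations=50, seed=0, pseudocount=0.5):
    """Alternate between estimating P from Z and sampling Z given P."""
    rng = random.Random(seed)
    Z = [rng.randrange(K) for _ in X]  # random initial guess Z^(0)
    P = estimate_frequencies(X, Z, K, pseudocount)
    for _ in range(iterations):
        P = estimate_frequencies(X, Z, K, pseudocount)
        Z = [rng.choices(range(K), weights=assignment_probs(x, P))[0] for x in X]
    return Z, P


if __name__ == "__main__":
    print("D within Pop A:", ld_coefficient(POP_A))
    print("D in pooled sample:", ld_coefficient(X))
    Z, _ = run_sampler(X)
    print("Sampled assignment:", Z)
```

Because the sampler is stochastic and the population labels are interchangeable, a run may label the "01" group as either 0 or 1, and a small number of iterations is not guaranteed to separate the two groups cleanly. The two uninformative haplotypes (0 0 and 1 1) can land in either population.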

