Berkeley STATISTICS 246 - Human SNP haplotypes


Human SNP haplotypes
Statistics 246, Spring 2002
Week 15, Lecture 1

Human single nucleotide polymorphisms

The majority of human sequence variation is due to substitutions that have occurred once in the history of mankind at individual base pairs, i.e. SNPs (Patil et al, 2001, listed at the end, and refs therein).

It has been estimated that > 5 million common SNPs, each with a frequency of 10% to 50%, account for the bulk of human DNA sequence difference. Such SNPs are present in the human genome about 1 in every 600 base pairs.

Alleles making up blocks of such SNPs in close physical proximity are often correlated, and define a limited number of SNP haplotypes, each of which reflects descent from a single, ancient ancestral chromosome.

The Daly et al (2001) data set

This consists of 103 common SNPs (> 5% minor allele frequency) in a 500 kb region implicated in Crohn disease, genotyped in 129 trios (mom, pop, kid) from a European-derived population, giving 258 transmitted and 258 untransmitted chromosomes.

Studies to date have revealed great variability in local haplotype structure: the relative contributions of mutation, recombination, selection, population history, and stochastic events seem to vary unpredictably. Some haplotypes extend only a few kb, while others extend for > 100 kb.

Here is some evidence from Figure 1 of Daly et al, 2001. Linkage disequilibrium (LD) between an arbitrary marker (#26 in panel a, #61 in panel c, marked *) and every other marker in the data set is indicated, using the normalized association measure D' = (ad - bc)/[(a+c)(c+d)] of LD. Note the noisiness of the plot.

[Figure: Daly et al (2001), Figure 1]

Measures of association in 2×2 tables

Given positive observed frequencies from a 2×2 table, say a, b, c and d for the cells 11, 10, 01 and 00 respectively, how do we measure association between the two classifications? Put a+b+c+d = n. Geneticists like to use D = p11 - p1+ p+1, where p11 = a/n, p1+ = (a+b)/n and p+1 = (a+c)/n.
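Writing the proportions in terms of the counts shows how D relates to the cross-product ad - bc that appears in the D' formula above:

```latex
\begin{aligned}
D &= p_{11} - p_{1+}\,p_{+1}
   = \frac{a}{n} - \frac{(a+b)(a+c)}{n^2} \\
  &= \frac{a(a+b+c+d) - (a+b)(a+c)}{n^2}
   = \frac{ad - bc}{n^2}.
\end{aligned}
```

So D is positive exactly when ad > bc.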
One long-recognised trouble with this measure is that the range of values it can take depends on the marginal proportions p1+ and p+1. Ideally, one would like a measure of association which captured just association, and was parametrically independent of the marginal frequencies. One exists, namely the odds ratio φ = ad/bc, or equivalently λ = log φ = log(ad/bc).

This has the nice property that for any specified marginal probabilities p1+ and p+1 between 0 and 1 and any value of λ, there is a unique 2×2 table with these marginals and this log odds ratio. Despite this wonderful result, geneticists continue to use a normalized D, namely D' = D/Dmax, where Dmax is the largest value of D with the given marginals. If D > 0, we can show (Exercise!) that

Dmax = min{p1+(1 - p+1), (1 - p1+)p+1}.

Check that this leads to the formula for D' quoted earlier with Figure 1 of Daly et al.

Human SNP haplotypes, cont.

If we identify the underlying haplotypes, the LD picture becomes clearer. In Figure 1b, a multi-allelic form of D' is used to plot LD between the maximum likelihood haplotype group assignment at the location of the 26th marker and that assignment at the location of every other marker in the set. Here the haplotypes have been blocked (details later), and each block treated as an allele. Figure 1d repeats 1b, but with the 61st marker. Note that when haplotypes rather than single SNPs are used, there is much less noise.

There is an r×c table analogue of the result cited earlier, involving (r-1)×(c-1) log odds ratios and r+c-2 free marginal frequencies, but what geneticists want here is a single number summarizing the association in an r×c table where max(r,c) > 2. No entirely satisfactory single number exists, though many have been tried and many are in use.
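Returning to the 2×2 case, here is a minimal Python sketch of D, D' and the odds ratio, assuming cell counts a, b, c, d laid out as above. The function names `ld_measures` and `table_from_margins` are hypothetical. For D < 0 the sketch uses the standard companion bound Dmax = min{p1+ p+1, (1 - p1+)(1 - p+1)}, which the slides do not state. The second function illustrates the uniqueness result: with the margins fixed, the condition p11 p00 = φ p10 p01 is a quadratic in p11 with exactly one root in the feasible range.

```python
import math


def ld_measures(a, b, c, d):
    """D, D' and the (log) odds ratio for a 2x2 table with positive counts
    a, b, c, d in cells 11, 10, 01, 00."""
    n = a + b + c + d
    p11 = a / n
    p1_ = (a + b) / n          # row margin p1+
    p_1 = (a + c) / n          # column margin p+1
    D = p11 - p1_ * p_1        # equals (ad - bc) / n**2
    if D >= 0:
        d_max = min(p1_ * (1 - p_1), (1 - p1_) * p_1)
    else:
        # Assumed bound for negative D (not on the slides); D' is then negative.
        d_max = min(p1_ * p_1, (1 - p1_) * (1 - p_1))
    phi = (a * d) / (b * c)
    return {"D": D, "D_prime": D / d_max, "phi": phi, "lambda": math.log(phi)}


def table_from_margins(p_row, p_col, phi):
    """Return (p11, p10, p01, p00) with p1+ = p_row, p+1 = p_col and odds
    ratio phi, assuming 0 < p_row, p_col < 1 and phi > 0."""
    lo = max(0.0, p_row + p_col - 1.0)   # feasible range for p11
    hi = min(p_row, p_col)
    if math.isclose(phi, 1.0):
        p11 = p_row * p_col              # independence
    else:
        # p11 * p00 = phi * p10 * p01 rearranges to A x^2 + B x + C = 0, x = p11
        A = 1.0 - phi
        B = 1.0 - (1.0 - phi) * (p_row + p_col)
        C = -phi * p_row * p_col
        disc = math.sqrt(B * B - 4.0 * A * C)
        q = -0.5 * (B + math.copysign(disc, B))  # numerically stable roots
        roots = (q / A, C / q)
        p11 = next(x for x in roots if lo < x < hi)
    return (p11, p_row - p11, p_col - p11, 1.0 - p_row - p_col + p11)


# Same odds ratio, different margins: phi stays put, D does not.
for margin in (0.4, 0.1):
    table = table_from_margins(margin, margin, 15.0)
    counts = [1000 * p for p in table]   # scale to pseudo-counts
    print(margin, ld_measures(*counts))
```

Running the loop shows the point of the slide: both tables have φ = 15, but D (and D') differ because the margins differ.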
For the multi-allelic form of D' used above, see Hedrick, Genetics 117, 331-341, 1987, "Gametic disequilibrium measures: proceed with caution".

The block structure of haplotypes

Daly et al (2001) were able to infer offspring haplotypes largely from parents, with a little help from the EM algorithm when parents and children were both heterozygous (see last week). They say that "it became evident that the region could be largely decomposed into discrete haplotype blocks, each with a striking lack of diversity (Fig. 2)".

The haplotype blocks span up to 100 kb and contain 5 or more common SNPs. For example, one 84 kb block of 8 SNPs shows just two distinct haplotypes accounting for 95% of the observed chromosomes (Table 1).

A long haplotype block

Construction of the haplotype blocks

If I have time I'll describe Daly's method of determining haplotype blocks. Basically they define an HMM rather like the one used to map markers on mouse chromosomes (MapMaker) and estimate what they term the "historical recombination frequency θ" between each pair of consecutive SNPs. Their "data" is an assignment of each chromosome to one of four ancestral haplotypes. Consecutive SNPs are then in the same block if θ < 1% (73/103), with 14 having 1% < θ < 4% and 9 having θ > 4%.

The approach is justified by the observation that the visually defined haplotype blocks have only a few (2-4) haplotypes, which show no evidence of being derived from one another by recombination, and which account for nearly all chromosomes (> 90%) in the sample. Further, the discrete blocks are separated by intervals in which several independent recombination events seem to have occurred, giving rise to greater haplotype diversity in regions spanning the blocks, see Figure 2.
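The block rule can be sketched in Python, assuming the θ estimates between consecutive SNPs are already available (the HMM that Daly et al use to estimate them is not reproduced here) and taking the "θ < 1%" rule literally. The function names and all input values are hypothetical, for illustration only. A second helper computes the kind of summary quoted for Table 1: the fraction of chromosomes in a block carried by its few most common haplotypes.

```python
from collections import Counter


def group_into_blocks(thetas, threshold=0.01):
    """Group consecutive SNPs into blocks.

    thetas[i] is an (already estimated) recombination fraction between
    SNP i and SNP i+1; a gap with theta >= threshold starts a new block.
    Returns a list of blocks, each a list of SNP indices.
    """
    blocks = [[0]]
    for i, theta in enumerate(thetas):
        if theta < threshold:
            blocks[-1].append(i + 1)   # same block as the previous SNP
        else:
            blocks.append([i + 1])     # boundary: start a new block
    return blocks


def top_haplotype_share(haplotypes, k=2):
    """Fraction of chromosomes carrying one of the k most common haplotypes."""
    counts = Counter(haplotypes)
    top = sum(count for _, count in counts.most_common(k))
    return top / len(haplotypes)


# Hypothetical theta values for 6 SNPs (5 gaps); not data from Daly et al.
print(group_into_blocks([0.002, 0.005, 0.03, 0.001, 0.06]))
# -> [[0, 1, 2], [3, 4], [5]]

# Hypothetical haplotypes within one block (alleles at 4 SNPs per chromosome).
block_haps = ["AGTC"] * 6 + ["CATG"] * 3 + ["AGTG"]
print(top_haplotype_share(block_haps, k=2))
# -> 0.9
```

A single hard threshold is the simplest reading of the rule; it ignores the intermediate 1% to 4% range mentioned above, which a fuller treatment might handle differently.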
Finally, we see that the haplotypes at the various blocks can be readily assigned to one of just four ancestral long-range haplotypes.

[Figure: Daly et al (2001), Figure 2]

Patil et al (2001)

The data in this paper derive from a publicly available panel of 24 ethnically diverse individuals, and concern chromosome 21 SNPs. The two chromosomes of each individual were separated using rodent-human somatic cell hybrid technology, and so could be typed separately, leading directly to haplotypes. Overall, 20 independent copies of chr 21 were analyzed for SNP discovery and haplotype structure.

The typing was done on specially constructed high-density oligonucleotide arrays (Affymetrix), and in total they identified 35,989 SNPs in their sample of 20 chromosomes. The allele frequency distribution is depicted in Figure 1A (see next page). The 32 Mbp of chr 21 DNA was then divided into 200 kb segments, and the observed heterozygosity was used to calculate an average nucleotide diversity for

