Berkeley STATISTICS 246 - Genes and MS in Tasmania


Slide 1. Genes and MS in Tasmania, completed
Lecture 7, Statistics 246
February 12, 2004

Slide 2. Towards a sharing statistic
Our aim was to come up with a statistic that effectively describes haplotype sharing differences between case and “control” haplotypes.
The sharing statistic should be largest at markers closest to a disease locus, because there
- haplotype sharing should extend the furthest; and
- the association of disease with particular haplotypes should be strongest.

Slide 3. Nonparametric haplotype sharing analysis
Why nonparametric, rather than likelihood-based methods?
• Likelihood methods make assumptions regarding the genealogy of the population, and we don’t know how many of these assumptions are robust to violations.
• Likelihood methods are computationally intensive, especially for genome-wide scans, where there is a need to maximize over the very large state space of possible ancestral haplotypes (MCMC).
• Likelihood methods have a hard time at the HLA region, because the LD there is extremely high and non-uniform (block-like structure).
• Simpler statistics will probably do better here, unless we can model background LD.

Slide 4. Haplotype sharing statistics for genome wide scan data cf. fine mapping
• Previous (usually likelihood-based) statistics have concentrated on fine mapping and the exact localization of a variant allele. They assume a signal exists.
• For us, localization was not the primary interest.
  Rather, detection was our main interest, using a genome-wide scan.
• We needed something that was not as computationally intensive as DHSMAP (McPeek & Strahs, 1999), BLADE (Liu et al, 2001), DMLE+ (Rannala & Reeve, 2001), or the shattered coalescent (Morris et al, 2002).

Slide 5. Haplo_clusters (Melanie Bahlo)
• Calculates a sharing statistic at every marker
• Obtains a p-value at every marker using a permutation test
• Allows for several clusters of ancestral haplotypes (allelic heterogeneity)

Slide 6. Testing for shared haplotypes

Cases
3 5 9 8 7 6 10 1 5 4 3 2 5
3 2 1 3 7 6 10 1 5 4 1 3 2
1 2 1 3 5 6 10 1 5 2 1 3 4
2 3 7 3 1 6 10 9 1 1 2 5 6

Controls
5 9 1 1 4 1 3 1 2 3 1 9 8
7 6 5 3 1 3 2 1 5 9 7 9 1
7 1 2 1 1 3 5 7 1 5 1 3 2
9 3 9 2 1 2 7 5 3 4 2 2 5

[Figure: score for haplotype sharing (-log p) plotted along the chromosome, from Pter to Qter]

Slide 7. Sharing drop-off & allelic heterogeneity
[Figure: proportions of cases and proportions of controls at markers 1 to 4, shaded as cluster 1 haplotypes, cluster 2 haplotypes, or neither cluster 1 nor 2 haplotypes]

Slide 8. Haplo_cluster in action
Example: Sorting on marker 1 for a sample of 3 case and 4 control haplotypes

2 1 3
2 1 3
2 1 4
1 1 2
1 2 3
1 3 3
3 1 2

After sorting on the haplotype consisting only of marker 1, calculate a chi-square statistic, and move on.

Haplotype  Cases  Controls
1          0      3
2          3      0
3          0      1

After sorting on the haplotype consisting of marker 1 and marker 2, calculate a chi-square statistic, and ….

Haplotype  Cases  Controls
1 1        0      1
1 2        0      1
1 3        0      1
2 1        3      0
3 1        0      1

Eventually stop, and sum the chi-square statistics.
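The sort-and-sum procedure in this example can be sketched in a few lines of Python. This is a simplified illustration and not the Haplo_clusters code. Several things are assumptions made here: the first three listed haplotypes are taken to be the cases, the "ancestral" haplotype is grown by picking the most common allele among the cases still sharing, the chi-square is a 1-degree-of-freedom Pearson statistic on the 2×2 table of sharing versus not sharing by case versus control, and the names `chi2_2x2`, `sharing_statistic` and `min_share` are hypothetical.

```python
from collections import Counter


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty margin carries no information about association
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def sharing_statistic(cases, controls, order, min_share=1):
    """Sum of chi-squares as one ancestral haplotype is grown along `order`.

    `order` lists marker indices in the order they are added to the haplotype.
    `min_share` (hypothetical) stops the growth once fewer haplotypes share.
    """
    case_pool, control_pool = list(cases), list(controls)
    total = 0.0
    for m in order:
        # Assumption: extend with the most common allele among sharing cases.
        allele = Counter(h[m] for h in case_pool).most_common(1)[0][0]
        case_pool = [h for h in case_pool if h[m] == allele]
        control_pool = [h for h in control_pool if h[m] == allele]
        a, c = len(case_pool), len(control_pool)
        if a + c < min_share:
            break
        # Rows: cases / controls. Columns: still sharing / no longer sharing.
        total += chi2_2x2(a, len(cases) - a, c, len(controls) - c)
    return total


# The seven haplotypes of the example; the case/control split is an assumption.
cases = [(2, 1, 3), (2, 1, 3), (2, 1, 4)]
controls = [(1, 1, 2), (1, 2, 3), (1, 3, 3), (3, 1, 2)]
print(sharing_statistic(cases, controls, order=[0, 1, 2]))
```

A single 2×2 table per step keeps each term on one degree of freedom, so the terms are comparable as the shared haplotype grows and fewer haplotypes remain in it.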
Then repeat for a suitably large number of random permutations of cases and controls.

Slide 9. Statistic to evaluate haplotype sharing
The sharing statistic is χ² based, using the idea of multiple ancestral haplotypes (clusters) which are “grown” starting at each marker examined in the scan.

Significance is evaluated via a permutation test: choose a random permutation of the pooled cases and controls, and recalculate the statistic; repeat ~20,000 times.

A recursive form for the estimator and the SD of the p-value was used, to enable early termination of the program.

$$S_i = \sum_{j=1}^{K} \sum_{k=1}^{K} \chi^2_{i,j,k}$$

where $\chi^2_{i,j,k}$ is the $\chi^2_1$ test for association between the number of case and control haplotypes still sharing the ancestral haplotype of cluster k at marker j, after starting at marker i.

Slide 10. The permutation test
The idea is this. We have 170 cases and 105 controls, and at any particular locus, we calculate the value of our statistic, calling it S.

Now pool our cases and controls into 275 individuals, and sample 170 to be “cases” at random from the 275, calling the remainder “controls”. For this first artificial set of cases and controls, calculate the value of our statistic, S_1 say.

Next, we repeat this procedure 9,999 more times, say, obtaining values S_2, S_3, S_4, …, S_10,000. As long as 10,000 is sufficiently many random permutations, we can get a good estimate of the p-value of our initial statistic relative to our empirically estimated null distribution, as p = #{i: S_i > S}/10,000.

Slide 11. Exercises
1. How should we decide what number of resamplings is large enough?
2. Explain, in the simple case of a 2×2 table of cases and controls cross-classified as diseased and healthy, how using all possible resamplings, rather than a fixed size random sample, leads to the p-value for the exact test.
3. To avoid carrying out an unnecessarily large number of permutations, the proportion of resampled values of our statistic exceeding the value S can be monitored.
   Can you describe a stopping rule for the random resamplings that should lead to “accurate enough” p-values, without going to the full number each time?

Slide 12. Haplo_clusters - Output
-opt 1     Genetic distances used to decide order of markers to sort on
-c 1       The number of clusters of haplotypes to look for = 1
-miss 1    The missing data is replaced randomly using the 2 marker haplotype information.
-share 5   The number of haplotypes needed to share = 5
The standard deviation p values are calculated to 0.01*phat.
Marker names have been provided and will be used in the output files.
# of case haplotypes = 338
# of contol haplotypes = 208
# of markers = 11
# of perms = 100000

Marker     Mapdistance  Chi_Square  p         sd(p)     -log(p)  perms
====================================================================
D21S1911   0            5.34        4.44e-01  4.44e-03  0.35     12510
D21S1904   0.85         6.17        3.63e-01  3.63e-03  0.44     17577
D21S1899   10.36        5.89        4.37e-01  4.37e-03  0.36     12876
D21S1922   16.46        2.97        6.83e-01  6.83e-03  0.17     4636
D21S1884   17.26        4.74        4.14e-01  4.14e-03  0.38     14135
D21S1914   20.82        6.49        3.38e-01  3.38e-03  0.47     19571
D21S263    28.97        4.06        5.24e-01  5.24e-03  0.28     9077
D21S1252   39.41        1.18        8.66e-01  8.65e-03  0.06     1553
D21S1919   42.51        1.38        8.51e-01  8.51e-03  0.07     1751
D21S1255   43.81        2.24        7.24e-01  7.24e-03  0.14     3805
D21S266    51.51        3.86        5.70e-01  5.70e-03  0.24     7557
====================================================================

Slide 13. Haplo_clusters - Output II
Table of haplotypes
Marker  Cluster  Haplotype  Length(Haplotype)

Marker        Cluster  D21S1911  D21S1904  D21S1899  D21S1922  D21S1884  D21S1914  D21S263  D21S125
===================================================================================================
D21S1884      1        -         -         6         3         3         8         11       -
# of haplos:           -         -         5         82        163       22        2        -
Chi-square:            -         -         0.2
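
The permutation test, together with an early-stopping rule of the kind the output header hints at ("standard deviation p values are calculated to 0.01*phat"), might be sketched as follows. The sd(p) column in the output is numerically consistent with the binomial standard error sqrt(p̂(1 - p̂)/n); for example, sqrt(0.444 × 0.556 / 12510) ≈ 0.00444. The sketch therefore assumes that formula and stops once sd ≤ 0.01·p̂. This reading of the rule is an assumption, not something the slides state. The function name `permutation_pvalue`, its parameters (`rel_sd`, `min_perms`, `seed`), and the toy statistic `mean_diff` are all hypothetical stand-ins; any function of (cases, controls) could be passed instead.

```python
import math
import random


def permutation_pvalue(stat, cases, controls, max_perms=100_000,
                       rel_sd=0.01, min_perms=100, seed=0):
    """Estimate a permutation p-value, stopping early when it is precise enough.

    Stops once the binomial standard error of p_hat is at most rel_sd * p_hat
    (assumed rule), or after max_perms permutations. Returns (p_hat, sd, n).
    """
    if max_perms < 1:
        raise ValueError("max_perms must be at least 1")
    rng = random.Random(seed)
    observed = stat(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    exceed = 0
    for n in range(1, max_perms + 1):
        rng.shuffle(pooled)  # relabel: first n_cases become the "cases"
        # Strict inequality, matching p = #{i: S_i > S} / N on the slides.
        if stat(pooled[:n_cases], pooled[n_cases:]) > observed:
            exceed += 1
        p_hat = exceed / n
        sd = math.sqrt(p_hat * (1 - p_hat) / n)
        if n >= min_perms and exceed > 0 and sd <= rel_sd * p_hat:
            break
    return p_hat, sd, n


def mean_diff(cases, controls):
    """Toy statistic for illustration: absolute difference in group means."""
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


if __name__ == "__main__":
    toy_cases = list(range(0, 20, 2))     # made-up numbers, not study data
    toy_controls = list(range(1, 21, 2))
    print(permutation_pvalue(mean_diff, toy_cases, toy_controls))
```

Under this rule the number of permutations needed is roughly 10^4 (1 - p̂)/p̂, so large p-values stop early while small ones run closer to the cap. That matches the pattern in the perms column, where the markers with the largest p-values used the fewest permutations.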

