Berkeley STATISTICS 246 - Genes and MS in Tasmania


Genes and MS in Tasmania, cont.
Lecture 6, Statistics 246
February 5, 2004

Nature and number of relatives needed to give accurate haplotypes

Exercise: Explain why, when we have both sets of parental genotypes and the markers are reasonably polymorphic, we can reconstruct an individual's haplotypes with high probability. What are the difficult cases? If we have no parents, or just one parent, and grandparents', siblings' or offspring's genotypes are available, which are most informative for an individual's haplotype reconstruction?

Simulation study

Simulated many different types of pedigrees, 300 times each, to see which constellations of relatives give the best opportunity of reconstructing haplotypes correctly.

Ranking the contributions in order of importance, assuming that the proband has been genotyped:
1. Parents
2. Grandparents, siblings
3. Offspring

Genotyping

We used STR (short tandem repeat, also known as microsatellite) markers.

  AGCTAGCGCGC GCGCGGCATTA
  AGCTAGCGCGC GCGCGGCGCATTA

Eventual plan: 5 cM genome-wide scan (800 markers) with dinucleotide STRs.

Data collected in the Tasmanian MS Study

MS study in Tasmania: data

- Collected 170 out of an estimated 300 MS cases and 105 controls, and a constellation of 4 relatives for each.
- Created a case-control study with 338 case haplotypes and 208 control haplotypes.
- Genotyping carried out at the Australian Genome Research Facility: almost 1 million genotypes, the 2nd largest genotyping project ever carried out in Australia.

Relatives of cases and controls

                 Cases (170)     Controls (105)
                   n     %          n     %
  Grandparents    29     4          0
  Parents        174    22        123    30
  Siblings       374    46        215    52
  Spouse          67     7         14     3
  Offspring      168    20         50    12
  Other           17     1         12     3
  Total          809              414

Some issues associated with the data preparation

Errors, errors and errors

- Marker location errors: allocation to wrong chromosome, wrong order, map distances out. Maps: Généthon (Dib et al., 1996), Marshfield (Broman et al., 1998), DeCODE (Kong et al., 2001; included a physical map).
- Pedigree (relationship) errors: PREST (McPeek & Sun).
- Genotyping errors (caused by assay or analysis): ones causing Mendelian inconsistencies, and ones which don't. PEDCHECK (O'Connell), SIBMED (Douglas et al.), MERLIN (Abecasis et al.).
- Data handling errors, e.g. mixed-up samples.
- Binning (allele labelling) errors: inconsistencies over time.

Error checking: a little detail

- With genome-wide genotypes, moderately close relationships can be confirmed or falsified: 7 paternity errors (6 incorrect fathers, 1 incorrect mother).
- Mix-ups typically stand out: 2 DNA sample swaps, 2 duplicate samples, 1 case of contaminated DNA, 1 adopted child (unrelated to anyone else).
- Mendelian checks picked up many genotyping errors: 1,472 inconsistencies (0.15% genotyping errors); 15 markers removed. Using Mendel on the X found 3 data entry errors and 4 cases where the recorded sex was wrong.
- Multilocus methods can pick up more, in effect identifying close double recombinants: 58 errors inferred by this method and put to missing.
- Other errors demanded special methods.

Unforeseen problem

Marker binning was not consistent over time: genotyping at 796 markers took over 2 years.

Heuristic approach:
- Look at all markers with allele bin differences of 1 bp.
- Seek large frequency differences (χ², allele by box).
- Carry out allele binning slippage test for pairs of adjacent alleles and boxes (χ²).

Markers were flagged if any of the above held, and examined for systematic trends.

(A founder is an individual with no parent in the sample.)

Example output showing partial allele slippage

[Figure: allele counts by genotyping box. Alleles are in size order; boxes (1 2 3 5 6 7 21 23 24 27) are in time order of genotyping, with summary information. Numbers indicate the number of individuals in each box. Absolute frequencies for the given allele (106) in each box are shown in time order of genotyping. Note slippage of allele 104 into allele 106 for Box 7 (yellow).]

[Figure: Isolated bin, probably slippage]

[Figure: Example of highly polymorphic marker]

[Figure: Box 1, all alleles shifted 1 bp]

[Figure: Box 1, Alleles 150 154 shifted 1 bp]

Fixing allele calls

Need to track changes carefully.

Obtaining haplotypes

Haplotypes were reconstructed using the Lander-Green-Kruglyak algorithm (Genehunter, Merlin, Allegro). We'll go into the details of the algorithm later this lecture or in the next.

Appropriate case and control datasets with these haplotypes were then prepared. Here's how, from Genehunter output.

Genehunter output

The genotype data for family MS003 (input). Columns: family, individual, father, mother, sex, affection status, then two alleles per marker (0 = missing).

  MS003 301 303 302 2 2    5 8   10 11   5 7   3 4   3 5   7 9   1 8   1 6   2 3   5 6   5 5
  MS003 302   0   0 2 1    2 8    9 11   0 0   7 4   0 0   0 0   0 0   1 6   0 0   0 0   5 6
  MS003 303   0   0 1 1    5 5    4 10   0 0   7 3   0 0   0 0   0 0   1 1   0 0   0 0   5 8
  MS003 304 303 302 2 0    2 5    4 9    5 5   7 7   1 4   4 6   4 5   1 1   3 4   4 6   6 8
  MS003 305 303 302 1 0    5 8   10 11   5 7   3 4   3 5   7 9   1 8   1 6   2 3   5 6   5 5

The haplotype reconstruction for family MS003 (output). Each individual has two rows, one per haplotype.

  MS003 0.000
  302 0 0 1
      8 11  7  4  5  9  8  6  3  6  5    (mother's transmitted haplotype)
      2  9  5  7  4  6  5  1  4  6  6    (untransmitted haplotype)
  303 0 0 1
      5 10  5  3  3  7  1  1  2  5  5    (father's transmitted haplotype)
      5  4  5  7  1  4  4  1  3  4  8    (untransmitted haplotype)
  301 303 302 2                          (proband)
      5 10  5  3  3  7  1  1  2  5  5
      8 11  7  4  5  9  8  6  3  6  5
  304 303 302 0
      5  4  5  7  1  4  4  1  3  4  8
      2  9  5  7  4  6  5  1  4  6  6
  305 303 302 0
      5 10  5  3  3  7  1  1  2  5  5
      8 11  7  4  5  9  8  6  3  6  5

Extracting untransmitted haplotypes from GENEHUNTER

Three types of controls:
- untransmitted haplotypes (akin to controls in the TDT)
- haplotypes from matched controls to the affecteds
- random controls

To derive the untransmitted haplotypes:
- use GENEHUNTER to generate the haplotypes (creates a haplo dump file)
- extract the untransmitted haplotype: use the reconstructed haplotypes of the parents to find the untransmitted haplotype of the affected, by negation

Example

Affected's haplotypes; haplotypes of parents of affected:
  2 5 2 12 10 6 8 230 0 1 02 1221 1 77 1 2 3 13 12 9 7 2 5 2 12 10 6 8 1 2 3 13 12 9 8
Untransmitted haplotypes are:
  2 3 0 0 1 0 2 1 2 2 1 1 7 8

Assessing haplotype sharing

Nonparametric haplotype sharing analysis

Why nonparametric rather than likelihood-based methods?
- Likelihood methods make many assumptions regarding the genealogy of the population. We don't know how many of these assumptions are robust to violations.
- Likelihood methods are computationally intensive, perhaps prohibitively so, especially for genome-wide scans, where there is a need to maximize over the very large state space of possible ancestral haplotypes (MCMC).
- Likelihood methods have a hard time …
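The "by negation" step used earlier to extract untransmitted haplotypes can be sketched in Python. This is a minimal illustration, not Genehunter's own procedure: the function name `untransmitted` is hypothetical, the marker-by-marker complement rule is an assumed reading of "negation", and the allele values are the ones read from the family MS003 haplotype reconstruction shown earlier.

```python
from typing import Sequence, Tuple


def untransmitted(hap_a: Sequence[int], hap_b: Sequence[int],
                  transmitted: Sequence[int]) -> Tuple[int, ...]:
    """Return, marker by marker, the parental allele NOT passed to the child.

    hap_a and hap_b are one parent's two reconstructed haplotypes;
    transmitted is the haplotype the child received from that parent.
    (Hypothetical helper; the per-marker rule is an assumption.)
    """
    if not (len(hap_a) == len(hap_b) == len(transmitted)):
        raise ValueError("haplotypes must cover the same markers")
    result = []
    for a, b, t in zip(hap_a, hap_b, transmitted):
        if t == a:
            result.append(b)      # child got a, so b stayed behind
        elif t == b:
            result.append(a)      # child got b, so a stayed behind
        else:
            raise ValueError(f"allele {t} is not carried by this parent")
    return tuple(result)


# Family MS003, as read from the Genehunter reconstruction (11 markers).
FATHER_303 = ((5, 10, 5, 3, 3, 7, 1, 1, 2, 5, 5),
              (5, 4, 5, 7, 1, 4, 4, 1, 3, 4, 8))
MOTHER_302 = ((8, 11, 7, 4, 5, 9, 8, 6, 3, 6, 5),
              (2, 9, 5, 7, 4, 6, 5, 1, 4, 6, 6))
# Proband 301: (paternal haplotype, maternal haplotype)
PROBAND_301 = ((5, 10, 5, 3, 3, 7, 1, 1, 2, 5, 5),
               (8, 11, 7, 4, 5, 9, 8, 6, 3, 6, 5))

control_haplotypes = (
    untransmitted(*FATHER_303, PROBAND_301[0]),
    untransmitted(*MOTHER_302, PROBAND_301[1]),
)
print(control_haplotypes)
```

Working marker by marker, rather than matching whole haplotypes, means a recombinant transmitted haplotype is still handled. At markers where the parent is homozygous, the transmitted and untransmitted alleles coincide.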

