CMU CS 10701 - Recitation


Slide titles:
1. Bayesian association of haplotypes and non-genetic factors to regulatory and phenotypic variation in human populations
2. Outline
3. Slide 3
4. Part I: Learning haplotype block structure
5. Previous models for genotype data
6. Probabilistic generative model for genotype data
7. A probabilistic model for genotype data
8. Learning the model for genotype data
9. Variational inference and parameter estimation
10. Predicting missing genotype data
11. Slide 11
12. Prediction error for Crohn’s/5q31 data
13. Comparative performance for Crohn’s/5q31 data
14. Reconstructing phase
15. Slide 15
16. How many ancestors?
17. Establishing haplotype block boundaries
18. Slide 18
19. Haplotype block structure in the ENm006 region
20. Pattern usage in Chromosome 5q31
21. Part II: Linking haplotype block structure and gene expression data
22. A model for linking haplotype structure to quantitative trait measurements
23. A Bayesian model for linking haplotype structure to quantitative measurements
24. Slide 24
25. Slide 25
26. Slide 26
27. Slide 27
28. Variational Bayes for inferring relationships between haplotype blocks and quantitative measurements
29. Variational Bayes updates
30. Linking haplotype blocks to phenotype
31. Linking haplotype blocks to gene expression
32. Addressing population stratification
33. Associations between haplotype blocks and gene expression
34. Summary
35. The road ahead…
36. Acknowledgements

Bayesian association of haplotypes and non-genetic factors to regulatory and phenotypic variation in human populations

24/07/2007 ISMB/ECCB 2007

Jim Huang*
Probabilistic and Statistical Inference Group, Edward S. Rogers Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada

Anitha Kannan and John Winn
Microsoft Research Cambridge, Machine Learning and Perception Group, Cambridge, UK

Outline
• Main contributions:
  – Joint Bayesian modelling of genetic variation data and quantitative trait measurements
  – Rich probabilistic model for genotype data
  – State-of-the-art results on predicting missing genotypes

Outline
• Genotype: unordered pair of SNPs along both chromosomes
• Haplotype: ordered set of SNPs along a chromosome
• Presence of recombination hotspots partitions haplotypes into blocks [Daly, 2001]

Part I: Learning haplotype block structure
• Our model for genotype data should:
  – Account for phase & parent-child information
  – Account for uncertainty in ancestral haplotypes
  – Account for uncertainty in block structure
  – Account for population-specific haplotype block statistics
  – Allow for prior knowledge of haplotype block structure

Previous models for genotype data
• Previous methods learn a low-dimensional representation of the genotype data:
• HAPLOBLOCK (Greenspan, G. and Geiger, D. RECOMB 2003)
  – Hard partitioning of data into a set of haplotype blocks using low-dimensional “ancestral” haplotypes
• fastPHASE (Scheet, P. and Stephens, M. Am J Hum Genet 2006)
  – Learn ancestral haplotypes from high-dimensional genotype data while accounting for uncertainty in haplotype blocks
• Jojic, N., Jojic, V. and Heckerman, D. UAI 2004.

Probabilistic generative model for genotype data
[Figure: low-dimensional latent representation; high-dimensional data; unsupervised learning via maximum likelihood]

A probabilistic model for genotype data
[Figure]

Learning the model for genotype data
• Maximum likelihood:
• Lower bound on log likelihood:
[Figure labels: Inference; Learning / parameter estimation]

Variational inference and parameter estimation
• Exact inference is intractable!
• Approximate the posterior distribution:
  $Q(\{m_{jk}, s_{jk}, t_{jk}\}_{k=1}^{N}, c_j) = Q(x)\,Q(x)$
• Baum-Welch-like algorithm:
  – Run forward-backward algorithm separately on each chain of states
  – Estimate transition probabilities and ancestral haplotypes given distributions over states

Predicting missing genotype data
• Have we learned a good density model for genotype data?
• Gains from
  – Accounting for uncertainty in haplotype block structure
  – Accounting for uncertainty in ancestral haplotypes
  – Accounting for parental relationships
• Assess model using cross-validation/test prediction error

Predicting missing genotype data
• Crohn’s/5q31 data set (Daly et al., 2001)
  – Crohn’s disease data from Chromosome 5q31 containing genotypes for 129 children + 258 parents across 103 loci (phases given for children)
• For each test set, make a fraction ρ of the data missing
• Retain model parameters from the model learned from training data, then draw 1000 samples over missing data
• Compute fill-in error rate over 1000 samples, for all missing data

Prediction error for Crohn’s/5q31 data
[Figure]

Comparative performance for Crohn’s/5q31 data
[Figure]

Reconstructing phase
• Run EM using 10 random initializations on the full data set
• Estimate phase from posterior
• Compute phase error over all loci where phase is known, unambiguous and where alleles are completely observed
• Compute average and standard deviation of phase error over the 10 initializations

Reconstructing phase

Daly 5q31 data                    Phase during EM   Mean phase error rate   Std. dev. of phase error rate   Minimum free energy (nats)
Children w/ phase                 frozen            0.59%                   1.00%                           1.50 x 10^4
Children w/out phase              learned           8.21%                   1.09%                           2.23 x 10^4
Children w/ phase + parents       frozen            0.39%                   0.07%                           1.45 x 10^4
Children w/out phase + parents    learned           9.51%                   1.78%                           1.36 x 10^4

How many ancestors?
[Figure]

Establishing haplotype block boundaries
• Define the recombination prior γ on transition probabilities
  – Different γ correspond to different “blockiness” of data
• For each locus k, can compute the probability of transition p_k
  – Can establish a threshold t and establish block boundaries
• Once blocks are defined, can assign block labels l_b = (m, n)

Establishing haplotype block boundaries
[Figure: smaller number of larger blocks… larger number of smaller blocks…]

Haplotype block structure in the ENm006 region
• 573 SNP markers for 270 individuals from 3 sub-populations:
  – 90 Yoruba individuals (30 parent-parent-offspring trios) from
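
The thresholding step described under "Establishing haplotype block boundaries" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `block_boundaries` is hypothetical, `p[k]` is assumed to be the transition probability between locus k and locus k + 1, and a block label (m, n) is assumed to mean the first and last locus of the block, which the slides do not spell out.

```python
from typing import List, Tuple


def block_boundaries(p: List[float], t: float) -> List[Tuple[int, int]]:
    """Cut a run of len(p) + 1 loci into blocks wherever p[k] exceeds t.

    Assumption: p[k] is the probability of a transition between locus k
    and locus k + 1. Each returned label (m, n) holds the first and last
    locus index of one block (an assumed reading of l_b = (m, n)).
    """
    blocks = []
    start = 0
    for k, pk in enumerate(p):
        if pk > t:
            # A likely transition after locus k closes the current block.
            blocks.append((start, k))
            start = k + 1
    # The final block always runs to the last locus.
    blocks.append((start, len(p)))
    return blocks


# Made-up transition probabilities for 8 loci (illustrative values only).
p = [0.01, 0.02, 0.90, 0.05, 0.03, 0.75, 0.02]
print(block_boundaries(p, t=0.5))  # [(0, 2), (3, 5), (6, 7)]
```

Lowering `t` cuts at more loci and so gives a larger number of smaller blocks; raising it gives a smaller number of larger blocks.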
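
The evaluation protocol from the "Predicting missing genotype data" slides (hide a fraction ρ of the entries, draw samples for the hidden entries, report the average fill-in error) might look roughly like the sketch below. Everything here is an assumption made for illustration: alleles are coded as 0/1 rather than as genotype pairs, the toy data are made up, and `frequency_sampler` is a hypothetical stand-in that samples from per-locus allele frequencies, not the haplotype block model of the talk.

```python
import random
from typing import Callable, List, Tuple

Cell = Tuple[int, int]
Sampler = Callable[[int, int, random.Random], int]


def choose_missing(n_rows: int, n_cols: int, rho: float,
                   rng: random.Random) -> List[Cell]:
    """Pick round(rho * n_rows * n_cols) distinct cells to hide."""
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    return rng.sample(cells, round(rho * len(cells)))


def frequency_sampler(data: List[List[int]], missing: List[Cell]) -> Sampler:
    """Stand-in model (assumption): per-locus frequency of allele 1,
    estimated from the cells that were not hidden."""
    hidden = set(missing)
    freq = []
    for j in range(len(data[0])):
        seen = [row[j] for i, row in enumerate(data) if (i, j) not in hidden]
        freq.append(sum(seen) / len(seen) if seen else 0.5)

    def sample(i: int, j: int, rng: random.Random) -> int:
        return 1 if rng.random() < freq[j] else 0

    return sample


def fill_in_error_rate(truth: List[List[int]], missing: List[Cell],
                       sample: Sampler, n_samples: int,
                       rng: random.Random) -> float:
    """Fraction of sampled fill-ins that disagree with the held-out truth,
    pooled over all samples and all missing cells."""
    if not missing or n_samples <= 0:
        raise ValueError("need at least one missing cell and one sample")
    errors = sum(
        sample(i, j, rng) != truth[i][j]
        for _ in range(n_samples)
        for (i, j) in missing
    )
    return errors / (n_samples * len(missing))


rng = random.Random(0)
data = [  # 6 individuals x 4 loci, made-up values
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 0],
]
missing = choose_missing(len(data), len(data[0]), rho=0.25, rng=rng)
rate = fill_in_error_rate(data, missing, frequency_sampler(data, missing), 1000, rng)
print(f"fill-in error rate: {rate:.3f}")
```

Replacing `frequency_sampler` with a sampler backed by a learned model leaves the rest of the protocol unchanged, which keeps comparisons between models on the same missing cells straightforward.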

