Outline
Effect of cigarette smoke on gene expression
The Multiple Comparisons Problem
The BH Procedure for FDR control at level q
The Units of Testing in Microarray Studies
Typical Gene Set Analysis
Data of Spira04
Proposed Extension to Gene Set Analysis
The Overall FDR (OFDR)
A General Procedure for OFDR control
GS-OFDR procedure for study of Spira04
Table of Results
Interpretation of Results
Outline
Does the Treatment Alter the Response?
Causal Inference
The cross-match test comparing 2 multivariate distributions
Application to the Test of the Global Null
Conclusion from Test of the Global Null
Sensitivity to hidden bias
Model for Sensitivity Analysis
The Distribution of Treatment Assignment
Sensitivity Analysis for the Cross-match Test
Application to the test of the global null
Sensitivity analysis for gene set analysis
Application to gene-set analysis at =10
The pairing in one of the least sensitive gene sets
Summary

Ruth Heller, 29/04/2009 Gene Set Analysis of Observational Microarray Studies - p.1

Gene Set Analysis of Observational Microarray Studies
Ruth Heller
Department of Statistics, University of Pennsylvania
Joint work with: Warren Ewens, Greg Grant, Elisabetta Manduchi, Shane Jensen, Paul Rosenbaum, Dylan Small

Outline
■ Gene set analysis in microarray studies.
  ◆ We suggest first testing gene sets, then individual genes within discovered gene sets.
  ◆ We introduce the overall FDR (OFDR) as the appropriate error measure to control.
■ Causal inference in observational microarray studies.
  ◆ We introduce the cross-match test, and a sensitivity analysis.
■ Application to the study by Spira et al. (2004) of the effect of smoking on gene expression levels.

Effect of cigarette smoke on gene expression
An observational study by Spira et al. (2004)
The Data:
■ 33 smokers, 23 non-smokers.
■ Human epithelial cells from brushings of the right main bronchus proximal to the right upper lobe of the lung.
■ 9968 expression profiles from the HG-U133A Affymetrix chip.
The Goal: "To define how cigarette smoking alters the transcriptome".

The Multiple Comparisons Problem
■ Which of my 10000 genes are differentially expressed across groups?
■ If every test is controlled for type I error at the 0.05 level and all hypotheses are null, we expect to declare 10000 × 0.05 = 500 genes as differentially expressed!
■ How to control for false positives?
  ◆ Control the probability of at least one false positive (FWER, e.g. Bonferroni correction).
  ◆ Control the expected proportion of false positives among the discoveries (FDR, Benjamini and Hochberg (1995)). This is the more relevant error rate when a large number of hypotheses are tested, if we can tolerate a few false discoveries as long as the proportion of false discoveries among discoveries is small.

The BH Procedure for FDR control at level q
For S hypotheses with associated p-values $p_1, \dots, p_S$:
Order the p-values: $p_{(1)} \le \dots \le p_{(j)} \le \dots \le p_{(S)}$.
Let $k = \max\{ j : p_{(j)} \le \frac{j}{S} q \}$.
Reject the hypotheses corresponding to the smallest $k$ p-values.

The Units of Testing in Microarray Studies
■ The analysis units of interest have expanded from individual genes to collections of genes, called gene sets.
■ Gene sets are typically defined by prior knowledge about biological processes or molecular functions.
■ Advantages of testing gene sets over testing single genes:
  1. Increased signal-to-noise ratio.
  2. Reduced number of hypothesis tests.
  3. Discoveries are biologically meaningful.

Typical Gene Set Analysis
Methods that first compute the individual gene test statistics:
■ Test of over-representation (e.g. EASE). Identify the list of significant genes; test for over-representation of genes in this list among the genes in the gene set.
■ Gene set enrichment analysis (GSEA, Subramanian et al. (2005)). Rank the genes based on their test statistics; calculate an enrichment score that reflects the degree to which the genes in the gene set are over-represented at the extremes of the ranked list. Calculate the gene set p-value using subject permutations.
An approach that starts from raw expression data:
■ Test whether the joint distribution of the expression levels of genes in the gene set is the same across groups (Goeman et al. (2004), Nettleton et al. (2008)). We use the cross-match test (Rosenbaum (2005b)).
Each of these methods produces a gene set p-value, $p(s)$. The BH procedure can be applied to $\{p(s) : s = 1, \dots, S\}$.

Data of Spira et al. (2004)
1627 (overlapping) molecular function categories, 9968 expression profiles.

Gene set ID (size)   Gene ID   Smkr1  ...  Smkr33   Non-smkr1  ...  Non-smkr23
GO:0004033 (13)      AKR1B1    6.1    ...  6.0      5.5        ...  5.2
                     ALDH3A1   8.6    ...  9.1      7.0        ...  5.9
                     ...       ...    ...  ...      ...        ...  ...
GO:0016620 (21)      OGDH      5.1    ...  4.8      4.6        ...  5.1
                     ALDH3A1   8.6    ...  9.1      7.0        ...  5.9
                     ...       ...    ...  ...      ...        ...  ...

Gene set analysis: Get the cross-match test p-value for each gene set; apply the BH procedure at the level q = 0.05.
Results: the null hypothesis of no difference between smokers and non-smokers is rejected for 83 molecular function categories.
But the gene set discoveries may not be meaningful biologically: the categories overlap (ALDH3A1 appears in both sets shown above), so GO:0016620 may be discovered because GO:0004033 is differentially expressed.

Proposed Extension to Gene Set Analysis
Test gene sets, then genes within discovered gene sets (Heller et al. (2009)).
■ The testing strategy for one gene set:
  1. Test the gene set null hypothesis, called the screening hypothesis.
  2. If the screening hypothesis is rejected, test the individual gene hypotheses with a procedure for FWER control.
  If both steps are performed at level 0.05, the FWER is controlled at level 0.05.
■ The testing strategy for many gene sets s = 1, . . . , S:
  1. On the screening hypotheses p-values, apply the BH procedure at level q.
  2. For the R gene sets with rejected screening hypotheses, test the individual gene hypotheses with FWER control.
  If performed step 1 at level q = 0.05, step 2 at FWER
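
For reference, the BH step-up rule stated on the earlier slide (reject the k smallest p-values, where k is the largest j with p(j) ≤ (j/S)q) can be sketched in Python. This is a minimal illustration and not the authors' code; the function name `bh_reject` and the example p-values are hypothetical.

```python
def bh_reject(pvalues, q=0.05):
    """Benjamini-Hochberg step-up rule at level q.

    Returns a list of booleans, in the input order, that is True
    where the corresponding hypothesis is rejected.
    """
    S = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(S), key=lambda i: pvalues[i])
    # k = largest rank j such that p_(j) <= (j / S) * q; 0 if none.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / S:
            k = rank
    # Reject the hypotheses with the k smallest p-values.
    rejected = [False] * S
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical gene set p-values, for illustration only.
example_p = [0.001, 0.021, 0.025, 0.2, 0.5]
print(bh_reject(example_p, q=0.05))
```

In this example the second-smallest p-value (0.021) exceeds its own threshold (0.02) but is still rejected, because the third-smallest (0.025) falls below its threshold (0.03); this is the step-up behaviour implied by taking the maximum over j.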


Bloomberg School BIO 751 - Microarray Studies
