Berkeley STATISTICS 246 - Multiple testing in large-scale gene expression experiments

Multiple testing in large-scale gene expression experiments
Statistics 246, Spring 2002
Week 8, Lecture 2

Slide titles
• Outline
• Motivation
• Apo AI experiment (Matt Callow, LBNL)
• Golub et al (1999) experiments
• Data
• Univariate hypothesis testing
• p-values
• Computing p-values by permutations
• Many tests: a simulation study
• Multiple hypothesis testing: Counting errors
• Type I error rates
• Two types of control of Type I error
• Adjustments to p-values
• What should one look for in a multiple testing procedure?
• p-value adjustments: single-step
• Proof for Bonferroni (single-step adjustment)
• Proof for Šidák's method (single-step adjustment)
• Single-step adjustments (ctd)
• Proof for (single-step) minP adjustment
• Strong control and subset pivotality
• Permutation-based single-step minP adjustment of p-values
• The computing challenge: iterated permutations
• Avoiding the computational difficulty of single-step minP adjustment
• Proof for the single-step maxT adjustment
• More powerful methods: step-down adjustments
• S. Holm's modification of Bonferroni
• Step-down adjustment of minP
• Proof for step-down minP adjustment
• False discovery rate (Benjamini and Hochberg 1995)
• False discovery rate (Benjamini and Yekutieli 2001)
• Positive false discovery rate (Storey, 2001, independent case)
• Positive false discovery rate (Storey, 2001, dependent case)
• Results
• The gene names
• Some pFDR figures
• Discussion
• Discussion, ctd.
• Selected references
• Acknowledgements
• Software

Outline
• Motivation & examples
• Univariate hypothesis testing
• Multiple hypothesis testing
• Results for the two examples
• Discussion

Motivation
SCIENTIFIC: To determine which genes are differentially expressed between two sources of mRNA (trt, ctl).
STATISTICAL: To assign appropriately adjusted p-values to thousands of genes, and/or make statements about false discovery rates.

Apo AI experiment (Matt Callow, LBNL)
• 8 treatment mice and 8 control mice.
• 16 hybridizations: liver mRNA from each of the 16 mice ($T_i$, $C_i$) is labelled with Cy5, while pooled liver mRNA from the control mice ($C^*$) is labelled with Cy3.
• Probes: ~6,000 cDNAs (genes), including 200 related to lipid metabolism.
Goal. To identify genes with altered expression in the livers of Apo AI knock-out mice (T) compared to inbred C57Bl/6 control mice (C).

Golub et al (1999) experiments
Goal. To identify genes which are differentially expressed in acute lymphoblastic leukemia (ALL) tumours in comparison with acute myeloid leukemia (AML) tumours.
• 38 tumour samples: 27 ALL, 11 AML.
• Data from Affymetrix chips, some pre-processing.
• Originally 6,817 genes; 3,051 after reduction.
The data are therefore a 3,051 × 38 array of expression values.

Data
The gene expression data can be summarized as follows:

\[
X =
\begin{pmatrix}
x_{1,1} & \cdots & x_{1,n_1} & x_{1,n_1+1} & \cdots & x_{1,n} \\
x_{2,1} & \cdots & x_{2,n_1} & x_{2,n_1+1} & \cdots & x_{2,n} \\
\vdots  &        & \vdots    & \vdots      &        & \vdots  \\
x_{m,1} & \cdots & x_{m,n_1} & x_{m,n_1+1} & \cdots & x_{m,n}
\end{pmatrix}
\]

(columns $1, \dots, n_1$: treatment; columns $n_1+1, \dots, n$: control)

Here $x_{i,j}$ is the (relative) expression value of gene $i$ in sample $j$. The first $n_1$ columns are from the treatment (T); the remaining $n_2 = n - n_1$ columns are from the control (C).

Univariate hypothesis testing
Initially, focus on one gene only. We wish to test the null hypothesis H that the gene is not differentially expressed. In order to do so, we use a two-sample t-statistic:

\[
t = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\frac{1}{n_1} s_T^2 + \frac{1}{n_2} s_C^2}}
\]

where $\bar{x}_T$ and $s_T$ are the average and SD of the $n_1$ treatment values of x, and $\bar{x}_C$ and $s_C$ are the average and SD of the $n_2$ control values of x.

p-values
The p-value or observed significance level p is the chance of getting a test statistic as or more extreme than the observed one, under the null hypothesis H of no differential expression.

Computing p-values by permutations
We focus on one gene only. For the bth iteration, $b = 1, \dots, B$:
1. Permute the n data points for the gene (x). The first $n_1$ are referred to as "treatments", the second $n_2$ as "controls".
2. For each gene, calculate the corresponding two-sample t-statistic, $t_b$.
After all the B permutations are done:
3. Put $p = \#\{b : |t_b| \ge |t_{\mathrm{observed}}|\} / B$ (p is lower if we use > instead of ≥).
With all permutations in the Apo AI data, $B = n!/(n_1!\, n_2!) = 12{,}870$; for the leukemia data, $B = 1.2 \times 10^9$.

Many tests: a simulation study
Simulations of this process for 6,000 genes with 8 treatments and 8 controls. All the gene expression values were simulated i.i.d. from a N(0,1) distribution, i.e. NOTHING is differentially expressed.

Unadjusted p-values

gene index    t value    p-value (unadj.)
2271           4.93      2 × 10^-4
5709           4.82      3 × 10^-4
5622          -4.62      4 × 10^-4
4521           4.34      7 × 10^-4
3156          -4.31      7 × 10^-4
5898          -4.29      7 × 10^-4
2164          -3.98      1.4 × 10^-3
5930           3.91      1.6 × 10^-3
2427          -3.90      1.6 × 10^-3
5694          -3.88      1.7 × 10^-3

Clearly we can't just use standard p-value thresholds (.05, .01).

Multiple hypothesis testing: Counting errors
Assume we are testing $H_1, H_2, \dots, H_m$.
$m_0$ = # of true hypotheses
R = # of rejected hypotheses

                   # true null hypo.   # false null hypo.   Total
# non-signif.      U                   T                    m - R
# significant      V                   S                    R
Total              m0                  m - m0               m

V = # Type I errors [false positives]
T = # Type II errors [false negatives]

Type I error rates
• Per-comparison error rate (PCER): the expected value of the number of Type I errors over the number of hypotheses, PCER = E(V)/m.
• Per-family error rate (PFER): the expected number of Type I errors, PFER = E(V).
• Family-wise error rate (FWER): the probability of at least one Type I error, FWER = pr(V ≥ 1).
• False discovery rate (FDR): the expected proportion of Type I errors among the rejected hypotheses, FDR = E(V/R; R > 0) = E(V/R | R > 0) pr(R > 0).
• Positive false discovery rate (pFDR): the rate that discoveries are false, pFDR = E(V/R | R > 0).

Two types of control of Type I error
• strong control: control of the Type I error whatever the true and false null hypotheses. For FWER, strong control means controlling
\[
\max_{M_0 \subseteq H_0^C} \mathrm{pr}(V \ge 1 \mid M_0)
\]
where $M_0$ = the set of true hypotheses (note $|M_0| = m_0$);
• weak control: control of the Type I error only under the complete null hypothesis $H_0^C = \cap_i H_i$. For FWER, this is control of $\mathrm{pr}(V \ge 1 \mid H_0^C)$.

Adjustments to p-values
For strong control of the FWER at some level $\alpha$, there are procedures which will take m unadjusted p-values and modify them separately,
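The three permutation steps listed under "Computing p-values by permutations" can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own software: the function names `t_statistic` and `permutation_p_value` and the eight example values are hypothetical. The sketch enumerates every relabelling of the samples, so it is only practical when n!/(n1! n2!) is small.

```python
import itertools
import math
import statistics


def t_statistic(treatment, control):
    """Two-sample t-statistic with separate variance estimates per group."""
    n1, n2 = len(treatment), len(control)
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treatment) / n1
                   + statistics.variance(control) / n2)
    return diff / se


def permutation_p_value(x, n1):
    """Exact two-sided permutation p-value for one gene.

    x  : the n expression values, treatment values first
    n1 : number of treatment samples
    Every choice of n1 samples to call "treatment" is visited once,
    so B = n! / (n1! n2!) relabellings are used.
    """
    n = len(x)
    t_obs = abs(t_statistic(x[:n1], x[n1:]))
    count = total = 0
    for trt_idx in itertools.combinations(range(n), n1):
        chosen = set(trt_idx)
        trt = [x[i] for i in trt_idx]
        ctl = [x[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance so that ties are counted despite rounding error.
        if abs(t_statistic(trt, ctl)) >= t_obs - 1e-12:
            count += 1
    return count / total


# Hypothetical values for one gene: 4 treatment samples, then 4 controls.
example = [5.1, 4.8, 5.5, 5.0, 3.9, 4.2, 4.0, 4.4]
p = permutation_p_value(example, n1=4)
print(p)

# Number of relabellings for an 8-versus-8 design such as Apo AI.
print(math.comb(16, 8))  # 12870
```

With only 4 + 4 samples there are 70 relabellings, so the smallest attainable two-sided p-value is 2/70, which is what the completely separated example above produces. For the leukemia data the full enumeration is far too large, and a random subset of relabellings would be the usual substitute.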

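The simulation described under "Many tests: a simulation study" can be reproduced approximately with the standard library alone. This is a sketch under stated assumptions: the names `welch_t`, `simulate_null` and `T_CRIT` are hypothetical, the seed is arbitrary, and the cutoff 2.145 (the two-sided 5% point of Student's t with 14 degrees of freedom) stands in for a p-value threshold of .05. The gene indices and t values will not match the table on the slide, since the random draws differ.

```python
import math
import random


def welch_t(trt, ctl):
    """Two-sample t-statistic with separate variance estimates per group."""
    n1, n2 = len(trt), len(ctl)
    m1, m2 = sum(trt) / n1, sum(ctl) / n2
    v1 = sum((v - m1) ** 2 for v in trt) / (n1 - 1)
    v2 = sum((v - m2) ** 2 for v in ctl) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)


def simulate_null(m=6000, n1=8, n2=8, seed=0):
    """t-statistics for m genes when nothing is differentially expressed."""
    rng = random.Random(seed)
    results = []
    for gene in range(1, m + 1):
        # All n1 + n2 values are i.i.d. N(0, 1): the complete null holds.
        x = [rng.gauss(0.0, 1.0) for _ in range(n1 + n2)]
        results.append((gene, welch_t(x[:n1], x[n1:])))
    return results


T_CRIT = 2.145  # assumed cutoff: two-sided 5% point of Student's t, 14 d.f.

results = simulate_null()
n_signif = sum(abs(t) > T_CRIT for _, t in results)
top10 = sorted(results, key=lambda gt: -abs(gt[1]))[:10]

print(f"{n_signif} of {len(results)} null genes exceed |t| > {T_CRIT}")
for gene, t in top10:
    print(f"gene {gene:5d}  t = {t:6.2f}")
```

With 6,000 true null hypotheses tested at the .05 level, about 6,000 × .05 = 300 genes are expected to look significant by chance, which is the point of the slide: unadjusted thresholds cannot be used as they stand.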
