Berkeley STATISTICS 246 - Multiple testing in large-scale gene expression experiments

This preview shows page 1-2-3-24-25-26-27-49-50-51 out of 51 pages.


Multiple testing in large-scale gene expression experiments
Statistics 246, Spring 2002
Week 8, Lecture 2

Outline
• Motivation & examples
• Univariate hypothesis testing
• Multiple hypothesis testing
• Results for the two examples
• Discussion

Motivation
SCIENTIFIC: To determine which genes are differentially expressed between two sources of mRNA (trt, ctl).
STATISTICAL: To assign appropriately adjusted p-values to thousands of genes, and/or make statements about false discovery rates.

Apo AI experiment (Matt Callow, LBNL)
Goal. To identify genes with altered expression in the livers of Apo AI knock-out mice (T) compared to inbred C57Bl/6 control mice (C).
• 8 treatment mice and 8 control mice
• 16 hybridizations: liver mRNA from each of the 16 mice (Ti, Ci) is labelled with Cy5, while pooled liver mRNA from the control mice (C*) is labelled with Cy3.
• Probes: ~6,000 cDNAs (genes), including 200 related to lipid metabolism.

Golub et al (1999) experiments
Goal. To identify genes which are differentially expressed in acute lymphoblastic leukemia (ALL) tumours in comparison with acute myeloid leukemia (AML) tumours.
• 38 tumour samples: 27 ALL, 11 AML.
• Data from Affymetrix chips, some pre-processing.
• Originally 6,817 genes; 3,051 after reduction.
The data are therefore a 3,051 × 38 array of expression values.

Data
The gene expression data can be summarized as a matrix X = (x_{i,j}), with the treatment columns followed by the control columns.
Here x_{i,j} is the (relative) expression value of gene i in sample j. The first n_1 columns are from the treatment (T); the remaining n_2 = n − n_1 columns are from the control (C).

Univariate hypothesis testing
Initially, focus on one gene only. We wish to test the null hypothesis H that the gene is not differentially expressed.
In order to do so, we use a two-sample t-statistic:

\[
t = \frac{\bar{x}_{T} - \bar{x}_{C}}{\sqrt{\frac{1}{n_1} s_{T}^{2} + \frac{1}{n_2} s_{C}^{2}}}
\]

where \bar{x}_{T} and s_{T} are the average and SD of the n_1 trt x values, and \bar{x}_{C} and s_{C} are the average and SD of the n_2 ctl x values.

p-values
The p-value or observed significance level p is the chance of getting a test statistic as or more extreme than the observed one, under the null hypothesis H of no differential expression.

Computing p-values by permutations
We focus on one gene only. For the bth iteration, b = 1, ..., B:
1. Permute the n data points for the gene (x). The first n_1 are referred to as "treatments", the second n_2 as "controls".
2. Calculate the corresponding two-sample t-statistic, t_b.
After all B permutations are done:
3. Put p = #{b: |t_b| ≥ |t_observed|}/B (p is lower if we use >).
With all permutations in the Apo AI data, B = n!/(n_1! n_2!) = 12,870; for the leukemia data, B = 1.2 × 10^9.

Many tests: a simulation study
Simulations of this process for 6,000 genes with 8 treatments and 8 controls. All the gene expression values were simulated i.i.d. from a N(0,1) distribution, i.e. NOTHING is differentially expressed.

Unadjusted p-values
gene index    t value    p-value (unadj.)
2271           4.93      2 × 10^-4
5709           4.82      3 × 10^-4
5622          -4.62      4 × 10^-4
4521           4.34      7 × 10^-4
3156          -4.31      7 × 10^-4
5898          -4.29      7 × 10^-4
2164          -3.98      1.4 × 10^-3
5930           3.91      1.6 × 10^-3
2427          -3.90      1.6 × 10^-3
5694          -3.88      1.7 × 10^-3

Clearly we can't just use standard p-value thresholds (.05, .01): with 6,000 true null hypotheses, a .05 threshold alone would be expected to flag about 300 genes.

Multiple hypothesis testing: Counting errors
Assume we are testing H_1, H_2, ..., H_m.
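Returning briefly to the single-gene permutation procedure described under "Computing p-values by permutations", the three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `two_sample_t` and `permutation_pvalue` are hypothetical, the t-statistic uses the unequal-variance form with sample SDs, and the loop enumerates every split of the n values into n_1 "treatments" and n_2 "controls", which is practical for B = 12,870 but not for B = 1.2 × 10^9, where a random subset of permutations would normally be drawn instead.

```python
from itertools import combinations
from math import sqrt


def two_sample_t(trt, ctl):
    """Two-sample t-statistic with separate variance estimates per group."""
    n1, n2 = len(trt), len(ctl)
    m1, m2 = sum(trt) / n1, sum(ctl) / n2
    v1 = sum((v - m1) ** 2 for v in trt) / (n1 - 1)  # sample variance, trt
    v2 = sum((v - m2) ** 2 for v in ctl) / (n2 - 1)  # sample variance, ctl
    return (m1 - m2) / sqrt(v1 / n1 + v2 / n2)


def permutation_pvalue(x, n1, tol=1e-12):
    """Permutation p-value for one gene.

    x  : expression values, the first n1 being treatment, the rest control
    n1 : number of treatment samples
    Returns (p, B), where B is the number of splits enumerated.
    """
    n = len(x)
    t_obs = two_sample_t(x[:n1], x[n1:])
    hits = 0
    B = 0
    # Each choice of n1 indices is one relabelling of the samples.
    for trt_idx in combinations(range(n), n1):
        chosen = set(trt_idx)
        trt = [x[i] for i in range(n) if i in chosen]
        ctl = [x[i] for i in range(n) if i not in chosen]
        t_b = two_sample_t(trt, ctl)
        # ">=" as in step 3; tol guards against floating-point ties.
        if abs(t_b) >= abs(t_obs) - tol:
            hits += 1
        B += 1
    return hits / B, B


# Toy data (made up for illustration): two perfectly separated groups of 4.
p, B = permutation_pvalue([10, 11, 12, 13, 0, 1, 2, 3], 4)
print(p, B)
```

For the toy data, only the observed split and its mirror image reach the observed |t|, so the smallest attainable two-sided p-value with 4 versus 4 samples is 2/70. This discreteness is one reason the number of available permutations matters when p-values are later adjusted.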
m_0 = # of true hypotheses; R = # of rejected hypotheses

                 # true null hypo.    # false null hypo.    total
# non-signif.    U                    T                     m − R
# significant    V                    S                     R
total            m_0                  m − m_0               m

V = # Type I errors [false positives]; T = # Type II errors [false negatives]

Type I error rates
• Per-comparison error rate (PCER): the expected value of the number of Type I errors over the number of hypotheses, PCER = E(V)/m.
• Per-family error rate (PFER): the expected number of Type I errors, PFER = E(V).
• Family-wise error rate (FWER): the probability of at least one Type I error, FWER = pr(V ≥ 1).
• False discovery rate (FDR): the expected proportion of Type I errors among the rejected hypotheses, FDR = E(V/R; R>0) = E(V/R | R>0) pr(R>0).
• Positive false discovery rate (pFDR): the rate at which discoveries are false, pFDR = E(V/R | R>0).

Two types of control of Type I error
• strong control: control of the Type I error whatever the true and false null hypotheses. For FWER, strong control means controlling
\[
\max_{M_0 \subset H_0^C} \mathrm{pr}(V \ge 1 \mid M_0),
\]
where M_0 = the set of true hypotheses (note |M_0| = m_0);
• weak control: control of the Type I error only under the complete null hypothesis H_0^C = ∩_i H_i. For FWER, this is control of pr(V ≥ 1 | H_0^C).

Adjustments to p-values
For strong control of the FWER at some level α, there are procedures which will take m unadjusted p-values and modify them separately, so-called single-step procedures, the Bonferroni adjustment or correction being the simplest and most well known. Another is due to Šidák.
Other, more powerful procedures adjust sequentially, from the smallest to the largest, or vice versa. These are the step-up and step-down methods, and we'll meet a number of these, usually variations on single-step procedures.
In all cases, we'll denote adjusted p-values by π, usually with subscripts, and let the context define what type of adjustment has been made.
Unadjusted p-values are denoted by p.

What should one look for in a multiple testing procedure?
As we will see, there is a bewildering variety of multiple testing procedures. How can we choose which to use? There is no simple answer here, but each can be judged according to a number of criteria:
Interpretation: does the procedure answer a relevant question for you?
Type of control: strong or weak?
Validity: are the assumptions under which the procedure applies clear and definitely or plausibly true, or are they unclear and most probably not true?
Computability: are the procedure's calculations straightforward to calculate accurately, or is there possibly numerical or simulation uncertainty, or discreteness?

p-value adjustments: single-step
Define adjusted p-values π such that the FWER is controlled at level α where H_i is rejected when π_i ≤ α.
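The preview ends before the single-step formulas are listed, so what follows is only a sketch using the standard textbook forms of the two adjustments named earlier, Bonferroni (π_i = min(m p_i, 1)) and Šidák (π_i = 1 − (1 − p_i)^m). The notation on the missing slides may differ, and the function names `bonferroni` and `sidak` are hypothetical.

```python
def bonferroni(pvalues):
    """Single-step Bonferroni adjustment: pi_i = min(m * p_i, 1)."""
    m = len(pvalues)
    return [min(m * p, 1.0) for p in pvalues]


def sidak(pvalues):
    """Single-step Sidak adjustment: pi_i = 1 - (1 - p_i)^m."""
    m = len(pvalues)
    return [1.0 - (1.0 - p) ** m for p in pvalues]


# Made-up unadjusted p-values, for illustration only.
p = [0.01, 0.04, 0.5]
print(bonferroni(p))
print(sidak(p))
```

Each hypothesis H_i is then rejected when its adjusted value π_i ≤ α. Applied to the simulation study above, with m = 6,000 genes the smallest unadjusted p-value of 2 × 10^-4 has Bonferroni-adjusted value min(6,000 × 2 × 10^-4, 1) = 1, so nothing is declared significant, as should happen when no gene is differentially expressed. The Šidák value is never larger than the Bonferroni value, but its guarantee is usually stated under independence-type assumptions on the test statistics.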

