Berkeley STATISTICS 246 - Identifying expression differences

This preview shows page 1-2-15-16-31-32 out of 32 pages.


Outline
• An empirical Bayes story
• B = LOR compared with t and M.
• Extensions include dealing with
• Summary (for the second simplest problem)
• Use of linear models with cDNA microarray data
• Advantages of linear models
• Log-ratios or single channel intensities?
• Linear models for differential expression
• Matrix multiplication
• Slightly larger example
• Linear model estimates
• Parallel inference for genes
• Hierarchical model
• Posterior statistics
• Posterior odds
• Summary
• Appendix: slightly more general theoretical development
• From single genes to sets of genes
• Cluster analysis
• Discovering sub-groups
• Limitations
• A synthesis
• One approach: clustering genes
• Data - Ro1
• Histogram
• Top 15 averages of gene clusters
• Remarks
• Acknowledgments

Lecture 21, Statistics 246, April 8, 2004
Identifying expression differences in cDNA microarray experiments, cont.

An empirical Bayes story
Suppose that our M values are independently and normally distributed, and that a proportion p of genes are differentially expressed, i.e. have M’s with non-zero means. Further, suppose that the variances and means of these genes are chosen jointly from inverse chi-square and normal conjugate priors, respectively. Genes that are not differentially expressed have zero means, and variances chosen from the same inverse chi-square distribution. The scale and d.f. parameters of the inverse chi-square are estimated from the data, as is a parameter c connecting the prior for the mean with that for the variances.
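The generative story in the paragraph above can be made concrete with a small simulation. This is only a sketch under stated assumptions: the slides do not give the exact parameterization, so the scaled inverse chi-square form, the N(0, c·σ²) prior for non-zero means, and the names `d0`, `s0_sq` and `simulate_gene` are all hypothetical choices made for illustration. Only p and c are taken from the text.

```python
import random


def simulate_gene(n, p, d0, s0_sq, c, rng):
    """Draw n replicate M values for one gene from the assumed hierarchy."""
    # Assumption: sigma^2 = d0 * s0_sq / chi-square(d0), a scaled inverse chi-square.
    chi2 = rng.gammavariate(d0 / 2.0, 2.0)  # Gamma(shape=d0/2, scale=2) is chi-square(d0)
    sigma_sq = d0 * s0_sq / chi2
    # With probability p the gene is differentially expressed (non-zero mean).
    is_de = rng.random() < p
    # Assumption: the prior variance of a non-zero mean is c * sigma^2.
    mu = rng.gauss(0.0, (c * sigma_sq) ** 0.5) if is_de else 0.0
    m_values = [rng.gauss(mu, sigma_sq ** 0.5) for _ in range(n)]
    return is_de, m_values


# Illustrative settings only; none of these numbers come from the lecture.
rng = random.Random(246)
genes = [simulate_gene(n=4, p=0.1, d0=4, s0_sq=0.05, c=1.0, rng=rng)
         for _ in range(1000)]
n_de = sum(is_de for is_de, _ in genes)
print("simulated genes:", len(genes), "differentially expressed:", n_de)
```

Data simulated this way can be used to check how well any ranking statistic separates the differentially expressed genes from the rest, since the true status of each gene is known.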
We then calculate the posterior probability that a given gene is differentially expressed, and find that it is an increasing function of B, given below, where a and c are estimated parameters, and p enters through the constant.

Empirical Bayes log posterior odds ratio (LOR)
$$
B = \text{const} + \log\left( \frac{\frac{2a}{n} + s^2 + M_{\bullet}^2}{\frac{2a}{n} + s^2 + \frac{M_{\bullet}^2}{1 + nc}} \right)
$$
Notice that for large n this is approximately a function of t = M./s.

B = LOR compared with t and M.

Extensions include dealing with
• Replicates within and between slides
• Several effects: use a linear model
• ANOVA: are the effects equal?
• Time series: selecting genes for trends

Summary (for the second simplest problem)
• Microarray experiments typically have thousands of genes, but only a few (1-10) replicates for each gene.
• Averages can be driven by outliers.
• t-statistics can be driven by tiny variances.
• B = LOR will, we hope
  – use information from all the genes
  – combine the best of M. and t
  – avoid the problems of M. and t
Ranking on B could be helpful.

Use of linear models with cDNA microarray data
In many situations we want to combine data from different experiments in a slightly more elaborate manner than simply averaging. One way of doing so is via (fixed effects) linear models, where we estimate certain quantities of interest, which we call effects, for each gene on our slide. Typically these estimates may be regarded as approximately normally distributed with common SD, and mean zero in the absence of any relevant differential expression. In such cases, the two preceding strategies both apply: qq-plots, and various combinations of the estimated effect (cf. M.) and the standardized estimate (cf. t).

Advantages of linear models
• Analyse all arrays together, combining information in an optimal way
• Combined estimation of precision
• Extensible to arbitrarily complicated experiments
• Design matrix: specifies RNA targets used on arrays
• Contrast matrix: specifies which comparisons are of interest

Log-ratios or single channel intensities?
• Traditional analyses, as here, treat log-ratios M = log(R/G) as the primary data, i.e., we take gene expression measurements as relative.
• An alternative approach treats individual channel intensities R and G as the primary data, i.e., views gene expression measures as “absolute” (Wolfinger, Churchill, Kerr).
• A single channel approach makes new analyses possible, but it
  – makes stronger assumptions
  – requires more complex models (mixed models in place of ordinary linear models) to accommodate correlation between R and G on the same spot
  – requires absolute normalization methods

Linear models for differential expression
[Design diagrams not recovered; surviving node labels: A, B, C, Ref]
Allows all comparisons to be estimated simultaneously

Matrix multiplication
[Design diagrams with arrays 1-3 not recovered; surviving node labels: A, B, C, Ref]

Slightly larger example:
Six RNA sources are connected by arrays 1-7. Their expected values, as offsets from WT.P1, are
• WT.P1: baseline
• MT.P1: + b
• WT.P11: + a1
• MT.P11: + a1 + b + a1.b
• WT.P21: + a1 + a2
• MT.P21: + (a1 + a2) + b + (a1 + a2)b
$$
E\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \end{pmatrix}
=
\begin{bmatrix}
0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 1 & 1
\end{bmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ b \\ a_1.b \\ a_2.b \end{pmatrix}
$$

Linear model estimates
Obtain a linear model for each gene g.
Estimate the model by robust regression, least squares or generalized least squares to get
• coefficients
• standard deviations
• standard errors

Parallel inference for genes
• 10,000-40,000 linear models
• Curse of dimensionality: need to adjust for multiple testing, e.g., control the family-wise error rate (FWE) or the false discovery rate (FDR)
• Boon of parallelism: can borrow information from one gene to another

Hierarchical model
Normal model: [formula not recovered]
Prior: [formula not recovered]
The normality and independence assumptions are wrong but convenient; the resulting methods are useful.

Posterior statistics
Moderated t-statistics: [formula not recovered]
Posterior variance estimators: [formula not recovered]
Eliminates large t-statistics arising merely from very small s.
[Distribution under the null not recovered]

Posterior odds
The posterior probability of differential expression for any gene is [formula not recovered].
This is a generalization of the B mentioned earlier. It is a monotonic function of [symbol not recovered] for constant d.
Exercise. Prove all the distributional statements made above.

Summary
• Analyse data all at once
• Use standard deviations, not just fold changes
• Use ensemble information to shrink variances
• Assess differential expression for all comparisons together

Appendix: slightly more general theoretical development
Assume that for each gene g,
$$
E(y_g) = X\alpha_g, \qquad \operatorname{var}(y_g) = W_g \sigma_g^2 .
$$
Also, assume that certain contrasts $\beta_g = C^T \alpha_g$ are of interest. The estimators of these contrasts and their estimated covariance matrices are
$$
\hat{\beta}_g = C^T \hat{\alpha}_g, \qquad \operatorname{var}(\hat{\beta}_g) = C^T V_g C \, \sigma_g^2 .
$$
If we let $v_{gj}$ be the jth diagonal element of $C^T V_g C$, then our assumptions are
$$
\hat{\beta}_{gj} \mid \beta_{gj}, \sigma_g^2 \sim N(\beta_{gj}, v_{gj}\sigma_g^2), \qquad
s_g^2 \mid \sigma_g^2 \sim \frac{\sigma_g^2}{d_g}\chi^2_{d_g},
$$
where $d_g$ is the residual degrees of freedom in the linear model for gene g.

Hierarchical model
As before, we define a simple hierarchical model to combine information across the genes. Prior information on the variances is assumed to come via a prior estimate [symbol not recovered] with d.f. [symbol not recovered]:
For any given j,
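
The shrinkage ideas in the slides above (the B statistic and the moderated t) can be illustrated with a short sketch. The B term follows the formula on the LOR slide with the additive constant dropped. The moderated t is an assumption: the preview lost the slide's formulas, so the code uses the common form in which the gene's variance is averaged with a prior value, weighted by degrees of freedom. The names `d0`, `s0_sq`, the n - 1 variance divisor, the use of the standard error s/√n in the ordinary t, and the two example genes are all hypothetical.

```python
import math
import statistics


def ordinary_t(m_values):
    """One-sample t-statistic: mean divided by its standard error."""
    n = len(m_values)
    return statistics.mean(m_values) / (statistics.stdev(m_values) / math.sqrt(n))


def log_odds_term(m_values, a, c):
    """Data-dependent part of B; the additive constant is omitted."""
    n = len(m_values)
    m_bar = statistics.mean(m_values)
    s_sq = statistics.variance(m_values)  # assumption: n - 1 divisor
    numerator = 2.0 * a / n + s_sq + m_bar ** 2
    denominator = 2.0 * a / n + s_sq + m_bar ** 2 / (1.0 + n * c)
    return math.log(numerator / denominator)


def moderated_t(m_values, d0, s0_sq):
    """t-statistic with the variance shrunk toward a prior value s0_sq."""
    n = len(m_values)
    d = n - 1  # residual d.f. when the model is a simple average
    s_sq = statistics.variance(m_values)
    # Assumed shrinkage form: d.f.-weighted average of prior and sample variance.
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    v = 1.0 / n  # unscaled variance of the mean
    return statistics.mean(m_values) / math.sqrt(s_tilde_sq * v)


# Hypothetical genes: a small effect with a tiny variance, and a large effect.
gene_small = [0.30, 0.31, 0.29, 0.30]
gene_large = [2.1, 1.6, 2.5, 1.9]

for name, gene in [("small", gene_small), ("large", gene_large)]:
    print(name,
          round(ordinary_t(gene), 2),
          round(log_odds_term(gene, a=0.2, c=1.0), 2),
          round(moderated_t(gene, d0=4, s0_sq=0.05), 2))
```

With these made-up inputs the ordinary t ranks the small-effect gene first because its variance is tiny, whereas both the B term and the moderated t rank the large-effect gene first. Setting `d0=0` removes the shrinkage and recovers the ordinary t.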

