Summarizing many probe intensities
Statistics 246, Spring 2006
Week 9, Lecture 2

Summarizing 11-20 probe intensity pairs to give a measure of expression for a probe set

There are many low-level summaries in use. Here are a few:
- MAS5.0, the Affymetrix method which replaced AvDiff (described previously);
- dChip (described previously);
- RMA (Robust Multi-chip Analysis), to be described now;
- GCRMA, the successor to RMA; and
- PLIER, the successor to MAS5.0.

The MAS5.0 expression summary

$\log(\text{Signal Intensity}) = \mathrm{TukeyBiweight}\{\log(PM_j - MM_j^*)\}$,

where $MM_j^*$ is a version of $MM_j$ that is never bigger than $PM_j$ (the details are not important), and TukeyBiweight is a robust, resistant average of the 11-20 quantities.

Exercise: Tukey's biweight location estimator is an M-estimator. What is its form?

There are at least two problems with this summary. First, it is still single-chip, so it cannot benefit from information across chips. Second, the robustness is not necessarily in the correct place: a probe may function perfectly well but have much higher intensities than the other probes, and this procedure would downweight (perhaps ignore) it for no good reason.

Some influence functions (courtesy of Philip Stark)

Some weight functions (courtesy of Philip Stark)

What RMA does: four steps

It uses only the PMs and ignores the MMs. Also, it:
1. Adjusts for background; call this BG.
2. Carries out quantile normalization of $PM - BG$, with chips in suitable sets; call the result $n(PM - BG)$.
3. Takes $\log_2$ of the normalized, background-adjusted PM.
4. Carries out a Robust Multi-chip linear model Analysis (RMA) of the quantities $\log_2 n(PM - BG)$.

Why ignore the MM values?

The reason was that it was hard to do better using them. They definitely have information about both signal and noise, but using them without adding more noise (see below) seemed to be a challenge. GCRMA and other improved background correction methods make use of MMs, though this hardly justifies having one MM for every PM. A pool of control probes for use in estimating non-specific hybridization would do just as well, and this is
now being used on some Affymetrix chips.

Why take log2?

Look at SDs from replicate chips.

Why Multi-chip Analysis?

To put each chip's values in the context of a set of similar values. We see that there are substantial probe effects in the responses (next slides), and that these are repeatable. Such an obvious feature (the parallelism of probe response across chips in these figures) should be exploited in any analysis. This leads to an additive model on the log scale, or a multiplicative model on the intensity scale, as in dChip.

Probe intensity vs concentration, ex 1: a glance at some raw data
[Figure: 20-probe spike-in set across 14 arrays; axis labels: PM intensity, Concentration in pM, array]

Probe intensity vs concentration, ex 2
[Figure: 16-probe spike-in set, 6 non-responding]

Why write $\log_2 n(PM - BG)$ = chip effect + probe effect?

Because probe effects are additive on the log scale. The spike-in data sets in the previous two slides showed it. Indeed, every set of experiments should exhibit this parallel behaviour across probes. Can you see why?

Why a Robust rather than Least Squares fit?

In these large bodies of data we see many (perhaps up to 10%) outliers: image artifacts, bad probes, bad chips. Robust summaries really can improve over the standard ones, by down-weighting outliers and leaving their effects visible in residuals. Not only are the estimates of the quantities that matter better; we can also use aspects of the robust analysis to carry out quality assessment.

How Robust Multi-chip Analysis works

The analysis involves robustly fitting the following linear model for one probe set:

$\log_2 n(PM_{ij} - BG) = a_i + b_j + \varepsilon_{ij}$,

where $i$ labels chips, $j$ labels probes, and $\sum_j b_j = 0$ is imposed for identifiability. Here the $\varepsilon_{ij}$ are the errors, assumed iid with mean zero and constant variance $\sigma^2$. Initially we used median polish, but the current implementation uses Huber's $\psi$. We'll describe both.

RMA in summary

- Background correct PM.
- Carry out quantile normalization.
- Take $\log_2$.
- Under the additive model $\log_2 n(PM_{ij} - BG) = a_i + b_j + \varepsilon_{ij}$, estimate chip effects $a_i$ and probe effects $b_j$, with $\sum_j b_j = 0$, using a
robust, resistant method.

M-estimators

One can estimate the parameters of the model as solutions to

$$\min_{a_i, b_j} \sum_{i,j} \rho\!\left(\frac{Y_{ij} - a_i - b_j}{\sigma}\right) = \min_{a_i, b_j} \sum_{i,j} \rho(u_{ij}),$$

where $Y_{ij} = \log_2 n(PM_{ij} - BG)$ and $\rho$ is a symmetric, positive-definite function increasing less rapidly than $x^2$. Solutions to this minimization problem can be obtained by an Iteratively Reweighted Least Squares (IRWLS) procedure with weights

$$w_{ij} = w(u_{ij}) = \frac{\psi(u_{ij})}{u_{ij}}, \qquad \psi = \rho'.$$

Robust fit by IRWLS

At each iteration:
- $r_{ij} = Y_{ij} - \text{current est } a_i - \text{current est } b_j$;
- $S = k \cdot \mathrm{MAD}(r_{ij})$, a robust estimate of the scale parameter $\sigma$;
- $u_{ij} = r_{ij} / S$, the standardized residuals;
- $w_{ij} = \psi(u_{ij}) / u_{ij}$, weights to reduce the effect of discrepant points on the next fit.

Next-step estimates are:
- next est $a_i$ = weighted row $i$ mean;
- next est $b_j$ = weighted col $j$ mean $-$ overall weighted mean.

The way robustness works here

We look at the parameter estimates from a robust two-way analysis (using median polish) and from the usual least squares, on data from 37 control probe sets across 6 chips. Overall we have 20x37 probe effects, 6x37 chip effects and 20x6x37 residuals. This was repeated for 37 randomly chosen probe sets, to check that the control probe sets were not atypical. They weren't. After observing certain patterns, an explanation is offered of the way robustness works.

The two methods

Each fits the same additive model to probe-level data.

Median polish (mp) fits iteratively, successively removing row and column medians and accumulating the terms, until the process stabilizes. The residuals are what is left at the end.

Least squares (lm) uses the familiar closed-form estimates of the parameters $a$ and $b$ (what are they?), and again the residuals are what is left after subtracting them from the observations.

Two sets of residuals: control probes
Note the slightly different shapes of their distributions.

Similarly with random probe sets

Normal qq plots of mp and lm residuals
Which has slightly fatter tails?
mp residuals have fatter tails than lm, but smaller IQRs.

Chip effects: 37 control probe sets, 6 chips
mp chip effects somewhat more
concentrated than lm's.

Chip effects: 37 random probe sets, 6 chips
The same holds with random probe sets.

Chip effects: control probe sets (left), random (right); median polish (mp) first, linear model (lm) second
lm has more outlier chip effects than mp, and larger IQR.

Probe effects: 37x20 control probes
Median polish probe effects also more concentrated.

Probe effects: 37x20 random probes
And again with random probe sets.

Probe effects: control probe sets (left), random (right); median polish (mp) first, linear model (lm) second
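
The comparison of median polish (mp) and least squares (lm) described above can be tried on synthetic data. What follows is a minimal sketch, not the code behind the lecture's figures: the function names `median_polish` and `least_squares_fit`, the simulated 6 x 20 matrix, and the single injected outlier are all assumptions made for illustration. Note that median polish centres the probe effects at median zero rather than sum zero; the simulated probe effects are symmetric, so the two constraints agree here.

```python
import numpy as np

def median_polish(y, max_iter=10, tol=1e-6):
    """Fit y[i, j] = overall + row[i] + col[j] + resid[i, j] by median polish."""
    resid = np.asarray(y, dtype=float).copy()
    overall = 0.0
    row = np.zeros(resid.shape[0])   # chip effects, centred at median zero
    col = np.zeros(resid.shape[1])   # probe effects, centred at median zero
    for _ in range(max_iter):
        # Remove row medians and accumulate them in the row effects.
        rmed = np.median(resid, axis=1)
        resid -= rmed[:, None]
        row += rmed
        # Keep the column effects centred; move their median to the overall term.
        shift = np.median(col)
        col -= shift
        overall += shift
        # Remove column medians and accumulate them in the column effects.
        cmed = np.median(resid, axis=0)
        resid -= cmed[None, :]
        col += cmed
        shift = np.median(row)
        row -= shift
        overall += shift
        # Stop once a full sweep no longer changes anything.
        if max(np.abs(rmed).max(), np.abs(cmed).max()) < tol:
            break
    return overall, row, col, resid

def least_squares_fit(y):
    """Closed-form least-squares fit of the same additive model."""
    y = np.asarray(y, dtype=float)
    overall = y.mean()
    row = y.mean(axis=1) - overall
    col = y.mean(axis=0) - overall
    resid = y - overall - row[:, None] - col[None, :]
    return overall, row, col, resid

# Hypothetical probe set: 6 chips x 20 probes, exactly additive on the log2 scale.
a_true = np.array([8.0, 8.5, 9.0, 9.5, 10.0, 10.5])   # chip effects
b_true = np.linspace(-2.0, 2.0, 20)                    # probe effects (mean and median 0)
y = a_true[:, None] + b_true[None, :]
y[0, 3] += 5.0                                         # one gross outlier

mp_overall, mp_row, mp_col, mp_resid = median_polish(y)
ls_overall, ls_row, ls_col, ls_resid = least_squares_fit(y)

print("true chip effects:", a_true)
print("mp chip estimates:", mp_overall + mp_row)
print("lm chip estimates:", ls_overall + ls_row)
print("residual at the outlier (mp, lm):", mp_resid[0, 3], ls_resid[0, 3])
```

In this noise-free example the outlier shifts the least-squares estimate for chip 0 by 5/20 = 0.25, because the row mean absorbs part of it. Median polish recovers the chip effects and leaves the full 5 units in the residual, which is the sense in which a robust fit leaves outliers visible in residuals.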
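
The IRWLS iteration from the "Robust fit by IRWLS" slide can also be sketched in a few lines. This is an illustration under stated assumptions, not the RMA implementation: the function names `huber_weights` and `fit_additive_irls` are hypothetical, the Huber tuning constant `c = 1.345` and the scale `S = MAD / 0.6745` (taking MAD as the median absolute residual) are conventional choices rather than values given in the lecture, and each weighted least-squares step is solved with `np.linalg.lstsq` instead of the weighted row and column means listed on the slide.

```python
import numpy as np

def huber_weights(u, c=1.345):
    """w(u) = psi(u) / u for Huber's psi: 1 when |u| <= c, c / |u| otherwise."""
    au = np.abs(u)
    return np.where(au <= c, 1.0, c / np.maximum(au, 1e-300))

def fit_additive_irls(y, c=1.345, max_iter=50, tol=1e-8):
    """Robustly fit y[i, j] = a[i] + b[j] + error with sum(b) = 0 by IRWLS."""
    y = np.asarray(y, dtype=float)
    n_chips, n_probes = y.shape
    rows, cols = np.indices(y.shape)
    rows, cols = rows.ravel(), cols.ravel()
    # Indicator design matrix: one column per chip, one per probe.
    X = np.zeros((y.size, n_chips + n_probes))
    X[np.arange(y.size), rows] = 1.0
    X[np.arange(y.size), n_chips + cols] = 1.0
    yv = y.ravel()
    w = np.ones(y.size)              # first pass is ordinary least squares
    a = np.zeros(n_chips)
    b = np.zeros(n_probes)
    for _ in range(max_iter):
        sw = np.sqrt(w)
        # Weighted least squares; lstsq copes with the rank-deficient design.
        beta, *_ = np.linalg.lstsq(X * sw[:, None], yv * sw, rcond=None)
        a_new, b_new = beta[:n_chips], beta[n_chips:]
        # Impose sum(b) = 0 by moving the mean probe effect into the chip effects.
        shift = b_new.mean()
        a_new, b_new = a_new + shift, b_new - shift
        r = yv - a_new[rows] - b_new[cols]
        s = np.median(np.abs(r)) / 0.6745      # robust scale estimate (assumed form)
        converged = (np.max(np.abs(a_new - a)) < tol
                     and np.max(np.abs(b_new - b)) < tol)
        a, b = a_new, b_new
        if converged or s == 0.0:
            break
        w = huber_weights(r / s, c)            # downweight discrepant points
    resid = (yv - a[rows] - b[cols]).reshape(y.shape)
    return a, b, resid, w.reshape(y.shape)

# Hypothetical probe set: 6 chips x 20 probes with small noise and one outlier.
rng = np.random.default_rng(0)
a_true = np.array([8.0, 8.5, 9.0, 9.5, 10.0, 10.5])
b_true = np.linspace(-2.0, 2.0, 20)
y = a_true[:, None] + b_true[None, :] + rng.normal(0.0, 0.05, size=(6, 20))
y[0, 3] += 5.0

a_hat, b_hat, resid, weights = fit_additive_irls(y)
print("robust chip estimates:", a_hat)
print("least-squares chip estimates (row means):", y.mean(axis=1))
print("weight given to the outlier:", weights[0, 3])
```

The returned weights are a natural by-product for quality assessment: cells with weights well below 1 are the ones the fit treated as discrepant.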