Background models and GCRMA
Statistics 246, Week 10, Spring 2006, Lecture 1

Context: background with Affymetrix probe set summaries

We resume discussing the question of reducing a set of probe pair intensities $\{(PM_i, MM_i) : i = 1, \dots, 11 \text{ (or 16 or 20)}\}$ into a single probe set summary and, as is usual, we are dealing with a set of Affymetrix chips together. Our concern here is background, interpreting the term broadly. We take this to be an issue because of the flattening of the response at the low and the high ends of the plot in the next slide, together with the fact, noted last week, that probe intensities for transcripts known to be absent are still far from zero.

The earliest background adjustment was the subtraction of $MM_i$ from $PM_i$, used in the early AvDiff. This caused the obvious problems, and it was natural to want to do better. Our initial RMA-style background was a crude effort to do better, but it ignored the MMs. It was better in many, but not all, ways. We need to do better still.

[Figure: from the Affymetrix U95A spike-in data set]

Preliminaries

There have been quite a few attempts to use probe sequence information to predict non-specific binding of target to probes, in the hope of improving probe set summaries. Most have been deterministic, based on theoretical chemical and biophysical considerations, and fitted to data. None of these has yet performed well in the field. We will take an empirical statistical approach, modifying our background model to take into account probe sequence. We begin by showing that probe sequence matters, by looking at the impact of a given base at a given position in the probe sequence.

Summary plots for 3 data sets

First we display boxplots of PM and MM intensities for chips in this experiment, stratified by base and position. Then these data are reorganized according to the number of bases of a given kind in the probe sequence. In these plots the chip data are aggregated. Then we examine the effects of replacing a particular base at a particular position by a different base at that
position. We plot the difference between the mean intensity for all probes with that base at that position and the mean for all probes having a different base at that position. These plots are displayed for different chips in different colors. After these we give the corresponding intercept terms. Finally, we plot the numbers of probes with a given base at a given position.

[Figures: Affymetrix U95A spike-in; panels for PMs and MMs]

[Figures: Affymetrix U133A spike-in; panels for PMs and MMs]

[Figures: Drosophila time course data; panels for PMs and MMs]

Some initial conclusions

The base at any position in a PM probe affects the probe's intensity. The overall base composition of a PM probe affects the probe's intensity, with the numbers of As and Cs having the most effect. The base and position differences we see are quite repeatable across chips of the same type in a single experiment. While broadly similar, these patterns differ in detail across chip types. What we see for PM probes is equally true for MM probes.

Towards a model incorporating the preceding observations

We now consider an affinity model for each probe, taking the following additive form:

$$\alpha = \sum_{k=1}^{25} \sum_{b \in \{A,C,G,T\}} \mu_{b,k}\, I(b_k = b),$$

where $\mu_{b,k} = f_b(k)$ has the form of a spline with 5 d.f., and $I(b_k = b)$ takes the value 1 when the base at position $k$ is $b$, and 0 otherwise. This follows Magnasco & Naef (2003). This empirical, linear, statistical model contrasts dramatically with the biophysical models, which are typically nonlinear.

Next we present the results of fitting this model to different chips of different types. Plotted are the coefficients $\mu_{b,k}$.

[Figures: U95A spike-in]

[Figures: a few U95A chips from the St Jude set]

[Figures: U133A spike-in]

[Figures: Drosophila time course]

Tentative conclusions

The coefficients in the empirical linear probe affinity model have fairly similar patterns within a chip set. The pattern of coefficients is broadly similar, but differs in detail, across chip sets of the same type and across chip types.

Next steps

If
we are to go with this empirical probe affinity model, several questions need to be answered:
(a) Which probes should we use to fit the model?
(b) How frequently should we refit the affinity model?
(c) How should we use the model to adjust probe intensities for probe affinity?
Having done all this, we will naturally ask:
(d) How well does it work?
We deal briefly with each of these questions in turn.

Which probes should we use?

Where available, it is natural to use the MM probes. As we have seen, they exhibit essentially the same behaviour as the PMs, yet are separate from them. Or we could use all PMs and MMs. On chips without MM probes there is typically a broad range of control probes, for example probes consisting of random DNA sequences not found in the genome under study. These can be used to model the sequence-specific probe affinity.

How frequently should we refit the probe affinity model?

One extreme, implemented in GCRMA, is to fit the model to one data set once and for all, and use that model unchanged on all subsequent data sets. The differences we saw above, between sets of chips of the same type and between different chip types, suggest that refitting under certain circumstances might be more accurate. The question of how frequently has not been thoroughly explored, though work is under way.

How should we use the affinity model?

Now we describe the current GCRMA method of adjusting for probe affinity. This uses the model

$$PM = O + N + S, \qquad MM = O' + N',$$

where $O$ and $O'$ are optical noise, $N$ and $N'$ are non-specific binding contributions to the intensities, and $S$ is the signal of interest. In GCRMA, $O$ and $O'$ are assumed to be normally distributed with mean $\mu_O$ and variance $\sigma_O^2$; $\log S$ is assumed to be exponentially distributed; and the pair $(\log N, \log N')$ is assumed to be bivariate normally distributed with means $\mu$ and $\mu'$, which depend on probe affinity (see next slide), equal variances $\sigma_N^2$, and correlation 0.7 across all probes.

In more detail

First, it is argued that the variance $\sigma_O^2$ of the optical noise term is so small compared with $\sigma_N^2$ that this
term can be taken as effectively constant. Specifically, what is used is

$$\hat{\mu}_O = \min\bigl(\min_j PM_j,\ \min_j MM_j\bigr) - 1.$$

Then the probe intensities are adjusted for optical noise by forming $PM_j^{*} = PM_j - \hat{\mu}_O$ and $MM_j^{*} = MM_j - \hat{\mu}_O$. Next, a loess line is fitted through all the pairs (estimated affinity, $\log MM$). The negative residuals from this line are used to estimate $\sigma_N^2$; specifically, an appropriately scaled MAD is used. Finally, the pair of means $(\mu, \mu')$ are estimated by fitting to the loess line using the estimated affinity of PM.

Details completed

Under …
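
As a rough illustration of the steps described above (the additive affinity model, the constant optical-noise adjustment, and the scaled MAD of negative residuals), here is a minimal Python sketch. It is not the GCRMA implementation. All function names and the coefficient table `mu` are hypothetical. Centring the MAD at zero and using the normal-consistency constant 1.4826 are assumptions about what "appropriately scaled" means. The loess fit itself is omitted, so the residuals are taken as given.

```python
from statistics import median


def probe_affinity(sequence, mu):
    """Additive affinity: sum over positions k of mu[b][k], b = base at k.

    `mu` is a hypothetical coefficient table {base: [25 floats]}; in the
    lecture these coefficients come from a fitted 5-d.f. spline per base.
    """
    if len(sequence) != 25:
        raise ValueError("expected a 25-mer probe sequence")
    return sum(mu[base][k] for k, base in enumerate(sequence))


def optical_noise_estimate(pm, mm):
    """Constant optical-noise estimate: smallest PM or MM intensity, minus 1."""
    return min(min(pm), min(mm)) - 1


def adjust_for_optical_noise(intensities, noise):
    """Subtract the constant optical-noise estimate from every intensity."""
    return [x - noise for x in intensities]


def sigma_from_negative_residuals(residuals, scale=1.4826):
    """Scaled MAD of the negative residuals, centred at zero (an assumption).

    Only residuals below the fitted line are used; their absolute values are
    summarised by the median and rescaled to estimate a standard deviation.
    """
    negatives = [abs(r) for r in residuals if r < 0]
    if not negatives:
        raise ValueError("no negative residuals to estimate from")
    return scale * median(negatives)
```

With this sketch, subtracting the estimate returned by `optical_noise_estimate` leaves the smallest adjusted intensity equal to 1, so logarithms of the adjusted values are always defined. Squaring the value returned by `sigma_from_negative_residuals` gives a variance estimate on the scale of the residuals.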