Berkeley STATISTICS 246 - Background models and GCRMA

Background models and GCRMA
Statistics 246, Spring 2006
Week 10, Lecture 1

Context: Background with Affymetrix probe set summaries

We resume discussing the question of reducing a set of probe pair intensities $(PM_i, MM_i)$, $i = 1, \dots, 11$ (or 16 or 20), into a single probe set summary, and, as is usual, we are dealing with a set of Affymetrix chips together. Our concern here is background, interpreting the term broadly. We take this to be an issue because of the flattening of the response at the low and the high ends of the plot in the next slide, together with the fact noted last week that probe intensities for transcripts known to be absent are still far from zero.

The earliest background adjustment was the subtraction of $MM_i$ from $PM_i$ used in the early AvDiff. This caused the obvious problems, and it was natural to want to do better. Our initial RMA-style background was a crude effort to do better, but ignored the MMs. It was better in many but not all ways. We need to do better still.

Figure: from the Affymetrix U95A spike-in data set.

Preliminaries

There have been quite a few attempts to use probe sequence information to predict non-specific binding of target to probes, in the hope of improving probe set summaries. Most have been deterministic, based on theoretical chemical/biophysical considerations, and fitted to data. None of these has yet performed well in the field. We will take an empirical statistical approach, modifying our background model to take into account probe sequence.

We begin by showing that probe sequence matters, by looking at the impact of a given base at a given position in the probe sequence.

Summary plots for 3 data sets

First we display boxplots of PM and MM intensities for chips in this experiment, stratified by base and position. Then these data are reorganized according to the number of bases of a given kind in the probe sequence.
In these plots, the chip data are aggregated.

Then we examine the effects of replacing a particular base at a particular position by a different base at that position. We plot the difference between the mean intensity for all probes with that base at that position and the mean for all probes having a different base at that position. These plots are displayed for different chips in different colors. After these we give the corresponding intercept terms. Finally, we plot the numbers of probes with a given base at a given position.

Affymetrix U95A spike-in (figure slides; panels labeled PMs and MMs)

Affymetrix U133A spike-in (figure slides; panels labeled PMs and MMs)

Drosophila time course data (figure slides; panels labeled PMs and MMs)

Some initial conclusions

The base at any position in a PM probe affects the probe's intensity.
The overall base composition of a PM probe affects the probe's intensity, with the numbers of As and Cs having the most effect.
The base and position differences we see are quite repeatable across chips of the same type in a single experiment. While broadly similar, these patterns differ in detail across chip types.
What we see for PM probes is equally true for MM probes.

Towards a model incorporating the preceding observations

We now consider an affinity model for each probe, taking the following additive form

$$\sum_{k=1}^{25} \sum_{b \in \{A,C,G,T\}} \mu_{b,k} \, I(b_k = b),$$

where $\mu_{b,k} = f_b(k)$ has the form of a spline with 5 d.f., and $I(b_k = b)$ takes the value 1 when the base at position $k$ is $b$, and 0 otherwise. This follows Naef & Magnasco (2003).

This empirical linear statistical model contrasts dramatically with the biophysical models, which are typically nonlinear.

Next we present the results of fitting this model to different chips of different types.
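The additive affinity model can be sketched in a few lines of Python. This is only an illustration: the coefficient table built by `toy_coefficients` is hypothetical (a quadratic in position stands in for the 5-d.f. spline $f_b(k)$, and the per-base scales are made up, not fitted), and the function names are not taken from any package. The point is that the indicator $I(b_k = b)$ picks out exactly one coefficient per position.

```python
BASES = "ACGT"
PROBE_LENGTH = 25


def toy_coefficients():
    """Hypothetical mu[b][k]: one smooth curve in position k per base b.

    A quadratic stands in for the 5-d.f. spline f_b(k). The scales are
    invented for illustration and are not estimates from any chip.
    """
    scale = {"A": -0.2, "C": 0.3, "G": 0.1, "T": -0.2}  # assumed values
    mid = (PROBE_LENGTH - 1) / 2
    return {
        b: [scale[b] * (1 - ((k - mid) / mid) ** 2) for k in range(PROBE_LENGTH)]
        for b in BASES
    }


def affinity(sequence, mu):
    """Sum over positions k and bases b of mu[b][k] * I(b_k == b)."""
    if len(sequence) != PROBE_LENGTH:
        raise ValueError("expected a 25-mer probe sequence")
    total = 0.0
    for k, base_at_k in enumerate(sequence):
        for b in BASES:
            indicator = 1 if base_at_k == b else 0
            total += mu[b][k] * indicator
    return total


mu = toy_coefficients()
example_probe = "ACGT" * 6 + "A"  # an arbitrary 25-mer
print(affinity(example_probe, mu))
```

Because only one indicator is non-zero at each position, the double sum is equivalent to adding up `mu[sequence[k]][k]` over the 25 positions. The model is therefore linear in the coefficients, which is what makes it straightforward to fit by least squares.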
Plotted are the coefficients $\mu_{b,k}$.

U95A spike-in (figure slides)

A few U95A chips from the St Jude set (figure slides)

U133A spike-in (figure slides)

Drosophila time-course (figure slides)

Tentative conclusions

The coefficients in the empirical linear probe affinity model have fairly similar patterns within a chip set.
The pattern of coefficients is broadly similar, but differs in detail across chip sets of the same type, and across chip types.

Next steps

If we are to go with this empirical probe affinity model, several questions need to be answered:
a) Which probes should we use to fit the model?
b) How frequently should we refit the affinity model?
c) How should we use the model to adjust probe intensities for probe affinity?
Having done all this, we will naturally ask
d) How well does it work?
We deal briefly with each of these questions in turn.

Which probes should we use?

Where available, it is natural to use the MM probes. As we have seen, they exhibit essentially the same behaviour as the PMs, yet are separate from them. Or, we could use all PMs and MMs.

On chips without MM probes, there is typically a broad range of control probes, for example, probes consisting of random DNA sequences not found in the genome under study. These can be used to model the sequence-specific probe affinity.

How frequently should we refit the probe affinity model?

One extreme, implemented in GCRMA, is to fit the model to one data set, once and for all, and use that model unchanged on all subsequent data sets.

The differences we saw above, between sets of chips of the same type, and different chip types, suggest that refitting under certain circumstances might be more accurate. The question of how frequently has not been thoroughly explored, though work is under way.

How should we use the affinity model?

Now we describe the current GCRMA method of adjusting for probe affinity. This uses the model

$$PM = O + N + S, \qquad MM = O' + N',$$

where $O$ and $O'$ are optical noise, $N$ and $N'$ are non-specific binding contributions to the intensities, and $S$ is the signal of interest.
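To make the decomposition concrete, the following sketch draws probe pairs from the model. It relies on the distributional assumptions the lecture states immediately afterwards (normal optical noise, exponential $\log S$, bivariate normal $(\log N, \log N')$ with correlation 0.7). Everything else is an assumption made for illustration: all numeric parameter values, the use of base-2 logs, and the independence of $O$ and $O'$. The function names are hypothetical.

```python
import math
import random


def simulate_probe_pair(rng, mu_n=6.0, mu_n_prime=6.0, sigma_n=0.5, rho=0.7,
                        mu_o=30.0, sigma_o=1.0, log_s_rate=0.5):
    """Draw one (PM, MM) pair from PM = O + N + S, MM = O' + N'.

    All default parameter values are hypothetical, and logs are taken
    to be base 2 by assumption.
    """
    # Optical noise: normal draws, assumed independent for the two probes.
    o = rng.gauss(mu_o, sigma_o)
    o_prime = rng.gauss(mu_o, sigma_o)
    # (log N, log N'): bivariate normal with equal variances and correlation rho.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    log_n = mu_n + sigma_n * z1
    log_n_prime = mu_n_prime + sigma_n * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    # log S: exponential with the given rate.
    log_s = rng.expovariate(log_s_rate)
    pm = o + 2 ** log_n + 2 ** log_s
    mm = o_prime + 2 ** log_n_prime
    return pm, mm, log_n, log_n_prime


def correlation(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
draws = [simulate_probe_pair(rng) for _ in range(20000)]
# The sample correlation of (log N, log N') should be close to rho = 0.7.
print(correlation([d[2] for d in draws], [d[3] for d in draws]))
```

In the simulation the MM probe carries no signal term, so any dependence between PM and MM comes only through the correlated non-specific binding terms. That correlation is what lets the MM intensity inform the background of its PM partner.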
In GCRMA, $O$ and $O'$ are assumed to be normally distributed, with mean $\mu_O$ and variance $\sigma_O^2$; $\log S$ is assumed to be exponential; and the pair $(\log N, \log N')$ is assumed to be bivariate normally distributed with means $\mu$ and $\mu'$ which depend on probe affinity (see next slide), equal variances $\sigma_N^2$, and correlation $\rho = 0.7$ across all probes.

In more detail

First, it is argued that the variance $\sigma_O^2$ of the optical noise term is so small compared with $\sigma_N^2$ that this term can be taken as effectively constant. Specifically, what is used is

$$\mu^* = \min\{\min_j PM_j, \min_j MM_j\} - 1.$$

Then the probe intensities are adjusted for optical noise by forming $PM^*_j = PM_j - \mu^*$ and $MM^*_j = MM_j - \mu^*$.

Next, a loess line is fitted through all the pairs (estimated affinity, $\log MM$). The negative residuals from this line are used to estimate $\sigma_N^2$. Specifically, an appropriately scaled MAD is used.

Finally, the pair ($\mu$,
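The optical-noise adjustment and the variance estimate from negative residuals can be sketched as follows. This is a minimal illustration, not the GCRMA implementation: the loess fit itself is not reproduced (the residuals are taken as given from whatever smoother is used), the intensities and residuals are made-up numbers, and the scaling is one plausible reading of "appropriately scaled MAD". It treats the negative residuals as the lower half of a zero-centred normal distribution, so the median of their absolute values times 1.4826 estimates $\sigma_N$.

```python
import statistics


def optical_noise_adjust(pm, mm):
    """Subtract mu* = min(min_j PM_j, min_j MM_j) - 1 from every intensity."""
    mu_star = min(min(pm), min(mm)) - 1
    pm_adj = [x - mu_star for x in pm]
    mm_adj = [x - mu_star for x in mm]
    return pm_adj, mm_adj, mu_star


def sigma_from_negative_residuals(residuals):
    """Scaled MAD using only the residuals below the fitted line.

    Assumption: the negative residuals form the lower half of a
    zero-centred normal, so 1.4826 * median(|r|) estimates sigma_N.
    """
    magnitudes = [-r for r in residuals if r < 0]
    if not magnitudes:
        raise ValueError("no negative residuals to estimate from")
    return 1.4826 * statistics.median(magnitudes)


# Made-up intensities for three probe pairs.
pm = [120.0, 95.0, 300.0]
mm = [80.0, 61.0, 150.0]
pm_adj, mm_adj, mu_star = optical_noise_adjust(pm, mm)
print(mu_star, pm_adj, mm_adj)

# Made-up residuals of log MM* about a fitted loess line.
residuals = [-0.4, 0.9, -0.2, 1.5, -0.6]
sigma_n = sigma_from_negative_residuals(residuals)
print(sigma_n ** 2)  # estimate of the variance sigma_N^2
```

Subtracting 1 in the definition of $\mu^*$ leaves the smallest adjusted intensity equal to 1, so logarithms of the adjusted values are always defined. Using only the negative residuals is a natural choice when the upper tail may be inflated, for example by MM probes that pick up some specific signal, though the source does not spell out this motivation.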

