Berkeley STATISTICS 246 - Introduction to Affymetrix GeneChip data

Introduction to Affymetrix GeneChip data
Stat 246, Spring 2002, Week 16

Summary
• Review of technology
• Probeset summaries
• What we do: our 4 steps
• Assessing the technology and the different expression measures
• How robustness works

Probe arrays
[Figure: GeneChip Probe Array (1.28 cm) and a hybridized probe cell (24 µm). Labels: millions of copies of a specific oligonucleotide probe; >200,000 different complementary probes; single stranded, labeled RNA target; oligonucleotide probe; image of hybridized probe array. Compliments of D. Gerhold]

Image analysis
• About 100 pixels per probe cell
• These intensities are combined to form one number representing expression for the probe cell oligo
• Possibly room for improvement
[Figure: PM and MM probe cells]

The big picture
• Summarize 20 PM, MM pairs (probe level data) into one number for each probe set (gene)
• We call this number an expression measure
• Affymetrix GeneChip Software has defaults.
• Does it work? Can it be improved?

Where is the evidence that it works?
[Figure: Lockhart et al., Nature Biotechnology 14 (1996)]

Comments
• The chips used in Lockhart et al. contained around 1000 probes per gene
• Current chips contain 11-20 probes per gene
• These are quite different situations
• We haven't seen a plot like the previous one for current chips

Some possible problems
What if
• a small number of the probe pairs hybridize much better than the rest?
• removing the middle base does not make a difference for some probes?
• some MMs are PMs for some other gene?
• there is need for normalization?
We explore these possibilities using a variety of data sets

SD vs. Avg (across replicate chips)
[Figure]

ANOVA: Strong probe effect: 5 times bigger than gene effect
[Figure]

Competing measures of expression
• GeneChip older software uses Avg.diff, with A a set of suitable pairs chosen by software.
  30%-40% can be <0.
  $\mathrm{Avg.diff} = \frac{1}{|A|} \sum_{j \in A} (PM_j - MM_j)$
• Log PM_j/MM_j was also used.
• For differential expression Avg.diffs are compared between chips.

Competing measures of expression, 2
• Li and Wong fit a model
  $PM_{ij} - MM_{ij} = \theta_i \phi_j + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2)$
  They consider θ_i to be expression in chip i
• Efron et al. consider log PM - 0.5 log MM. It is much less frequently <0.
• Another summary is the second largest PM, PM(2)

Competing measures of expression, 3
• GeneChip newest version uses something else, namely
  $\mathrm{signal} = \mathrm{TukeyBiweight}\{\log(PM_j - MM_j^*)\}$
  with MM* a version of MM that is never bigger than PM.

Competing measures of expression, 4
• Why not stick to what has worked for cDNA?
  $\frac{1}{|A|} \sum_{j \in A} \log_2(PM_j - BG)$
  Again A is a suitable set of pairs.
  Care needed with BG, and we need to robustify.

What we do: four steps
We use only PM, and ignore MM. Also, we
• Adjust for background on the raw intensity scale
• Take log2 of background adjusted PM
• Carry out quantile normalization of log2(PM-BG), with chips in suitable sets
• Conduct a robust multi-chip analysis (RMA) of these quantities
We call our approach RMA

Why remove background?
[Figure: Signal + Noise = Observed. White arrows mark the means]

Background model: pictorially
[Figure: PM data on log2 scale: raw and fitted model]

Background model: formulae
• Observed PM intensity denoted by S.
• Model S as the sum of a signal X and a background Y, S = X + Y, where we assume X is exponential (α) and Y is Normal (µ, σ²), X, Y independent random variables.
• Background adjusted values are then E(X|S=s), which is
  $a + b\,\frac{\phi(a/b) - \phi((s-a)/b)}{\Phi(a/b) + \Phi((s-a)/b) - 1}$,
  where a = s - µ - σ²α, b = σ, and φ and Φ are the standard normal density and cumulative distribution function, respectively.
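The conditional expectation E(X|S=s) can be evaluated directly. The following is a minimal Python sketch, not the authors' implementation. The function names and the parameter values (alpha = 0.01, mu = 100, sigma = 20) are hypothetical; in practice α, µ and σ would have to be estimated from the data. The denominator is taken as Φ(a/b) + Φ((s-a)/b) - 1, which is what the mean of a normal distribution truncated to [0, s] gives.

```python
import math


def normal_pdf(z):
    """Standard normal density (phi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function (Phi)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def background_adjust(s, alpha, mu, sigma):
    """E(X | S = s) for S = X + Y, X ~ exponential(alpha), Y ~ Normal(mu, sigma^2)."""
    a = s - mu - sigma ** 2 * alpha
    b = sigma
    numerator = normal_pdf(a / b) - normal_pdf((s - a) / b)
    denominator = normal_cdf(a / b) + normal_cdf((s - a) / b) - 1.0
    return a + b * numerator / denominator


# Hypothetical parameter values, chosen only for illustration.
alpha, mu, sigma = 0.01, 100.0, 20.0
for s in (50.0, 200.0, 1000.0):
    # For large s the result is close to s - mu - alpha * sigma**2.
    print(s, background_adjust(s, alpha, mu, sigma))
```

For an observed intensity below µ the adjusted value stays positive rather than going negative, which is what allows the log2 transform in the next step.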
This is our model and formula for background correction.

Observed PM vs Corrected PM
[Figure: observed PM vs corrected PM, with parameters α, µ, σ]
As s increases, the background correction asymptotes to s - µ - ασ².
In practice, µ >> ασ², so this is ~ s - µ.

Quantile normalization
• Quantile normalization is a method to make the distribution of probe intensities the same for every chip.
• The normalization distribution is chosen by averaging each quantile across chips.
• The diagram that follows illustrates the transformation.

Quantile normalization: pictorially
• The two distribution functions are effectively estimated by the sample quantiles.
• Quantile normalization is fast
• After normalization, variability of expression measures across chips is reduced
• Looking at post-normalization PM vs pre-normalization PM (natural and log scales), you can see the transformation is nonlinear.

Quantile normalization: in words
[Figure: density function and distribution function panels; raw data with distribution F1(x), normalization distribution F2(x), and the mapping $x_{norm} = F_2^{-1}(F_1(x))$]

Quantile normalization: formulae
$x_{norm} = F_2^{-1}(F_1(x))$

After vs Before: intensity scale
[Figure]

After vs Before: log intensity scale
[Figure]

M v A plots of chip pairs: before normalization
[Figure]

M v A plots of chip pairs: after quantile normalization
[Figure]

Dilution series: before and after quantile normalization in groups of 5
[Figure: boxplots] Note systematic effects of scanners 1,…,5 in before boxplots

Normalization reduces variability
[Figure: panels "Quantile vs Un-normalized" and "Quantile vs Affymet. normalized". Vertical: log[var q. norm/var other]; Horizontal: Aver. log mean. Note differences in vertical scales]

Probe effects: spike-in experiments
• Set A: 11 control cRNAs were spiked in, all at the same concentration, which varied across chips.
• Set B: 11 control cRNAs were spiked in, all at different concentrations, which varied across chips.
  The concentrations were arranged in a 12x12 cyclic Latin square (with 3 replicates)

Set A: Probe level data
[Figures: four slides of probe level data]

RMA = Robust multi-chip analysis
• Background correct PM
• Normalize (quantile normalization)
• Assume additive model:
  $\log_2(PM_{ij} - BG) = a_i + b_j + \varepsilon_{ij}$
• Estimate chip effects a_i and probe effects b_j using a robust method

Comparing expression summaries using spike-in data

Probe Set   Conc 1   Conc 2   Rank
BioB-5      100      0.5      1
BioB-3      0.5      25.0     2
BioC-5      2.0      75.0     3
BioB-M      1.0      37.5     4
BioDn-3     1.5      50.0     5
DapX-3      35.7     3.0      6
CreX-3      50.0     5.0      7
CreX-5      12.5     2.0      8
BioC-3      25.0     100      9
DapX-5      5.0      1.5      10
DapX-M      3.0      1.0      11

Later we consider 23 different combinations of concentrations

Differential expression
[Figures: four slides]

Observed ranks
[Table: observed ranks of the spike-in probe sets by expression measure (MAS, ...), with a Top 15 row; entries not recoverable from the extracted text]
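The quantile normalization step described earlier (average each quantile across chips, then map every value to the average for its rank) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `quantile_normalize` and the toy intensities are hypothetical, every chip is assumed to carry the same number of probes, and ties are broken by position. It is the sample-quantile version of x_norm = F2^{-1}(F1(x)).

```python
def quantile_normalize(chips):
    """Give every chip the same empirical distribution.

    chips: list of equal-length lists, one list of intensities per chip.
    Returns a new list of lists in the same layout.
    """
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Normalization distribution: average each quantile (rank) across chips.
    reference = [
        sum(sc[rank] for sc in sorted_chips) / len(chips)
        for rank in range(n_probes)
    ]
    normalized = []
    for chip in chips:
        # Probe indices ordered by increasing intensity on this chip.
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: two chips, three probes each.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(chips))
```

After the call, both chips contain exactly the same set of values, arranged according to each chip's own ranking of its probes.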
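The slides say only that the chip effects a_i and probe effects b_j of the additive model are estimated "using a robust method". Median polish is one such method; choosing it here is an assumption made for illustration, and the function name, iteration limit and toy data are hypothetical. The sketch alternately sweeps out row (chip) and column (probe) medians, then centers the probe effects at median zero so that the chip effects carry the expression level.

```python
from statistics import median


def median_polish(y, max_iter=10, tol=1e-8):
    """Fit y[i][j] = chip[i] + probe[j] + resid[i][j] by median polish.

    y[i][j]: log2 background-adjusted PM for chip i and probe j.
    """
    n_chips, n_probes = len(y), len(y[0])
    resid = [list(row) for row in y]
    chip = [0.0] * n_chips
    probe = [0.0] * n_probes
    for _ in range(max_iter):
        change = 0.0
        # Sweep out the median of each chip (row).
        for i in range(n_chips):
            m = median(resid[i])
            chip[i] += m
            resid[i] = [r - m for r in resid[i]]
            change += abs(m)
        # Sweep out the median of each probe (column).
        for j in range(n_probes):
            m = median([resid[i][j] for i in range(n_chips)])
            probe[j] += m
            for i in range(n_chips):
                resid[i][j] -= m
            change += abs(m)
        if change < tol:
            break
    # Center probe effects at median zero; move the shift into chip effects.
    shift = median(probe)
    probe = [p - shift for p in probe]
    chip = [c + shift for c in chip]
    return chip, probe, resid


# Hypothetical toy data: 3 chips, 5 probes, exactly additive plus one outlier.
a = [8.0, 10.0, 9.0]
b = [-1.0, 0.0, 2.0, 0.5, -0.5]
y = [[ai + bj for bj in b] for ai in a]
y[0][2] += 50.0  # one badly behaved probe on chip 0
chip, probe, resid = median_polish(y)
print(chip, probe, resid[0][2])
```

In this toy example the outlier ends up in the residual and leaves the estimate for chip 0 unchanged, whereas a plain average over the probes of chip 0 would be pulled upward by it.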

