Berkeley STATISTICS 246 - Introduction to Affymetrix GeneChip data


Introduction to Affymetrix GeneChip data
Stat 246, Spring 2002, Week 16

Summary
- Review of technology
- Probeset summaries
- What we do: our 4 steps
- Assessing the technology and the different expression measures
- How robustness works

Probe arrays
[Figure: GeneChip probe array, 1.28 cm across, with a hybridized probe cell of 24 µm. Labels: single stranded, labeled RNA target; oligonucleotide probe; millions of copies of a specific oligonucleotide probe; 200,000 different complementary probes; image of hybridized probe array. Compliments of D. Gerhold.]

Image analysis
- About 100 pixels per probe cell.
- These intensities are combined to form one number representing expression for the probe cell oligo.
- Possibly room for improvement.
[Figure: PM and MM probe cells.]

The big picture
- Summarize 20 PM, MM pairs (probe level data) into one number for each probe set (gene).
- We call this number an expression measure.
- Affymetrix GeneChip Software has defaults.
- Does it work? Can it be improved?

Where is the evidence that it works?
[Figure from Lockhart et al., Nature Biotechnology 14 (1996).]

Comments
- The chips used in Lockhart et al. contained around 1000 probes per gene.
- Current chips contain 11-20 probes per gene.
- These are quite different situations.
- We haven't seen a plot like the previous one for current chips.

Some possible problems
What if
- a small number of the probe pairs hybridize much better than the rest?
- removing the middle base does not make a difference for some probes?
- some MMs are PMs for some other gene?
- there is need for normalization?
We explore these possibilities using a variety of data sets.

SD vs. Avg across replicate chips
[Figure.]

ANOVA: strong probe effect, 5 times bigger than gene effect
[Figure.]

Competing measures of expression
- GeneChip older software uses Avg.diff:
  $$\text{Avg.diff} = \frac{1}{|A|} \sum_{j \in A} (PM_j - MM_j)$$
  with A a set of "suitable" pairs chosen by software. 30-40% can be < 0.
- $\log(PM_j / MM_j)$ was also used.
- For differential expression, Avg.diffs are compared between chips.

Competing measures of expression, 2
- Li and Wong fit a model
  $$PM_{ij} - MM_{ij} = \theta_i \phi_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2).$$
  They consider $\theta_i$ to be expression in chip i.
- Efron et al. consider $\log PM - 0.5 \log MM$. It is much less frequently < 0.
- Another summary is the second largest PM, $PM_{(2)}$.

Competing measures of expression, 3
- GeneChip newest version uses something else, namely
  $$\text{signal} = \text{TukeyBiweight}\{\log(PM_j - MM_j^*)\}$$
  with $MM^*$ a version of MM that is never bigger than PM.

Competing measures of expression, 4
- Why not stick to what has worked for cDNA?
  $$\frac{1}{|A|} \sum_{j \in A} \log_2(PM_j - BG_j)$$
  Again, A is a suitable set of pairs.
- Care is needed with BG, and we need to robustify.

What we do: four steps
We use only PM and ignore MM. Also, we
- adjust for background on the raw intensity scale;
- take log2 of background adjusted PM;
- carry out quantile normalization of $\log_2(PM - BG)$, with chips in suitable sets;
- conduct a robust multi-chip analysis (RMA) of these quantities.
We call our approach RMA.

Why remove background?
[Figure. White arrows mark the means.]

Background model: pictorially
[Figure: Signal + Noise = Observed.]
[Figure: PM data on log2 scale, raw and fitted model.]

Background model: formulae
- Observed PM intensity is denoted by S.
- Model S as the sum of a signal X and a background Y, $S = X + Y$, where we assume X is exponential ($\alpha$) and Y is Normal ($\mu$, $\sigma^2$), with X and Y independent random variables.
- Background adjusted values are then $E(X \mid S = s)$, which is
  $$a + b\,\frac{\phi(a/b) - \phi((s - a)/b)}{\Phi(a/b) + \Phi((s - a)/b) - 1},$$
  where $a = s - \mu - \sigma^2 \alpha$, $b = \sigma$, and $\phi$ and $\Phi$ are the normal density and cumulative density, respectively.
This is our model and formula for background correction.

Observed PM vs. corrected PM
[Figure.]
As s increases, the background correction asymptotes to $s - \mu - \sigma^2 \alpha$. In practice, $\mu \gg \sigma^2 \alpha$, so this is approximately $s - \mu$.

Quantile normalization
- Quantile normalization is a method to make the distribution of probe intensities the same for every chip.
- The normalization distribution is chosen by averaging each quantile across chips.
- The diagram that follows illustrates the transformation.

Quantile normalization: pictorially
[Figure.]

Quantile normalization: in words
- The two distribution functions are effectively estimated by the sample quantiles.
- Quantile normalization is fast.
- After normalization, variability of expression measures across chips is reduced.
- Looking at post-normalization PM vs. pre-normalization PM (natural and log scales), you can see the transformation is non-linear.

Quantile normalization: formulae
$$x_{\text{norm}} = F_2^{-1}(F_1(x))$$
[Figure: density function and distribution function $F_1(x)$ of the raw data; normalization distribution $F_2(x)$.]

[Figure: After vs. before, intensity scale.]
[Figure: After vs. before, log intensity scale.]
[Figure: M v A plots of chip pairs, before normalization.]
[Figure: M v A plots of chip pairs, after quantile normalization.]

Dilution series: before and after quantile normalization in groups of 5
[Figure. Note systematic effects of scanners 1-5 in the "before" boxplots.]

Normalization reduces variability
[Figure, two panels: quantile vs. un-normalized; quantile vs. Affymet. normalized. Vertical: log[var(q. norm) / var(other)]. Horizontal: aver. log mean. Note differences in vertical scales.]

Probe effects: spike-in experiments
- Set A: 11 control cRNAs were spiked in, all at the same concentration, which varied across chips.
- Set B: 11 control cRNAs were spiked in, all at different concentrations, which varied across chips. The concentrations were arranged in a 12x12 cyclic Latin square (with 3 replicates).

Set A: Probe level data
[Four figure slides.]

RMA: robust multi-chip analysis
- Background correct PM.
- Normalize (quantile normalization).
- Assume the additive model
  $$\log_2(PM_{ij} - BG) = a_i + b_j + \varepsilon_{ij}.$$
- Estimate chip effects $a_i$ and probe effects $b_j$ using a robust method.

Comparing expression summaries using spike-in data

Probe Set   Conc 1   Conc 2   Rank
BioB-5      100      0.5      1
BioB-3      0.5      25.0     2
BioC-5      2.0      75.0     4
BioB-M      1.0      37.5     4
BioDn-3     1.5      50.0     5
DapX-3      35.7     3.0      6
CreX-3      50.0     5.0      7
CreX-5      12.5     2.0      8
BioC-3      25.0     100      9
DapX-5      5.0      1.5      10
DapX-M      3.0      1.0      11

Later we consider 23 different combinations of concentrations.

Differential expression
[Four figure slides.]

Observed ranks

Gene      AvDiff   MAS 5.0   Li & Wong   AvLog(PM-BG)
BioB-5    6        2         1           1
BioB-3    16       1         3           2
BioC-5    74       6         2           5
BioB-M    30       3         7           3
BioDn-3   44       5         6           4
DapX-3    239      24        24          7
CreX-3    333      73        36          9
CreX-5    3276     33        3128        8
BioC-3    2709     8572      681         6431
DapX-5    2709     102       12203       10
DapX-M    165      19        13          6
Top 15    1        5         6           10

Observed vs. true ratios
[Four figure slides.]

AvLog(PM-BG): a precursor to RMA

Dilution experiment
- cRNA hybridized to human chip (HGU95) in a range of proportions and dilutions.
- Dilution series begins at 1.25 µg cRNA per GeneChip array and rises through 2.5, 5.0, 7.5, 10.0, to 20.0 µg per array. 5 replicate chips were used at each dilution.
- Normalize just within each set of 5 replicates.
- For each …
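
The background-correction formula under "Background model: formulae" can be evaluated directly. The following is a minimal Python sketch, not the authors' implementation. It assumes that alpha is the rate of the exponential signal (mean 1/alpha) and that alpha, mu and sigma have already been estimated; the slides do not say how. The function names and the parameter values mentioned afterwards are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density (phi in the slides)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution (Phi in the slides)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def background_adjust(s, alpha, mu, sigma):
    """Evaluate the E(X | S = s) formula for one observed PM intensity s > 0.

    Assumption (not stated in the slides): alpha is the exponential rate.
    """
    a = s - mu - sigma ** 2 * alpha
    b = sigma
    numerator = normal_pdf(a / b) - normal_pdf((s - a) / b)
    denominator = normal_cdf(a / b) + normal_cdf((s - a) / b) - 1.0
    return a + b * numerator / denominator
```

With hypothetical values alpha = 0.01, mu = 100 and sigma = 20, a large intensity such as s = 5000 is adjusted to very nearly s - mu - sigma^2 * alpha = 4896, which matches the asymptote noted under "Observed PM vs. corrected PM", while a small intensity such as s = 50 is adjusted to a small positive value rather than a negative one.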
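
The quantile normalization slides describe averaging each quantile across chips and mapping every chip onto that common distribution. The sketch below illustrates this in plain Python under simplifying assumptions: every chip has the same number of probes, and ties are broken by position rather than averaged. The function name and data layout are illustrative and do not come from the slides.

```python
def quantile_normalize(chips):
    """Give every chip the same empirical distribution.

    chips: list of equal-length lists, one list of probe intensities per chip.
    Returns a new list of lists in the original probe order.
    """
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Normalization distribution: average of each quantile across chips.
    reference = [
        sum(sc[k] for sc in sorted_chips) / len(chips) for k in range(n_probes)
    ]
    normalized = []
    for chip in chips:
        # Indices of the probes ordered from smallest to largest intensity.
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            # The probe with a given rank receives the reference quantile.
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized
```

For example, for the two toy chips [5, 2, 3] and [4, 1, 9], the averaged quantiles are 1.5, 3.5 and 7.0, and each chip is mapped onto those three values according to the rank of each probe within its chip. This corresponds to x_norm = F2^{-1}(F1(x)) with both distribution functions estimated by sample quantiles.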
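
The additive model on the "RMA: robust multi-chip analysis" slide is fitted "using a robust method", which the slide does not name. The sketch below uses Tukey's median polish as one possible choice; treat that choice, the function name and the fixed iteration count as assumptions made for illustration. The input y[i][j] stands for the background-adjusted, normalized log2 PM value for chip i and probe j of a single probe set.

```python
from statistics import median


def median_polish(y, n_iter=10):
    """Fit y[i][j] ~ overall + chip[i] + probe[j] by sweeping out medians.

    Returns (overall, chip_effects, probe_effects, residuals).
    """
    n_chips, n_probes = len(y), len(y[0])
    resid = [list(row) for row in y]
    overall = 0.0
    chip = [0.0] * n_chips
    probe = [0.0] * n_probes
    for _ in range(n_iter):
        # Sweep the median of each chip (row) into the chip effects.
        for i in range(n_chips):
            m = median(resid[i])
            chip[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(probe)
        overall += m
        probe = [v - m for v in probe]
        # Sweep the median of each probe (column) into the probe effects.
        for j in range(n_probes):
            m = median(resid[i][j] for i in range(n_chips))
            probe[j] += m
            for i in range(n_chips):
                resid[i][j] -= m
        m = median(chip)
        overall += m
        chip = [v - m for v in chip]
    return overall, chip, probe, resid
```

Because medians are used in place of means, a single aberrant probe value on one chip ends up in the residuals and leaves the estimated chip effects unchanged in simple cases, which is the kind of robustness the summary slide alludes to. Under this sketch, overall + chip[i] would serve as the expression measure for chip i.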

