Berkeley STATISTICS 246 - Preprocessing of cDNA microarray Data - D789068

School name University of California, Berkeley

Course Statistics 246- Statistical Genetics

Pages 36

This preview shows page 1-2-17-18-19-35-36 out of 36 pages.

Preprocessing of cDNA microarray Data
Statistics 246, Spring 2002, Week 7, Lecture 2

Begin by looking at the data
Was the experiment a success?
What analysis tools should be used?
Are there any specific problems?

Red/Green overlay images
Co-registration and overlay offers a quick visualization, revealing information on color balance, uniformity of hybridization, spot uniformity, background, and artifacts such as dust or scratches.
Good: low bg, lots of d.e.
Bad: high bg, ghost spots, little d.e.

Always log, always rotate
$\log_2 R$ vs $\log_2 G$
$M = \log_2(R/G)$ vs $A = \log_2\sqrt{RG}$

Histograms
Signal/Noise = $\log_2$(spot intensity / background intensity)

Boxplots of $\log_2(R/G)$
Liver samples from 16 mice: 8 WT, 8 ApoAI KO.

Spatial plots: background from the two slides

Highlighting extreme log ratios
Top (black) and bottom (green) 5% of log ratios.

Log ratios: boxplots and highlighting
Print-tip groups (pin group).
Clear example of spatial bias.

Pin group (sub-array) effects
Lowess lines through points from pin groups.
Boxplots of log ratios by pin group.

Plate effects (KO 8)
Probes: 6,000 cDNAs, including 200 related to lipid metabolism.
Arranged in a 4x4 array of 19x21 sub-arrays.

Time of printing effects
Green channel intensities ($\log_2 G$) vs spot number.
Printing over 4 5 days.
The previous slide depicts a slide from this print run.

Normalization
Why? To correct for systematic differences between samples on the same slide, or between slides, which do not represent true biological variation between samples.
How do we know it is necessary? By examining self-self hybridizations, where no true differential expression is occurring. We find dye biases which vary with overall spot intensity, location on the array, plate origin, pins, and scanning parameters.

Self-self hybridizations
False color overlay.
Boxplots within pin groups.
Scatter (MA) plots.

A series of non self-self hybridizations
From the NCI60 data set (Stanford web site).
Early Ngai lab, UC Berkeley.
Early Goodman lab, UC Berkeley.
From the Ernest Gallo Clinic & Research Center.
Early PMCRI, Melbourne, Australia.
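The "always log, always rotate" step from the earlier slide can be illustrated with a minimal sketch. The function name `ma_values` and the example intensities are hypothetical and chosen only for illustration; the code assumes background-corrected, strictly positive red and green intensities.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities.

    M = log2(R/G) is the log ratio; A = log2(sqrt(R*G)) is the
    average log intensity, i.e. (log2 R + log2 G) / 2.
    """
    m_vals, a_vals = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        m_vals.append(log_r - log_g)
        a_vals.append((log_r + log_g) / 2)
    return m_vals, a_vals

# Hypothetical intensities (powers of two keep the arithmetic exact).
red = [1024.0, 512.0, 256.0]
green = [256.0, 512.0, 1024.0]
m, a = ma_values(red, green)
```

Plotting `m` against `a` gives the rotated (MA) view of the same points that a plot of log R against log G would show.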
Normalization methods
a) Normalization based on a global adjustment
$\log_2(R/G) \rightarrow \log_2(R/G) - c = \log_2\big(R/(kG)\big)$
Choices for $k$ or $c = \log_2 k$ are: $c$ = median or mean of log ratios for a particular gene set (e.g. housekeeping genes). Or, total intensity normalization, where $k = \sum_i R_i / \sum_i G_i$.
b) Intensity-dependent normalization
Here we run a line through the middle of the MA plot, shifting the M value of the pair $(A, M)$ by $c = c(A)$, i.e.
$\log_2(R/G) \rightarrow \log_2(R/G) - c(A) = \log_2\big(R/(k(A)\,G)\big)$
One estimate of $c(A)$ is made using the LOWESS function of Cleveland (1979): LOcally WEighted Scatterplot Smoothing.

Normalization methods
c) Within print-tip group normalization
In addition to intensity-dependent variation in log ratios, spatial bias can also be a significant source of systematic error. Most normalization methods do not correct for spatial effects produced by hybridization artifacts, or for print-tip or plate effects arising during the construction of the microarrays. It is possible to correct for both print-tip and intensity-dependent bias by performing LOWESS fits to the data within print-tip groups, i.e.
$\log_2(R/G) \rightarrow \log_2(R/G) - c_i(A) = \log_2\big(R/(k_i(A)\,G)\big)$
where $c_i(A)$ is the LOWESS fit to the MA plot for the $i$th grid only.

Which spots to use for normalization?
The LOWESS lines can be run through many different sets of points, and each strategy has its own implicit set of assumptions justifying its applicability. For example, we can justify the use of a global LOWESS approach by supposing that, when stratified by mRNA abundance, a) only a minority of genes are expected to be differentially expressed, or b) any differential expression is as likely to be up-regulation as down-regulation. Pin-group LOWESS requires stronger assumptions: that one of the above applies within each pin group. The use of other sets of genes, e.g. control or housekeeping genes, involves similar assumptions.

Use of control spots
Lowess curve; blanks; negative controls; positive controls (spotted in varying concentrations).
$M = \log(R/G) = \log R - \log G$, $A = (\log R + \log G)/2$

Global scale, global lowess, pin-group lowess;
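The three adjustments above (global, intensity-dependent, and within print-tip group) can be sketched in a few lines. This is a minimal illustration and not Cleveland's full LOWESS: `local_linear_fit` is a hypothetical helper that performs a single pass of tricube-weighted local linear regression with no robustness iterations, and the span `frac`, the function names, and the example data are all assumptions made for the example.

```python
import math
from statistics import median

def global_median_normalize(m_vals):
    """Global adjustment: subtract c = median of the log ratios."""
    c = median(m_vals)
    return [m - c for m in m_vals]

def local_linear_fit(a_vals, m_vals, frac=0.5):
    """Simplified LOWESS-style estimate of c(A) at each observed A.

    One tricube-weighted linear fit per point, using the nearest
    frac * n points; no robustness iterations (an assumption).
    """
    n = len(a_vals)
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for a0 in a_vals:
        dists = sorted(abs(a - a0) for a in a_vals)
        h = dists[k - 1] or 1e-12  # bandwidth: k-th nearest distance
        w = [max(0.0, 1 - (abs(a - a0) / h) ** 3) ** 3 for a in a_vals]
        sw = sum(w)
        mean_a = sum(wi * a for wi, a in zip(w, a_vals)) / sw
        mean_m = sum(wi * m for wi, m in zip(w, m_vals)) / sw
        sxx = sum(wi * (a - mean_a) ** 2 for wi, a in zip(w, a_vals))
        sxy = sum(wi * (a - mean_a) * (m - mean_m)
                  for wi, a, m in zip(w, a_vals, m_vals))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_m + slope * (a0 - mean_a))
    return fitted

def print_tip_normalize(a_vals, m_vals, groups, frac=0.5):
    """Subtract c_i(A), fitted separately within each print-tip group i."""
    normalized = [0.0] * len(m_vals)
    for g in set(groups):
        idx = [j for j, gj in enumerate(groups) if gj == g]
        fit = local_linear_fit([a_vals[j] for j in idx],
                               [m_vals[j] for j in idx], frac)
        for j, c in zip(idx, fit):
            normalized[j] = m_vals[j] - c
    return normalized

# Hypothetical data: two print-tip groups with different linear dye bias.
a_vals = [float(a) for a in range(6, 14)] * 2
groups = [0] * 8 + [1] * 8
m_vals = ([0.5 * a - 4.0 for a in a_vals[:8]]
          + [-0.25 * a + 3.0 for a in a_vals[8:]])

centered = global_median_normalize([1.0, 2.0, 4.0])
normalized = print_tip_normalize(a_vals, m_vals, groups)
```

In this toy case the bias is purely systematic, so the within-group fit removes it entirely, whereas a single global median shift could not remove the two opposing slopes. In practice a tested LOWESS implementation would normally be preferred over this hand-rolled smoother.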
spatial plot after, smooth histograms of M after.

MSP titration series (Microarray Sample Pool)
Pool the whole library.
Control set to aid intensity-dependent normalization.
Different concentrations.
Spotted evenly spread across the slide.

MSP normalization compared to other methods
Orange: Schadt-Wong rank invariant set. Yellow: GAPDH, tubulin. Light blue: MSP pool / titration. Red line: lowess smooth.

Composite normalization
$c_i(A) = \alpha_A\, g(A) + (1 - \alpha_A)\, f_i(A)$

Before and after composite normalization
MSP lowess curve; global lowess curve; composite lowess curve. Other colours: control spots.

Comparison of Normalization Schemes (courtesy of Jason Goncalves)
No consensus on best segmentation or normalization method.
Scheme was applied to assess the common normalization methods.
Based on reciprocal labeling experiment data for a series of 140 replicate experiments on two different arrays, each with 19,200 spots.

DESIGN OF RECIPROCAL LABELING EXPERIMENT
Replicate experiment in which we assess the same mRNA pools but invert the fluors used.
The replicates are independent experiments and are scanned, quantified and normalized as usual.
The following relationship would be observed for reciprocal microarray experiments in which the slides are free of defects and the normalization scheme performed ideally:
$\log_2 \mathrm{Ratio}(Ch1/Ch2)_{GeneA,\,Exp1} = -\log_2 \mathrm{Ratio}(Ch1/Ch2)_{GeneA,\,Exp2}$
We can measure, using real data sets, how well each microarray normalization scheme approaches this ideal.

Deviation metric to assess normalization schemes
$\mathrm{Deviation}_{Spot} = \left| \log_2 \mathrm{Ratio}(Ch1/Ch2)_{GeneA,\,Exp1} + \log_2 \mathrm{Ratio}(Ch1/Ch2)_{GeneA,\,Exp2} \right|$
$\mathrm{Deviation}_{ArrayAverage} = \frac{1}{n} \sum_{g=GeneA}^{GeneN} \left| \log_2 \mathrm{Ratio}(Ch1/Ch2)_{g,\,Exp1} + \log_2 \mathrm{Ratio}(Ch1/Ch2)_{g,\,Exp2} \right|$
We now use the mean array average deviation to compare the normalization methods. Note that this comparison addresses only variance (precision) and not bias (accuracy) aspects of normalization.

[Figure: Comparison of Normalization Methods Using 140 19K Microarrays. Y-axis: Average Mean Deviation Value (ticks from 0.3 to 0.46). X-axis: Normalization Method (Pre-Normalized, Global Intensity, Subarray Intensity, Global Ratio, Sub-Array Ratio, Global LOWESS, Subarray LOWESS).]

Scale normalization: between slides
Boxplots of log ratios from 3 replicate self-self hybridizations.
Left panel: before normalization.
Middle panel: after within print-tip group normalization.
Right panel: after a further between-slide scale normalization.

The NCI 60 experiments (no bg)
Some scale normalization seems desirable.

Log ratios

Scale normalization: another data set
Only small
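The reciprocal-labeling deviation metric used in the comparison of normalization schemes can be sketched as follows. The function names and the example ratios are hypothetical, and the use of an absolute value for the per-spot deviation is an assumption: the extracted slide text does not preserve the original notation.

```python
import math

def spot_deviations(ratios_exp1, ratios_exp2):
    """Per-spot deviation from the ideal log2(r1) = -log2(r2).

    ratios_exp1 / ratios_exp2 hold the normalized Ch1/Ch2 ratios of the
    same genes in the two dye-swapped experiments. Taking the absolute
    value of the sum of log ratios is an assumption.
    """
    return [abs(math.log2(r1) + math.log2(r2))
            for r1, r2 in zip(ratios_exp1, ratios_exp2)]

def array_average_deviation(ratios_exp1, ratios_exp2):
    """Mean of the per-spot deviations over all n spots on the array."""
    devs = spot_deviations(ratios_exp1, ratios_exp2)
    return sum(devs) / len(devs)

# Hypothetical ratios for four genes; only the second gene departs
# from the ideal reciprocal relationship.
exp1 = [2.0, 4.0, 0.5, 1.0]
exp2 = [0.5, 0.5, 2.0, 1.0]
devs = spot_deviations(exp1, exp2)
avg = array_average_deviation(exp1, exp2)
```

A smaller array average under one normalization scheme than under another indicates better agreement between the dye-swapped replicates, which speaks to precision only and not to bias.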
