UConn MCB 2210 - Lecture notes


Microarray Technology and Data Analysis
(November 28, 2007) slides assembled by Dong-Guk Shin and J Peter Gogarten

Introduction to Microarray Technology

Two color microarrays:
[Figure labels: two conditions; two labels for cDNA; develop slide with mRNAs; make images, one for each probe; fuse in computer; hybridize mixture of both probes to printed glass slides]

An alternative is to synthesize the DNA directly onto the matrix (slides from Affymetrix).
[Figure labels: created through photolithography; on cell in array; hybridization to labeled RNA from sample; result of hybridization to array]

Experimental design and sources for variation

• Biological Variation - Rules of thumb:
  - Biological replicates are a must!
  - As many biological replicates as you can afford!
  - Cell population as homogeneous as possible!
• Sample Processing Variation
• Array & Environment Variation
• Technical Variation
• Effect Size

“One characteristic common to all biological material is that it varies.” Finney, 1953
E.g.: Two mice in two different cages

Control of Experiment Variance

Degree of Replication
• Robustness of the method → Spot replication
• Dye swap → Array replication
• Robustness of the biological assay
• Absolute transcript frequency/signal intensity → Sample replication
• Relative transcript frequency associated with the biological effect → Sample replication
• Cellular sample composition → Sample replication

“If I had to replicate my experiments, I could only do half as much.” Botstein, 1999

Biological Replication
• Biological variance → 0
• High accuracy experiment
• Biological and technical variation are confounded
• Measurement precision decreased

Technical Replication
• Technical variance → 0
• High precision experiment
• Technical replication: estimation of technical variation
• Biological effect inaccurate

Statistical Analysis and Design

Single Color
• Post hoc comparison

Two Color
• Direct comparison
• Indirect comparison

The number of independent data points is a function of the comparison design:
• Post Hoc Design
  - 2 data points/gene/condition
• Loop Design (Balanced)
  - biological and technical variation not confounded
  - 8 data points/gene/condition
• Reference Design (Unbalanced)
  - biological and technical variation not confounded
  - Reference overrepresented
  - 4 data points/gene/condition

Pooling
from http://discover.nci.nih.gov/microarrayAnalysis/Experimental.Design.jsp

[Figure captions:]
• A reference design: the red and green arrows represent chips.
• A loop design: arrows represent chips with samples labeled as indicated.
• A saturated design w/o dye swap
• A design for a comparative study of the effect of a treatment on two biological strains with replicates and a few dye swaps
from http://discover.nci.nih.gov/microarrayAnalysis/Experimental.Design.jsp

Topic 2: Data Preprocessing
• Background Correction
• Normalization

Background Correction
• None
  - DNA vs Substrate
  - No Imputation/Offset
• Local
  - Negative signal intensities likely
  - Imputation/Offset required
• Global
  - Negative signal intensities likely
  - Imputation/Offset required
• Moving Minimum
  - 3x3 spot average background
  - Negative signal intensities likely
  - Imputation/Offset required
• Edwards
  - log-linear interpolation of background intensities
  - Background intensity insensitive
  - Test for Imputation
• Norm-Exp
  - regression based background estimation using signal to noise ratios
  - Background intensity sensitive
  - No Imputation

Normalization
• Background correction
• Expression ratio: T_i = R_i / G_i
• log2(ratio): log2(1) = 0, log2(2) = 1, log2(1/2) = -1, log2(4) = 2, log2(1/4) = -2

Total intensity normalization: if one has a large random sample of genes, most of which remain unchanged, one could normalize so that the mean ratio (T) for all spots is 1. For log2(T_i), this correction corresponds to the subtraction of a constant.
See http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v32/n4s/full/ng1032.html&filetype=pdf

Two Color Analytical Plots
[Figure: comparisons among Cond. 1a, 1b, 1c and Cond. 2a, 2b, 2c; panels: Synth. Image, Scatterplot, Ratio histogram]

[Figure: typical depiction, ratio versus intensity (log R + log G)]
From: http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v32/n4s/full/ng1032.html&filetype=pdf

[Figure: the same data after locally weighted linear regression analysis]
From: http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v32/n4s/full/ng1032.html&filetype=pdf

Beware
Any data adjustment, even if it is performed as carefully and industriously as possible, cannot convert low quality data into high quality data. Data adjustment always removes a part of the biology.
!!Use it as sparingly as possible!!

Filtering Data
From: http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v32/n4s/full/ng1032.html&filetype=pdf
[Figure caption: Outliers in the original data (in red) are excluded from the remainder of the data (blue), selected on the basis of a two-standard-deviation cut on the replicates.]

Statistical Methods for Identifying Differentially Expressed Genes in Replicated Microarray Experiments

[Figure: matrix with columns Sample 1, Sample 2, ..., Sample M and rows Gene 1, Gene 2, ..., Gene N; labels: Expression Profile, Expression Signature]

Gene expression data are represented as an N x M matrix.
• The N rows correspond to the N genes.
• The M columns correspond to the M samples (microarray experiments).
• Each column = a sample or a replicate.

Example: Four replicate spots per array produce four columns of R/G ratios. If four replicate arrays are used, the result is a 16 column matrix, or a 32 column matrix if the R and G values are stored separately.

Student’s t-Test Statistics
[Figure: distribution with 68%, 95%, and 99% of all samples marked]
H0: The groups are not different
• Naïve solution: do a t-test for each gene.
• Multiplicity problem: the probability of error increases. (Bonferroni correction too conservative!)

Linear Models for Microarray Data
• Package to analyze MA data.
• Good plot capabilities.

Significance Analysis of Microarrays

semi-parametric hierarchical (SPH) mixture model

Significance Analysis of Microarrays (SAM) uses balanced permutations (“re-labeling” of sample versus control intensities) to generate an expectation for the comparison.

Volcano plots compare significance (y-axis) against effect (x-axis)
• The plot compares significance determinations obtained with MAANOVA (MicroArray ANalysis Of VAriance).
• On the plot, the y-axis value is -log10(P-value) for the F1 test. The x-axis value is proportional to the fold changes.
• A horizontal line represents the significance threshold of the F1 test.
• Blue dots: EE genes
• Green dots: F3
• Orange dots: Fs
• F2 (In example graph, F2 tests were not run.)

Microarray Data: Clustering

Clustering
Assign n similar objects to
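
The total intensity normalization from the Normalization slide can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function names and the example intensities are hypothetical, and the constant is chosen here as log2(sum of R / sum of G), which is one common way to pick the constant that the notes say is subtracted from every log2(T_i).

```python
import math

def log2_ratios(red, green):
    """Return log2(R_i / G_i) for paired, background-corrected intensities."""
    return [math.log2(r / g) for r, g in zip(red, green)]

def total_intensity_shift(red, green):
    """Constant to subtract from every log2 ratio.

    Assumption: the constant is log2(sum(R) / sum(G)), so that the two
    channels have equal total intensity after the correction.
    """
    return math.log2(sum(red) / sum(green))

def normalize(red, green):
    """Total intensity normalization applied in log2 space."""
    shift = total_intensity_shift(red, green)
    return [m - shift for m in log2_ratios(red, green)]

# Hypothetical intensities for four spots (red channel twice as bright overall)
red = [100.0, 400.0, 800.0, 300.0]
green = [100.0, 100.0, 400.0, 200.0]

print("raw log2 ratios:       ", log2_ratios(red, green))
print("normalized log2 ratios:", normalize(red, green))
```

Because the correction is a single subtraction, the spread of the log2 ratios is unchanged; only their center moves.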

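
The two-standard-deviation cut mentioned on the Filtering Data slide could look roughly like the sketch below. The notes do not spell out the procedure, so this assumes two replicate arrays that measure the same comparison in the same dye orientation, and it flags genes whose replicate log2 ratios disagree by more than two standard deviations from the mean disagreement. The function name and the data are hypothetical.

```python
from statistics import mean, stdev

def two_sd_filter(rep1, rep2, n_sd=2.0):
    """Split gene indices into (kept, outliers) by replicate disagreement.

    rep1, rep2: per-gene log2 ratios from two replicate arrays (assumed
    to be in the same dye orientation, so they should roughly agree).
    """
    diffs = [a - b for a, b in zip(rep1, rep2)]
    center = mean(diffs)
    spread = stdev(diffs)  # sample standard deviation across genes
    kept, outliers = [], []
    for i, d in enumerate(diffs):
        if abs(d - center) > n_sd * spread:
            outliers.append(i)
        else:
            kept.append(i)
    return kept, outliers

# Hypothetical log2 ratios for ten genes; the last gene disagrees strongly
rep1 = [1.1, -0.6, 0.0, 2.1, -1.1, 0.5, 0.3, -0.1, 1.0, 3.5]
rep2 = [1.0, -0.5, 0.0, 2.0, -1.0, 0.5, 0.2, 0.0, 1.0, 0.5]

kept, outliers = two_sd_filter(rep1, rep2)
print("kept genes:   ", kept)
print("outlier genes:", outliers)
```

In keeping with the "Beware" slide, a filter like this removes data, so the cut-off is worth reporting alongside the results.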

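
The re-labeling idea behind SAM, the multiplicity problem, and the volcano plot coordinates can be illustrated together. The sketch below is a plain exact permutation test on the difference of group means; it is not SAM itself (no balanced permutations, no SAM statistic), and all names and values are hypothetical. Input values are assumed to be log2 intensities, so the difference of means is a log2 fold change.

```python
import math
from itertools import combinations
from statistics import mean

def permutation_p_value(sample, control):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(sample) + list(control)
    n = len(sample)
    observed = abs(mean(sample) - mean(control))
    hits = total = 0
    # Enumerate every way of re-labeling n of the pooled values as "sample"
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total

def volcano_point(sample, control):
    """Return (x, y): x = log2 fold change (effect), y = -log10(p-value)."""
    p = permutation_p_value(sample, control)
    return mean(sample) - mean(control), -math.log10(p)

# Hypothetical log2 intensities, four replicates per group
genes = {
    "gene_up": ([9.1, 9.3, 9.0, 9.2], [7.0, 7.2, 6.9, 7.1]),
    "gene_flat": ([8.0, 8.3, 7.9, 8.2], [8.1, 8.0, 8.2, 7.9]),
}

alpha = 0.05
bonferroni_cutoff = alpha / len(genes)  # per-gene threshold after correction

for name, (sample, control) in genes.items():
    p = permutation_p_value(sample, control)
    x, y = volcano_point(sample, control)
    print(name, "effect:", round(x, 3), "-log10(p):", round(y, 3),
          "passes Bonferroni:", p <= bonferroni_cutoff)
```

With four replicates per group there are only 70 possible re-labelings, so the smallest attainable two-sided p-value is 2/70. Even the clearly shifted gene therefore fails a Bonferroni cut-off of 0.025, which illustrates why the notes call the correction too conservative for small replicate numbers.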