Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations

Mei-Ling Ting Lee*†‡§, Frank C. Kuo†¶, G. A. Whitmore‖, and Jeffrey Sklar†¶

*Departments of Medicine and ¶Pathology, Brigham and Women’s Hospital, Boston, MA 02115; †Harvard Medical School, Boston, MA 02115; ‡Biostatistics Department, Harvard School of Public Health, Boston, MA 02115; and ‖Faculty of Management, McGill University, Montreal, Quebec, Canada H3A 1G5

Edited by Bradley Efron, Stanford University, Stanford, CA, and approved June 23, 2000 (received for review March 13, 2000)

We present statistical methods for analyzing replicated cDNA microarray expression data and report the results of a controlled experiment. The study was conducted to investigate inherent variability in gene expression data and the extent to which replication in an experiment produces more consistent and reliable findings. We introduce a statistical model to describe the probability that mRNA is contained in the target sample tissue, converted to probe, and ultimately detected on the slide. We also introduce a method to analyze the combined data from all replicates. Of the 288 genes considered in this controlled experiment, 32 would be expected to produce strong hybridization signals because of the known presence of repetitive sequences within them. Results based on individual replicates, however, show that there are 55, 36, and 58 highly expressed genes in replicates 1, 2, and 3, respectively. On the other hand, an analysis using the combined data from all 3 replicates reveals that only 2 of the 288 genes are incorrectly classified as expressed. Our experiment shows that any single microarray output is subject to substantial variability. By pooling data from replicates, we can provide a more reliable analysis of gene expression data. Therefore, we conclude that designing experiments with replications will greatly reduce misclassification rates.
We recommend that at least three replicates be used in designing experiments using cDNA microarrays, particularly when gene expression data from single specimens are being analyzed.

Although the high-throughput technology now available enables genetic researchers to study expression for thousands of genes simultaneously, experiments using microarrays may be costly and time consuming. The manufacturers of microarray equipment do not stress the need for replication of studies. Production of arrays can be slow and the supply limited. As a result, many current molecular genetic studies that use microarray technology are done without replication. However, statistical analyses in many settings have demonstrated that important insights into the nature of inherent variability are obtained by the replication of experiments.

In Section 1, we report the design of a controlled experiment involving replication of cDNA hybridizations. The study was conducted to investigate inherent variability in gene expression data and the extent to which replication in an experiment produces more consistent and reliable findings. In Sections 2.1 and 2.2, we introduce statistical models to describe the probability that an mRNA is contained in the target sample tissue, converted to probe, and ultimately detected on the slide as an observed expression. We use a mixed normal distribution to model the distribution of observed gene expressions. In Sections 2.3 and 2.4, we conduct a separate analysis for each replicate. In Sections 2.5 and 2.6, we introduce a model to provide a joint analysis based on the combined data collected from all replicates. In Section 2.7, we consider the reliability of the classification of gene expression as a function of the number of replicates.

Our results show that any single microarray output is subject to substantial variability. By pooling data from replicates, we can provide a more reliable classification of gene expression.
Therefore, we conclude that designing experiments with replications will greatly reduce misclassification rates. We recommend that at least three replicates be used in designing experiments using cDNA microarrays. Although our results depend on specific instruments and techniques, the statistical models and methods that we propose in this article can be applied in general settings.

1. Materials and Methods

In this section, we provide a brief description of our experimental process. To check the consistency of microarray experiments, we conducted a study to investigate whether the unevenness of the surfaces of glass slides, the locations of cDNA spots on the slides, and other aspects of a microarray experiment may produce variation in measurements of transcription. To test these variables of cDNA microarrays generated in our facility, we printed a set of 288 cDNAs in triplicate (288 elements per set) at 3 locations on the same slide and performed hybridization experiments with probes from 1 source. By comparing the signals from these triplicates, we hoped to learn about the reproducibility of the array process and whether seemingly minor factors, such as the location of the spots in the array, can affect the outcome of analyses. Of the 288 genes considered in this experiment, 32 would be expected a priori to appear highly expressed because of structural features within the genes, namely Alu repeats that should cross-hybridize to similar sequences widely distributed among expressed and nonexpressed portions of the genome.

1.1. Generation of Array-Ready cDNAs. Frozen glycerol stocks of Escherichia coli containing individual cDNA clones from the IMAGE consortium, distributed in 384-well plates, were purchased from Genome Systems, St. Louis. Individual bacterial clones were selected and distributed into 96-well plates.
Amplifications of DNA by PCR with primers specific to the vector sequences flanking the insert cDNA were performed in 96-well PCR plates in a Perkin–Elmer 9600 thermocycler in 50-μl reactions containing 1× PCR buffer (Promega), 1.5 mM MgCl2, 0.2 mM dNTPs, 10 pmol of each primer, 5 units of Taq

This paper was submitted directly (Track II) to the PNAS office.

§To whom reprint requests should be addressed at: Channing Laboratory, BWH/HMS, 181 Longwood Avenue, Boston, MA, 02115-5804. E-mail: [email protected]

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. §1734 solely to indicate this
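
The introduction's claim that pooling replicates gives a more reliable classification than any single array can be illustrated with a small simulation. The sketch below is not the authors' model from Sections 2.1 to 2.6, which this excerpt does not show. It assumes a simple two-component normal mixture and pools replicates by averaging them. Only the gene count (288), the number of genes expected to be expressed (32), and the number of replicates (3) come from the text; the component means `mu0` and `mu1`, the common standard deviation `sd`, and the 0.5 posterior cutoff are hypothetical values chosen for illustration.

```python
import math
import random


def posterior_expressed(x, p, mu0, mu1, sd):
    """Posterior probability that x comes from the 'expressed' component
    of a two-component normal mixture with common standard deviation sd.
    p is the prior weight of the expressed component (mean mu1)."""
    # Log posterior odds: prior log-odds plus the normal log-likelihood ratio.
    log_odds = math.log(p / (1.0 - p)) + (
        (x - mu0) ** 2 - (x - mu1) ** 2
    ) / (2.0 * sd ** 2)
    return 1.0 / (1.0 + math.exp(-log_odds))


def simulate(n_genes=288, n_expressed=32, n_reps=3,
             mu0=0.0, mu1=2.0, sd=1.0, seed=0):
    """Return (errors per single replicate, errors from pooled replicates).

    mu0, mu1 and sd are hypothetical illustration values, not estimates
    from the paper.
    """
    rng = random.Random(seed)
    p = n_expressed / n_genes
    truth = [i < n_expressed for i in range(n_genes)]
    # One row per gene, one simulated reading per replicate.
    data = [[rng.gauss(mu1 if t else mu0, sd) for _ in range(n_reps)]
            for t in truth]

    # Classify each replicate on its own.
    single_errors = []
    for r in range(n_reps):
        calls = [posterior_expressed(row[r], p, mu0, mu1, sd) > 0.5
                 for row in data]
        single_errors.append(sum(c != t for c, t in zip(calls, truth)))

    # Pool by averaging: the mean of n_reps readings has sd / sqrt(n_reps).
    pooled_sd = sd / math.sqrt(n_reps)
    calls = [posterior_expressed(sum(row) / n_reps, p, mu0, mu1, pooled_sd) > 0.5
             for row in data]
    pooled_errors = sum(c != t for c, t in zip(calls, truth))
    return single_errors, pooled_errors


if __name__ == "__main__":
    single, pooled = simulate()
    print("misclassified genes per single replicate:", single)
    print("misclassified genes with pooled replicates:", pooled)
```

Under these assumptions the pooled classification tends to misclassify fewer genes than a single replicate, because averaging shrinks the noise standard deviation by a factor of the square root of the number of replicates. The counts depend on the hypothetical parameters and the random seed, so they are not comparable to the 55, 36, and 58 reported in the abstract.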

