Stanford CS 262 - Systematic determination of genetic network architecture

School name Stanford University

Course CS 262 - Computational Genomics


letter
nature genetics • volume 22 • july 1999 • 281

Systematic determination of genetic network architecture

Saeed Tavazoie¹, Jason D. Hughes¹,², Michael J. Campbell³, Raymond J. Cho⁴ & George M. Church¹

¹Department of Genetics, Harvard Medical School, 200 Longwood Ave, Boston, Massachusetts 02115, USA. ²Graduate Program in Biophysics, 200 Longwood Ave, Harvard University, Boston, Massachusetts 02115, USA. ³Molecular Applications Group, 607 Hansen Way, Building One, Palo Alto, California 94303-1110, USA. ⁴Department of Genetics, B400 Beckman Center, 279 Campus Drive, Stanford Medical Center, Palo Alto, California 94304, USA. Correspondence should be addressed to G.M.C. (e-mail: [email protected]).

Technologies to measure whole-genome mRNA abundances¹⁻³ and methods to organize and display such data⁴⁻¹⁰ are emerging as valuable tools for systems-level exploration of transcriptional regulatory networks. For instance, it has been shown that mRNA data from 118 genes, measured at several time points in the developing hindbrain of mice, can be hierarchically clustered into various patterns (or 'waves') whose members tend to participate in common processes⁵. We have previously shown that hierarchical clustering can group together genes whose cis-regulatory elements are bound by the same proteins in vivo⁶. Hierarchical clustering has also been used to organize genes into hierarchical dendrograms on the basis of their expression across multiple growth conditions⁷. The application of Fourier analysis to synchronized yeast mRNA expression data has identified cell-cycle periodic genes, many of which have expected cis-regulatory elements⁸. Here we apply a systematic set of statistical algorithms, based on whole-genome mRNA data, partitional clustering and motif discovery, to identify transcriptional regulatory sub-networks in yeast without any a priori knowledge of their structure or any assumptions about their dynamics.
This approach uncovered new regulons (sets of co-regulated genes) and their putative cis-regulatory elements. We used statistical characterization of known regulons and motifs to derive criteria by which we infer the biological significance of newly discovered regulons and motifs. Our approach holds promise for the rapid elucidation of genetic network architecture in sequenced organisms in which little biology is known.

We designed our approach to be systematic and minimally biased by previous knowledge of yeast biology. Our objective was to discover distinct expression patterns (clusters) in mRNA data sets and then identify upstream DNA sequence patterns specific to each expression cluster. A DNA sequence pattern that is specific to a single expression cluster constitutes the primary hypothesis for the cis-regulatory element through which co-regulation of the genes within the cluster is achieved.

We used data gathered by Cho et al.¹¹, who used Affymetrix oligonucleotide microarrays¹² to query the abundances of 6,220 mRNA species in synchronized Saccharomyces cerevisiae batch cultures. The data provided us with 15 time points, across two cell cycles. We variance-normalized the expression profile of each ORF and clustered the most variable 3,000 ORFs into 30 clusters of 49–186 ORFs per cluster. The clustering procedure groups together ORFs on the basis of their common expression patterns across the time points. We and others have previously used hierarchical algorithms¹³ for clustering such data⁴⁻⁸. Here we use the k-means algorithm¹⁴, a partitional method¹³ that, by iterative reallocation of cluster members, minimizes the overall within-cluster dispersion.

We found the members of each cluster to be significantly enriched for genes with similar functions. We mapped the genes in each cluster to the 199 functional categories in the Martinsried Institute of Protein Sciences functional classification scheme (MIPS) database¹⁵.
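
The normalization and clustering steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation. The function names, the toy profiles, the random initialization, the use of population variance and the squared Euclidean distance are all assumptions made for the example; the text specifies only variance normalization followed by k-means with iterative reallocation of cluster members.

```python
import math
import random


def variance_normalize(profile):
    """Center a profile to mean 0 and scale it to unit (population) variance."""
    n = len(profile)
    mean = sum(profile) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if std == 0.0:
        return [0.0] * n  # flat profile: nothing to scale
    return [(x - mean) / std for x in profile]


def sq_dist(a, b):
    """Squared Euclidean distance (assumed metric, not stated in the text)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iter=100, seed=0):
    """Partition profiles into k clusters by iterative reallocation."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct profiles picked at random.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(max_iter):
        # Reallocation step: assign each profile to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # no profile changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data (hypothetical): three rising and three falling profiles
# measured on different absolute scales.
raw = [
    [1, 2, 3, 4], [10, 21, 29, 40], [2, 4, 6, 8.5],
    [4, 3, 2, 1], [40, 29, 21, 10], [8.5, 6, 4, 2],
]
normalized = [variance_normalize(p) for p in raw]
labels, _ = kmeans(normalized, k=2)
print(labels)
```

Normalizing first means that profiles are grouped by the shape of their time course rather than by absolute abundance, which is why the rising profiles end up together despite a tenfold difference in scale. In the setting described in the text, `profiles` would hold the 3,000 most variable ORFs, each with 15 time points, and `k` would be 30.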
For each cluster, we calculated P values for observing the frequencies of genes in particular functional categories. There was significant grouping of genes within the same functional class (Table 1).

Table 1 • Enrichment of clusters for ORFs within functional categories

Cluster  Periodicity  Number of  MIPS functional category                          ORFs within    P value
         index        ORFs (n)   (total ORFs)                                      category (k)   (−log10)
1        0.07         164        ribosomal proteins (206)                          64             54
                                 organization of cytoplasm (555)                   79             39
                                 organization of chromosome structure (41)         7              4
2        0.38         186        DNA synthesis and replication (82)                23             16
                                 cell-cycle control and mitosis (312)              30             8
                                 recombination and DNA repair (84)                 11             5
                                 nuclear organization (720)                        40             4
4        0.14         170        mitochondrial organization (339)                  32             10
                                 respiration (79)                                  10             5
7        0.35         101        cell-cycle control and mitosis (312)              17             5
                                 budding, cell polarity, filament formation (161)  10             4ᵃ
                                 DNA synthesis and replication (82)                7              4ᵃ
8        0.09         148        TCA pathway (22)                                  5              4ᵃ
                                 carbohydrate metabolism (411)                     22             4ᵃ
14       0.45         74         organization of centrosome (28)                   6              6
                                 nuclear biogenesis (5)                            3              5
                                 organization of cytoskeleton (93)                 7              4ᵃ
30       0.24         60         nitrogen and sulphur metabolism (75)              9              8
                                 amino acid metabolism (203)                       12             7

Periodicity index is a quantitative measure of cell-cycle periodicity. The most highly enriched functional categories are given for each cluster. We calculated P values using the cumulative hypergeometric probability distribution for finding at least (k) ORFs from a particular functional category within a cluster of size (n). Because 199 MIPS functional categories were tested for each cluster, P values greater than 3×10⁻⁴ are not reported, as their total expectation within the cluster would be greater than 0.05. ᵃBecause all 30 clusters were tested independently, these P values may have marginal significance.

© 1999 Nature America Inc. • http://genetics.nature.com
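
The enrichment calculation named in the Table 1 notes, the cumulative hypergeometric probability of finding at least k ORFs from a category in a cluster of size n, can be sketched as below. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names are hypothetical, and taking the 6,220 queried ORFs as the population is an assumption that this excerpt does not state explicitly.

```python
from math import comb, log10


def enrichment_p_value(population, category_total, cluster_size, k):
    """Upper-tail hypergeometric probability P(X >= k).

    population     -- total number of ORFs considered (assumed: 6,220)
    category_total -- ORFs annotated to the functional category
    cluster_size   -- ORFs in the cluster (n in Table 1)
    k              -- ORFs in the cluster that belong to the category
    """
    upper = min(cluster_size, category_total)
    # Count the ways to draw i category members and (n - i) non-members,
    # summed over every i from k up to the largest possible overlap.
    favourable = sum(
        comb(category_total, i)
        * comb(population - category_total, cluster_size - i)
        for i in range(k, upper + 1)
    )
    # Exact integer arithmetic, with a single division at the end.
    return favourable / comb(population, cluster_size)


def expected_false_positives(p_threshold, n_tests):
    """Expected number of chance hits when n_tests categories are tested."""
    return p_threshold * n_tests


# Cluster 1, ribosomal proteins: 64 of 164 cluster ORFs, 206 in the category.
p = enrichment_p_value(6220, 206, 164, 64)
print(-log10(p))

# Reporting cut-off from the table notes: 199 categories tested per cluster.
print(expected_false_positives(3e-4, 199))
```

Summing exact integer counts before dividing avoids the loss of precision that would come from adding many tiny floating-point terms. The second helper restates the reasoning in the table notes: with 199 categories tested per cluster, a cut-off of 3×10⁻⁴ corresponds to an expectation of about 0.06 chance hits per cluster, which is above 0.05.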
The most notable functional grouping occurred for genes in cluster 1, where 64 of 164 genes encode ribosomal proteins (P value of 10⁻⁵⁴). Not all clusters showed significant enrichment for function. The members of such clusters may participate in multiple classically defined processes and therefore may not show significant enrichment in any one functional category. Alternatively, the number of clusters (30) may overestimate the underlying diversity of biological expression classes in the data set. We erred on the side of over-classification, however, to avoid missing significant expression classes. Subsequently, independent analyses, such as functional
