Beyond Synexpression Relationships: Local Clustering of Time-shifted and Inverted Gene Expression Profiles Identifies New, Biologically Relevant Interactions

Jiang Qian, Marisa Dolled-Filhart, Jimmy Lin, Haiyuan Yu and Mark Gerstein*

Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, PO Box 208114, New Haven, CT 06520-8114, USA

The complexity of biological systems provides for a great diversity of relationships between genes. The current analysis of whole-genome expression data focuses on relationships based on global correlation over a whole time-course, identifying clusters of genes whose expression levels simultaneously rise and fall. There are, of course, other potential relationships between genes, which are missed by such global clustering.
These include activation, where one expects a time-delay between related expression profiles, and inhibition, where one expects an inverted relationship. Here, we propose a new method, which we call local clustering, for identifying these time-delayed and inverted relationships. It is related to conventional gene-expression clustering in a fashion analogous to the way local sequence alignment (the Smith-Waterman algorithm) is derived from global alignment (Needleman-Wunsch). An integral part of our method is the use of random score distributions to assess the statistical significance of each cluster. We applied our method to the yeast cell-cycle expression dataset and were able to detect a considerable number of additional biological relationships between genes, beyond those resulting from conventional correlation. We related these new relationships between genes to their similarity in function (as determined from the MIPS scheme) or their having known protein-protein interactions (as determined from the large-scale two-hybrid experiment); we found that genes strongly related by local clustering were considerably more likely than random to have a known interaction or a similar cellular role. This suggests that local clustering may be useful in functional annotation of uncharacterized genes. We examined many of the new relationships in detail. Some of them were already well-documented examples of inhibition or activation, which provide corroboration for our results. For instance, we found an inverted expression profile relationship between genes YME1 and YNT20, where the latter has been experimentally documented as a bypass suppressor of the former. We also found new relationships involving uncharacterized yeast genes and were able to suggest functions for many of them. In particular, we found a time-delayed expression relationship between J0544 (which has not yet been functionally characterized) and four genes associated with the mitochondria.
This suggests that J0544 may be involved in the control or activation of mitochondrial genes. We have also looked at other, less extensive datasets than the yeast cell-cycle and found further interesting relationships. Our clustering program and a detailed website of clustering results are available at http://www.bioinfo.mbb.yale.edu/expression/cluster (or http://www.genecensus.org/expression/cluster).

© 2001 Academic Press

Keywords: gene expression; local clustering; time-shifted; inverted; bioinformatics

*Corresponding author
E-mail address of the corresponding author: [email protected]
Abbreviations used: ORF, open reading frame.
doi:10.1006/jmbi.2001.5219 available online at http://www.idealibrary.com on J. Mol. Biol. (2001) 314, 1053–1066
0022-2836/01/051053–14 $35.00/0

Introduction

The massive datasets generated by microarray experiments present a challenge to those interested in studying the regulatory relationship between genes.1–5 Up to now, one of the main challenges has been to devise methods for grouping together genes that have similar expression profiles; this is done to determine clusters of genes that are transcribed together as cellular conditions vary. The most obvious use of such clusters is an improved understanding of transcription regulatory networks within genomes. Genes with similar expression profiles are likely to be subject to identical, or related, transcriptional control. This fact has been used to search for binding site motifs common to coregulated genes.6–8

There are further applications for expression clustering, especially in combination with other information about genes such as their subcellular localizations, metabolic functions, and intermolecular interactions.9–13,58,59 In particular, microarray technology allows for studying the entire genome, while other types of gene annotation (e.g. biochemical functions) are often available only for a fraction of the genes. Therefore, researchers have attempted to predict protein function and interaction by expression clustering.
This is based on "guilt by association",14 the premise that proteins with similar expression profiles (i.e. a synexpression relationship) have similar functions.15–18

Given the central importance of gene clusters in the studies just described, computational methods have been devised to (i) assess the similarity between pairs of expression profiles from different genes, and then (ii) group together those genes with similar profiles. Effectively, the two aims are analogous to approaches in protein sequence analysis, where there are methods for assessing sequence similarity between pairs of