Beyond Synexpression Relationships: Local Clustering of Time-shifted and Inverted Gene Expression Profiles Identifies New, Biologically Relevant Interactions

Jiang Qian, Marisa Dolled-Filhart, Jimmy Lin, Haiyuan Yu and Mark Gerstein*

Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, PO Box 208114, New Haven, CT 06520-8114, USA

The complexity of biological systems provides for a great diversity of relationships between genes. The current analysis of whole-genome expression data focuses on relationships based on global correlation over a whole time-course, identifying clusters of genes whose expression levels simultaneously rise and fall. There are, of course, other potential relationships between genes, which are missed by such global clustering.
These include activation, where one expects a time-delay between related expression profiles, and inhibition, where one expects an inverted relationship. Here, we propose a new method, which we call local clustering, for identifying these time-delayed and inverted relationships. It is related to conventional gene-expression clustering in a fashion analogous to the way local sequence alignment (the Smith-Waterman algorithm) is derived from global alignment (Needleman-Wunsch). An integral part of our method is the use of random score distributions to assess the statistical significance of each cluster. We applied our method to the yeast cell-cycle expression dataset and were able to detect a considerable number of additional biological relationships between genes, beyond those resulting from conventional correlation. We related these new relationships between genes to their similarity in function (as determined from the MIPS scheme) or their having known protein-protein interactions (as determined from the large-scale two-hybrid experiment); we found that genes strongly related by local clustering were considerably more likely than random to have a known interaction or a similar cellular role. This suggests that local clustering may be useful in functional annotation of uncharacterized genes. We examined many of the new relationships in detail. Some of them were already well-documented examples of inhibition or activation, which provide corroboration for our results. For instance, we found an inverted expression profile relationship between genes YME1 and YNT20, where the latter has been experimentally documented as a bypass suppressor of the former. We also found new relationships involving uncharacterized yeast genes and were able to suggest functions for many of them. In particular, we found a time-delayed expression relationship between J0544 (which has not yet been functionally characterized) and four genes associated with the mitochondria.
This suggests that J0544 may be involved in the control or activation of mitochondrial genes. We have also looked at other, less extensive datasets than the yeast cell-cycle and found further interesting relationships. Our clustering program and a detailed website of clustering results is available at http://www.bioinfo.mbb.yale.edu/expression/cluster (or http://www.genecensus.org/expression/cluster).

© 2001 Academic Press

Keywords: gene expression; local clustering; time-shifted; inverted; bioinformatics

*Corresponding author
E-mail address of the corresponding author: [email protected]
Abbreviations used: ORF, open reading frame.
doi:10.1006/jmbi.2001.5219, available online at http://www.idealibrary.com
J. Mol. Biol. (2001) 314, 1053–1066
0022-2836/01/051053–14 $35.00/0

Introduction

The massive datasets generated by microarray experiments present a challenge to those interested in studying the regulatory relationship between genes.1–5 Up to now, one of the main challenges has been to devise methods for grouping together genes that have similar expression profiles; this is done to determine clusters of genes that are transcribed together as cellular conditions vary. The most obvious use of such clusters is an improved understanding of transcription regulatory networks within genomes. Genes with similar expression profiles are likely to be subject to identical, or related, transcriptional control. This fact has been used to search for binding site motifs common to coregulated genes.6–8

There are further applications for expression clustering, especially in combination with other information about genes such as their subcellular localizations, metabolic functions, and intermolecular interactions.9–13,58,59 In particular, microarray technology allows for studying the entire genome, while other types of gene annotation (e.g. biochemical functions) are often available only for a fraction of the genes. Therefore, researchers have attempted to predict protein function and interaction by expression clustering.
This is based on "guilt by association",14 the premise that proteins with similar expression profiles (i.e. synexpression relationship) have similar functions.15–18

Given the central importance of gene clusters in the studies just described, computational methods have been devised to (i) assess the similarity between pairs of expression profiles from different genes, and then (ii) group together those genes with similar profiles. Effectively, the two aims are analogous to approaches in protein sequence analysis, where there are methods for assessing sequence similarity between pairs of
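
The idea of scoring time-delayed and inverted matches between two expression profiles, in the spirit of local sequence alignment, can be illustrated with a small sketch. The code below is only an assumption-laden illustration and not the authors' program: it assumes a gapless, diagonal-only recurrence over products of normalized expression values, and a simple shuffle-based estimate of significance. The names `normalize`, `local_match` and `empirical_p_value` are hypothetical.

```python
import math
import random


def normalize(profile):
    """Scale a profile to zero mean and unit (population) variance."""
    n = len(profile)
    mean = sum(profile) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in profile) / n)
    if std == 0:
        raise ValueError("a constant profile cannot be normalized")
    return [(v - mean) / std for v in profile]


def local_match(x, y):
    """Best gapless local match between two profiles (illustrative only).

    Returns (score, kind, shift). kind is "correlated" or "inverted";
    shift = j - i in time points, so a positive shift means the matched
    segment occurs later in y than in x.
    """
    x, y = normalize(x), normalize(y)
    n, m = len(x), len(y)
    # pos accumulates same-sign products, neg accumulates opposite-sign ones.
    pos = [[0.0] * (m + 1) for _ in range(n + 1)]
    neg = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = (0.0, "none", 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = x[i - 1] * y[j - 1]
            # Resetting to zero is what makes the match local, not global.
            pos[i][j] = max(pos[i - 1][j - 1] + s, 0.0)
            neg[i][j] = max(neg[i - 1][j - 1] - s, 0.0)
            if pos[i][j] > best[0]:
                best = (pos[i][j], "correlated", j - i)
            if neg[i][j] > best[0]:
                best = (neg[i][j], "inverted", j - i)
    return best


def empirical_p_value(x, y, trials=200, seed=0):
    """Fraction of shuffled versions of y that score at least as well as y."""
    rng = random.Random(seed)
    observed = local_match(x, y)[0]
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if local_match(x, shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    x = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
    delayed = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]  # x delayed by two time points
    print(local_match(x, delayed))
    print(local_match(x, [-v for v in x]))
    print(empirical_p_value(x, delayed))
```

In this sketch a shift of zero with kind "correlated" corresponds to a simultaneous relationship, a nonzero shift to a time-delayed one, and kind "inverted" to an inverted one. Shuffling the time points is just one possible way to build a random score distribution.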

