Princeton COS 557 - Similarities and Differences


Similarities and Differences in Genome-Wide Expression Data of Six Organisms

Sven Bergmann, Jan Ihmels, Naama Barkai*

Departments of Molecular Genetics and Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel

Comparing genomic properties of different organisms is of fundamental importance in the study of biological and evolutionary principles. Although differences among organisms are often attributed to differential gene expression, genome-wide comparative analysis thus far has been based primarily on genomic sequence information. We present a comparative study of large datasets of expression profiles from six evolutionarily distant organisms: S. cerevisiae, C. elegans, E. coli, A. thaliana, D. melanogaster, and H. sapiens. We use genomic sequence information to connect these data and compare global and modular properties of the transcription programs. Linking genes whose expression profiles are similar, we find that for all organisms the connectivity distribution follows a power-law, highly connected genes tend to be essential and conserved, and the expression program is highly modular. We reveal the modular structure by decomposing each set of expression data into coexpressed modules. Functionally related sets of genes are frequently coexpressed in multiple organisms. Yet their relative importance to the transcription program and their regulatory relationships vary among organisms. Our results demonstrate the potential of combining sequence and expression data for improving functional gene annotation and expanding our understanding of how gene expression and diversity evolved.

Received August 11, 2003; Accepted November 4, 2003; Published December 15, 2003
DOI: 10.1371/journal.pbio.0020009
Copyright: © 2003 Bergmann et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abbreviation: ISA, iterative signature algorithm
Academic Editor: Michael Eisen, Lawrence Berkeley National Laboratory
* To whom correspondence should be addressed. E-mail: [email protected]
PLoS Biology | http://biology.plosjournals.org | January 2004 | Volume 2 | Issue 1 | Page 0085

Introduction

Microarray experiments are now being used to address a large diversity of biological issues. The large datasets obtained by pooling those experiments together contain a wealth of biological information beyond the insights gained by individual measurements. For example, it was demonstrated that diverse datasets of genome-wide expression profiles can be applied for facilitating functional assignment of uncharacterized ORFs and for identification of cis-regulatory elements (Eisen et al. 1998; Kim et al. 2001; Ihmels et al. 2002).

Comparing the genomic sequences of different organisms presents an alternative prominent approach for gene annotation and identification of regulatory elements (Chervitz et al. 1998; Lynch and Conery 2000; Rubin et al. 2000; Yanai and DeLisi 2002; Frazer et al. 2003). Sequenced-based comparative analyses also proved crucial for deciphering evolutionary principles. As evolutionary changes frequently also involve modifications of the gene regulatory program (Carroll 2000; True and Carroll 2002; Wray et al. 2003), integration of expression data into interspecies comparative analyses could potentially provide new insights into the relation between genomic sequence and organismal form and function. So far, however, such an approach has been mostly applied to small numbers of genes (Carroll 2000; True and Carroll 2002; Wray et al. 2003) or has been restricted to variations in the genome-wide expression profiles during the development of closely related species (Rifkin et al. 2003). With the accumulation of large-scale expression data for a number of diverse species, the time may be ripe for a macroevolutionary comparison of gene expression.

Expression data differ from sequence data in two main aspects, which make their integration into comparative analysis challenging. First, unlike sequence information, which is direct and accurate, expression profiles provide only indirect and noisy information about the regulatory relationships between genes. Second, while the genomic sequence is essentially complete, expression profiles only cover a subset of all possible cellular conditions and thus provide only partial information about the underlying regulatory program. Moreover, this subset is typically very different for each organism, reflecting distinct physiologies as well as different research foci. One way to circumvent this problem is to restrict the data to a small subset of similar conditions, such as timepoints along the cell cycle (Alter et al. 2003). Such an approach, however, drastically reduces the size of the dataset and limits the scope of comparison.

Here, we present a comparative analysis of large sets of expression data from six evolutionarily distant organisms (Table 1). We integrate the expression data with genomic sequence information to address three biological issues. First, we verify that coexpression is often conserved among organisms and propose a method for improving functional gene annotations using this conservation. We provide a Web-based application suitable for this purpose. Second, we compare the regulatory relationships between particular functional groups in the different organisms, giving initial insights into the extent of conservation of the gene regulatory architecture. Interestingly, we find that while functionally related genes are frequently coexpressed in several organisms, their organization and relative contribution to the overall expression program differ. Finally, we compare global topological properties of the transcription networks derived from the expression data, using a graph theoretical approach. This analysis reveals that despite the differences in the regulation of individual gene groups, the expression data of all organisms share large-scale properties.

Results and Discussion

Combining Sequence and Expression Data for Improving Functional Gene Annotations

With the rapid increase in the number of sequenced genomes, assigning function to novel ORFs has become a major computational challenge. Functional links are often imputed based on sequence similarity with genes of known functions. Despite the large success of this approach, it has several well-recognized limitations. Foremost, an ORF can have several close

