JOURNAL OF COMPUTATIONAL BIOLOGY
Volume 7, Numbers 3/4, 2000
Mary Ann Liebert, Inc.
Pp. 601–620

Using Bayesian Networks to Analyze Expression Data

NIR FRIEDMAN,¹ MICHAL LINIAL,² IFTACH NACHMAN,³ and DANA PE’ER¹

ABSTRACT

DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a “snapshot” of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998).

Key words: gene expression, microarrays, Bayesian methods.

1. INTRODUCTION

A central goal of molecular biology is to understand the regulation of protein synthesis and its reactions to external and internal signals. All the cells in an organism carry the same genomic data, yet their protein makeup can be drastically different both temporally and spatially, due to regulation. Protein synthesis is regulated by many mechanisms at its different stages.
These include mechanisms for controlling transcription initiation, RNA splicing, mRNA transport, translation initiation, post-translational modifications, and degradation of mRNA/protein. One of the main junctions at which regulation occurs is mRNA transcription. A major role in this machinery is played by proteins themselves that bind to regulatory regions along the DNA, greatly affecting the transcription of the genes they regulate.

In recent years, technical breakthroughs in spotting hybridization probes and advances in genome sequencing efforts have led to the development of DNA microarrays, which consist of many species of probes, either oligonucleotides or cDNA, that are immobilized in a predefined organization to a solid phase. By using DNA microarrays, researchers are now able to measure the abundance of thousands of mRNA targets simultaneously (DeRisi et al., 1997; Lockhart et al., 1996; Wen et al., 1998). Unlike classical experiments, where the expression levels of only a few genes were reported, DNA microarray experiments can measure all the genes of an organism, providing a “genomic” viewpoint on gene expression. As a consequence, this technology facilitates new experimental approaches for understanding gene expression and regulation (Iyer et al., 1999; Spellman et al., 1998).

¹School of Computer Science and Engineering, Hebrew University, Jerusalem, 91904, Israel.
²Institute of Life Sciences, Hebrew University, Jerusalem, 91904, Israel.
³Center for Neural Computation and School of Computer Science and Engineering, Hebrew University, Jerusalem, 91904, Israel.

Early microarray experiments examined few samples and mainly focused on differential display across tissues or conditions of interest. The design of recent experiments focuses on performing a larger number of microarray assays ranging in size from a dozen to a few hundred samples. In the near future, data sets containing thousands of samples will become available.
Such experiments collect enormous amounts of data, which clearly reflect many aspects of the underlying biological processes. An important challenge is to develop methodologies that are both statistically sound and computationally tractable for analyzing such data sets and inferring biological interactions from them.

Most of the analysis tools currently used are based on clustering algorithms. These algorithms attempt to locate groups of genes that have similar expression patterns over a set of experiments (Alon et al., 1999; Ben-Dor et al., 1999; Eisen et al., 1999; Michaels et al., 1998; Spellman et al., 1998). Such analysis has proven to be useful in discovering genes that are co-regulated and/or have similar function. A more ambitious goal for analysis is to reveal the structure of the transcriptional regulation process (Akutsu, 1998; Chen et al., 1999; Somogyi et al., 1996; Weaver et al., 1999). This is clearly a hard problem. The current data is extremely noisy. Moreover, mRNA expression data alone gives only a partial picture that does not reflect key events, such as translation and protein (in)activation. Finally, the number of samples, even in the largest experiments in the foreseeable future, does not provide enough information to construct a fully detailed model with high statistical significance.

In this paper, we introduce a new approach for analyzing gene expression patterns that uncovers properties of the transcriptional program by examining statistical properties of dependence and conditional independence in the data. We base our approach on the well-studied statistical tool of Bayesian networks (Pearl, 1988). These networks represent the dependence structure between multiple interacting quantities (e.g., expression levels of different genes). Our approach, probabilistic in nature, is capable of handling noise and estimating the confidence in the different features of the network.
We are, therefore, able to focus on interactions whose signal in the data is strong.

Bayesian networks are a promising tool for analyzing gene expression patterns. First, they are particularly useful for describing processes composed of locally interacting components; that is, the value of each component directly depends on the values of a relatively small number of components. Second, statistical foundations for learning Bayesian networks from observations, and computational algorithms to do so, are well understood and have been used successfully in many applications. Finally, Bayesian networks provide models of causal influence: Although Bayesian networks are mathematically defined strictly in terms of probabilities and conditional independence statements, a
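
As a rough illustration of the locality and conditional-independence ideas in the introduction above, the following minimal sketch builds a toy three-gene chain A -> B -> C with binary expression states. The gene names, the network structure, and every probability value are hypothetical and chosen only for illustration; none of them come from the paper or from the Spellman et al. data. The joint distribution is computed from the product of small local conditional tables, and the sketch checks that C depends on A marginally but not once B is known.

```python
from itertools import product

# Hypothetical three-gene chain A -> B -> C with binary expression states
# (0 = low, 1 = high). All probabilities are made-up illustration values.
P_A = {0: 0.6, 1: 0.4}
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P_B_GIVEN_A[a][b]
P_C_GIVEN_B = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}  # P_C_GIVEN_B[b][c]


def joint(a, b, c):
    """Joint probability via the network factorization P(A) P(B|A) P(C|B)."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]


def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(a=1, c=0)."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        state = {"a": a, "b": b, "c": c}
        if all(state[name] == value for name, value in fixed.items()):
            total += joint(a, b, c)
    return total


def conditional(target, given):
    """P(target | given); both arguments are dicts over disjoint variables."""
    return prob(**target, **given) / prob(**given)


# Marginally, the state of A changes our belief about C ...
p_c_a0 = conditional({"c": 1}, {"a": 0})
p_c_a1 = conditional({"c": 1}, {"a": 1})
# ... but once B is observed, A carries no further information about C.
p_c_a0_b1 = conditional({"c": 1}, {"a": 0, "b": 1})
p_c_a1_b1 = conditional({"c": 1}, {"a": 1, "b": 1})

print(f"P(C=1 | A=0) = {p_c_a0:.3f}, P(C=1 | A=1) = {p_c_a1:.3f}")
print(f"P(C=1 | A=0, B=1) = {p_c_a0_b1:.3f}, P(C=1 | A=1, B=1) = {p_c_a1_b1:.3f}")
```

Each gene in this toy network needs only a table over its direct parent, which is the sense in which a component "directly depends on the values of a relatively small number of components". Brute-force enumeration is used here purely for clarity; it would not scale to the thousands of genes measured by a microarray.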