Stanford CS 374 - Probabilistic Discovery of Overlapping Cellular Processes and Their Regulation

Probabilistic Discovery of Overlapping Cellular Processes and Their Regulation

Alexis [email protected] [email protected] [email protected] Science Department
Stanford University
Stanford, CA 94305-9010

ABSTRACT

Many of the functions carried out by a living cell are regulated at the transcriptional level, to ensure that genes are expressed when they are needed. Thus, to understand biological processes, it is necessary to understand the cell’s transcriptional network. In this paper, we propose a novel probabilistic model of gene regulation for the task of identifying overlapping biological processes and the regulatory mechanism controlling their activation. A key feature of our approach is that we allow genes to participate in multiple processes, thus providing a more biologically plausible model for the process of gene regulation. We present an algorithm to learn this model automatically from data, using only genome-wide measurements of gene expression as input. We compare our results to those obtained by other approaches, and show that significant benefits can be gained by modeling both the organization of genes into overlapping cellular processes and the regulatory programs of these processes. Moreover, our method successfully grouped genes known to function together, recovered many regulatory relationships that are known in the literature, and suggested novel hypotheses regarding the regulatory role of previously uncharacterized proteins.

Categories and Subject Descriptors
J.3 [Life and Medical Sciences]: Biology and genetics

General Terms
Experimentation, Algorithms

Keywords
Cellular Processes, Gene Regulation, Probabilistic Relational Models

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
RECOMB’04, March 27–31, 2004, San Diego, California, USA.
Copyright 2004 ACM 1-58113-755-9/04/0003 ...$5.00.

1. Introduction

A living cell is a complex system that needs to perform various functions and adapt to changing environments. Essential to the cell’s ability to respond to different circumstances is its organization into a set of processes, whose activity varies according to circumstance. The activation of these processes is often controlled by a variety of regulatory signals, which are themselves triggered by various aspects of the current state. Genome-wide measurements of mRNA expression level across multiple experimental conditions provide us with a global picture of the cell’s activities, and open the way to a high-level understanding of its behavior.

The most common approach for analyzing gene expression is to cluster genes whose expression profile is similar over a range of experimental conditions [7, 5]. As the clusters correspond to sets of genes that respond similarly in different circumstances, their member genes are likely to share a common function. However, these approaches group genes into mutually exclusive clusters, whereas the real biological system is much more intricate, with many genes playing multiple roles in different circumstances. To address this, other approaches [29, 31, 15, 19, 1, 26] have been proposed for discovering processes, or overlapping groups of genes, thereby providing a more realistic model for cellular activity.

However, finding the genes that participate in each process provides only a partial view of the biological system. In many cases, we are also interested in discovering the regulatory mechanism that controls the activation of each process and its response to different circumstances.
This type of analysis is typically done as a post-processing step, by searching, for example, for common cis-regulatory motifs in the promoter region of the genes in the process [32], or (more recently) for correlation with protein-DNA binding data [20]. However, such approaches are limited both by the amount of noise in these data, and, more importantly, by the fact that many regulatory relationships are context-specific, occurring only under certain conditions, a phenomenon which does not manifest in these data.

Recent work [24, 28] shows that regulatory relationships can be learned directly from gene expression data, in a way that accounts for the context-specificity of regulatory events. Moreover, as shown by Segal et al. [28, 27], by searching for sets of co-regulated genes and their actual regulatory program simultaneously, a much more accurate organization of genes into processes can be obtained. However, while their approach models aspects relating to regulatory mechanisms, they partition the genes into mutually exclusive processes, and thus their models are limited in the comprehensiveness of the regulatory picture they provide.

In this paper, we propose a novel model of gene regulation for discovering overlapping cellular processes and their regulatory programs: the Coregulated Overlapping Processes model, or COPR model (pronounced “copper”). Our approach builds on the ideas in our earlier work [26] for modeling overlapping processes, and extends the ideas of Segal et al. [28] for modeling regulatory programs, resulting in a unified probabilistic framework of gene regulation. The COPR model makes explicit the notion of a biological process, and each gene can then be a member of one or more of these processes.

Following our earlier work, we assume that a process is active to different extents in different conditions. Specifically, we assume that each process p has some activity level a.C_p in each given array a.
The expression level of a gene in an array a is therefore a sum of the activities of the processes with which it is associated.

For modeling regulatory relationships, we make the simplifying assumption that regulation is done at the level of processes, rather than individual genes. Thus, we assume that each process is associated with some (unknown) regulatory program, which determines its activity level a.C_p in each array a. Thus, the activity of p is different in different arrays, but it is governed by the same set of rules. The process’ activity level is assumed to be a function of its regulators. We use a regulatory program that captures two of the most
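The additive assumption stated above (a gene's expression in an array is the sum of the activities of the processes it belongs to) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `expected_expression`, the toy membership table, and the activity values are all hypothetical, and the sketch leaves out both measurement noise and the regulatory program that would determine the activity levels.

```python
# Illustrative sketch only; all names and numbers are hypothetical.

# membership[g][p] is 1 if gene g participates in process p, else 0.
# A gene may belong to several processes (overlap) or to none.
membership = [
    [1, 0],  # gene 0: process 0 only
    [1, 1],  # gene 1: both processes
    [0, 1],  # gene 2: process 1 only
    [0, 0],  # gene 3: no process
]

# activity[p][a] is the activity level of process p in array a
# (written a.C_p in the text).
activity = [
    [2.0, 0.0, -1.0],  # process 0 across three arrays
    [0.5, 1.5, 0.0],   # process 1 across three arrays
]


def expected_expression(membership, activity):
    """For each gene and array, sum the activities of the gene's processes."""
    n_arrays = len(activity[0])
    return [
        [sum(m * activity[p][a] for p, m in enumerate(row)) for a in range(n_arrays)]
        for row in membership
    ]


expression = expected_expression(membership, activity)
for gene, row in enumerate(expression):
    print(f"gene {gene}: {row}")
```

In this toy setting, gene 1 belongs to both processes, so its row is the element-wise sum of the two activity profiles, which a mutually exclusive clustering could not represent.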

