MODELING CELLULAR PROCESSES WITH VARIATIONAL BAYESIAN COOPERATIVE VECTOR QUANTIZER


X. LU (1,2,4), M. HAUSKRECHT (2) and R.S. DAY (3)
(1) Center for Biomedical Informatics, (2) Dept of Computer Science, (3) Dept of Biostatistics, University of Pittsburgh
(4) Dept of Biometry and Epidemiology, Medical University of South Carolina
email: [email protected], [email protected], [email protected]
(a) To whom correspondence should be addressed.

Gene expression of a cell is controlled by sophisticated cellular processes. The capability of inferring the states of these cellular processes would provide insight into the mechanism of the gene expression control system. In this paper, we propose and investigate the cooperative vector quantizer (CVQ) model for analysis of microarray data. The CVQ model is capable of decomposing observed microarray data into many different regulatory subprocesses. To make the CVQ analysis tractable we develop and apply variational approximations. Bayesian model selection is employed in the model, so that the optimal number of processes is determined purely from observed microarray data. We test the model and algorithms on two datasets: (1) simulated gene expression data and (2) real-world yeast cell-cycle microarray data. The results illustrate the ability of the CVQ approach to recover and characterize regulatory gene expression subprocesses, indicating a potential for advanced gene expression data analysis.

1 Introduction

Current DNA microarray technology allows scientists to monitor gene expression at the genome level. Although microarray data are not direct measurements of the activity of cellular processes (or signal transduction pathways), they provide opportunities to infer the states of the cellular processes and study the mechanism of gene expression control at the system level. When a cell is subjected to different conditions, the states of the processes controlling gene expression change accordingly and result in different gene expression patterns. One important task for systems biologists is to identify the cellular processes controlling gene expression and infer their states under a specific condition based on observed expression patterns. Different approaches have been applied in order to identify the cellular processes by decomposing (deconvoluting) the observed microarray data into different components. For example, singular value decomposition (SVD) [1], principal component analysis (PCA) [2], independent component analysis (ICA) [3,4], Bayesian decomposition [5] and probabilistic relation modeling (PRM) [6] have been used to decompose observed microarray data into different processes.

The problem of identifying hidden regulatory processes in a cell can be formulated as a blind source separation problem, where distinct regulatory processes, which we would like to identify and characterize, are modeled as hidden sources (b). The task is to identify the source signals purely based on observed data. An additional challenge is that the separation process must be performed fully unsupervised: the number of sources is not known in advance.

(b) We use “sources” and “processes” interchangeably throughout the rest of the paper.

To facilitate biological interpretation, the originating signals of the processes in a system should be identified uniquely. Some of the aforementioned models, such as SVD and PCA, restrict the components to be orthonormal, thus they are not suitable for blind source separation. Independent component analysis (ICA), independent factor analysis (IFA) and various vector quantization models [7,8,9,10] are among the models used for blind source separation. In this work we develop an inference algorithm for one such model: the cooperative vector quantizer (CVQ) model. The main advantage of the CVQ model over other blind source separation models is that it mimics the switching-state nature of the regulatory processes; consequently, the results of the analysis can be easily interpreted by biologists.

Fully unsupervised blind source separation requires learning the model structure. In microarray data analysis, one needs to infer the optimal number of latent regulatory processes in the system. The parameters of a latent variable model with a fixed structure (known number of processes) can be learned using maximum likelihood estimation (MLE) techniques, e.g. the expectation maximization (EM) [11] algorithm, as in Segal et al. [6]. Unfortunately, the value of the likelihood by itself is not suitable for model selection. The main reason is that MLE prefers more complex models and tends to over-fit the training data. That is, more complex models return higher likelihood scores for the training data, but they do not generalize well to future, yet to be seen, data. On the other hand, the methods used in the studies by Alter et al. [1] and Liebermeister [3] simply dictate the number of processes of the model and do not have the flexibility of model selection. Model selection can be addressed effectively within the Bayesian framework [12,13,14]. Bayesian selection penalizes models for complexity as well as for poor fit, therefore it implements Occam’s Razor. In this work, we investigate the Bayesian model selection framework in the context of the CVQ model. More specifically, we derive and implement a variational Bayesian approach which can automatically learn both the structure and parameters of the CVQ model, and thus perform full-scale blind source separation.

In the following sections, we first present the CVQ model. After that, we discuss the theory of Bayesian model selection and its approximations. We derive and present a variational Bayesian approximation for learning the CVQ model from data. Finally, we test the model and algorithms on (1) simulated gene expression data and (2) yeast cell-cycle microarray data [20], and discuss the results.

[Figure 1 graphic: nodes S1, S2, ..., SK, y, W, γ, τ, π1, π2, ..., πK; plate over n = 1, 2, ..., N]
Figure 1: A directed acyclic graph (DAG) representation of the cooperative vector quantizer (CVQ) model. The square corresponds to an individual data point which consists of observed variables y and latent variables s. W, γ, τ and π are model parameters.

2 The CVQ Model

In the CVQ model, the states of the cellular processes are represented as a set of binary variables s = {s_k}, k = 1, ..., K, referred to as sources, where K is the number of processes in a given model. Each source assumes a value of 0 or 1, which simulates the “off/on” state of a cellular process. Each microarray experiment is represented as a D-dimensional vector y, where D is the number of genes on a microarray. An observed data

