
An Evaluation of a System that Recommends Microarray Experiments


An Evaluation of a System that Recommends Microarray Experiments to Perform to Discover Gene-Regulation Pathways

Changwon Yoo
[email protected] / Tel: 540-231-2100
Virginia Bioinformatics Institute, Virginia Polytechnic and State University
1880 Pratt Drive, Building XV, Blacksburg, VA 24061

Gregory F. Cooper
[email protected]
Center for Biomedical Informatics, University of Pittsburgh
8084 Forbes Tower, 200 Lothrop St., Pittsburgh PA 15213

Abstract

The main topic of this paper is modeling the expected value of experimentation for discovering causal pathways in gene expression data. By experimentation we mean both interventions (e.g., a gene knock-out experiment) and observations (e.g., passively observing the expression level of a “wild-type” gene). We introduce a system called GEEVE (causal discovery in Gene Expression data using Expected Value of Experimentation), which implements expected value of experimentation in discovering causal pathways using gene expression data. GEEVE provides the following assistance, which is intended to help biologists in their quest to discover gene-regulation pathways:

• Recommending which experiments to perform (with a focus on “knock-out” experiments) using an expected value of experimentation (EVE) method.
• Recommending the number of measurements (observational and experimental) to include in the experimental design, again using an EVE method.
• Providing a Bayesian analysis that combines prior knowledge with the results of recent microarray experiments to derive posterior probabilities of gene regulation relationships.

In recommending which experiments to perform (and how many times to repeat them), the EVE approach considers the biologist’s preferences for which genes to focus the discovery process on. Also, since exact EVE calculations are exponential in time, GEEVE incorporates approximation methods.
GEEVE is able to combine data from knock-out experiments with data from wild-type experiments to suggest additional experiments to perform and then to analyze the results of those microarray experiments. It models the possibility that unmeasured (latent) variables may be responsible for some of the statistical associations among the expression levels of the genes under study. To evaluate the GEEVE system, we used a gene expression simulator to generate data from specified models of gene regulation. The results show that the GEEVE system gives better results than two recently published approaches (1) in learning the generating models of gene regulation and (2) in recommending experiments to perform.

Keywords: Causal discovery; Systems biology; Causal Bayesian networks; Microarray study design

1 Introduction

Most research on causal discovery using causal networks has been based on passive observational data [6, 17, 42]. There are limitations in learning causal relationships from observational data only. For example, if the generating process contains a latent factor (confounder) that influences two variables, it can be difficult, if not impossible, to learn the causal relationships between those two variables from observational data alone.

To uncover such causal relationships, a scientist generally needs to design a study that involves manipulating a variable (or variables) and then observing the changes (if any) in other variables of interest. In such an experimental study, one or more variables are manipulated and the effects on other variables are measured. Observational data, by contrast, result from passive (i.e., non-interventional) measurement of some system, such as a cell. In general, both observational and experimental data may exist on a set of variables of interest.
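The confounding problem described above can be illustrated with a small simulation. The sketch below is not taken from the paper: the variable names (H, X, Y), the linear-Gaussian form, and the noise scale are all assumptions chosen for illustration. A latent factor H drives both X and Y, and X has no causal effect on Y. Passive observation shows a strong association between X and Y, while setting X by experimental manipulation (here, randomly knocking it out or leaving it at a fixed level) makes the association vanish.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n, manipulate_x, seed=0):
    """Generate n samples of (X, Y) from a hypothetical model in which a
    latent factor H influences both X and Y and X does not influence Y.

    manipulate_x=False: X is passively observed, so it depends on H.
    manipulate_x=True:  the experimenter sets X (0.0 = knocked out,
                        1.0 = fixed level) at random, independently of H.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        h = rng.gauss(0.0, 1.0)                 # unmeasured latent factor
        if manipulate_x:
            x = rng.choice([0.0, 1.0])          # intervention cuts the H -> X link
        else:
            x = h + 0.5 * rng.gauss(0.0, 1.0)   # X inherits variation from H
        y = h + 0.5 * rng.gauss(0.0, 1.0)       # Y depends on H only
        xs.append(x)
        ys.append(y)
    return xs, ys


observational_r = pearson(*simulate(5000, manipulate_x=False))
experimental_r = pearson(*simulate(5000, manipulate_x=True))
```

Under these assumptions `observational_r` is large even though X never influences Y, whereas `experimental_r` is close to zero. Observational data alone cannot separate "X regulates Y" from "a hidden factor regulates both"; the manipulated data can.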
Limited time and funds restrict the number of variables that can be manipulated and the number of experimental repeats that can be collected for the control and experimental groups. For example, a molecular biologist who is interested in discovering the causal pathway of the genes involved in galactose metabolism first has to select the genes he or she is interested in understanding at a causal level. These genes are usually selected based on previously published results or by the molecular biologist’s scientific interest. Many issues are considered in determining the number of experimental repeats to obtain for each variable in the study design. Having more experimental repeats will typically tighten the statistical confidence intervals in the data analysis. Considering available time, budget, and other constraints, the biologist will make a decision about the number of experimental repeats to obtain.

Developing causal analysis methods is a key focus of several fields. In statistics, jointly with medicine, issues related to randomized clinical trials (RCTs) are studied, including methods for finding an optimal number of cases using stopping rules [3, 12, 41]. In molecular biology, developing techniques that generate efficient experimental designs for high throughput methods, such as cDNA microarrays, is gaining interest [26, 30]. In artificial intelligence, techniques using graphical models have been used to model experimentation and have been applied to suggest the next experiment for causal discovery [23, 27, 45].

All these prior approaches have made contributions to efficient causal study design (see Section 2 for details). They are not, however, sensitive to issues of limited resources and experimenter preferences. The research reported here is concerned with developing and evaluating a decision-analytic system that considers these issues in helping a biologist design and analyze studies of cellular pathways using high throughput sources of data.
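To make the decision-analytic idea concrete, the following is a minimal sketch of a generic value-of-information calculation. It is not GEEVE's algorithm, which this excerpt does not specify. Every name and number in it is hypothetical: two competing hypotheses about whether gene X regulates gene Y, two candidate experiments with assumed outcome likelihoods, a utility of 1 for identifying the correct hypothesis, and experiment costs expressed in the same utility units.

```python
def expected_value_of_experiment(prior, likelihood, cost=0.0):
    """Expected gain in the probability of choosing the correct hypothesis
    after seeing the experiment's outcome, minus the experiment's cost.

    prior:      dict mapping hypothesis -> prior probability
    likelihood: dict mapping hypothesis -> {outcome: P(outcome | hypothesis)}
    cost:       cost of the experiment, in utility units (an assumption)
    """
    outcomes = next(iter(likelihood.values())).keys()
    best_without = max(prior.values())  # best guess with no new data
    best_with = 0.0
    for outcome in outcomes:
        # P(outcome) * max_h P(h | outcome) equals max_h P(h) * P(outcome | h)
        joint = [prior[h] * likelihood[h][outcome] for h in prior]
        best_with += max(joint)
    return best_with - best_without - cost


def recommend(prior, experiments):
    """Return the name of the experiment with the highest net expected value.

    experiments: dict mapping name -> (likelihood, cost)
    """
    return max(
        experiments,
        key=lambda name: expected_value_of_experiment(prior, *experiments[name]),
    )


# Hypothetical prior belief about whether X regulates Y.
prior = {"X_regulates_Y": 0.5, "no_regulation": 0.5}

# Hypothetical experiments: an informative but costly knock-out of X,
# and a cheaper, less informative passive observation.
experiments = {
    "knock_out_X": (
        {"X_regulates_Y": {"Y_changes": 0.9, "Y_unchanged": 0.1},
         "no_regulation": {"Y_changes": 0.2, "Y_unchanged": 0.8}},
        0.10,
    ),
    "observe_wild_type": (
        {"X_regulates_Y": {"Y_changes": 0.6, "Y_unchanged": 0.4},
         "no_regulation": {"Y_changes": 0.4, "Y_unchanged": 0.6}},
        0.02,
    ),
}

best = recommend(prior, experiments)
```

With these made-up numbers the knock-out has the higher net expected value despite its higher cost, so `best` is `"knock_out_X"`. Raising its cost or lowering its informativeness would shift the recommendation, which is the kind of resource and preference trade-off the text describes.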
In particular, this paper concentrates on the design and analysis of cDNA microarray studies for uncovering gene regulation pathways. The fundamental methodology, however, is applicable to analyzing other high throughput data sources, such as the measurement of protein levels, which is a rapidly developing area of biology.

The GEEVE (causal discovery in Gene Expression data using Expected Value of Experimentation) system uses ideas from different areas of study. GEEVE uses causal Bayesian networks (see Section 2.1) and incorporates an experimenter’s preference (see Section 2.2) to give recommendations to the experimenter about designing a gene expression experimental study (see Section 2.3). In the remainder of this section, we provide

