0 0 9 views

**Unformatted text preview:**

Journal of Computational Physics 212 2006 6 24 www elsevier com locate jcp Accuracy limitations and the measurement of errors in the stochastic simulation of chemically reacting systems Yang Cao Linda Petzold 1 Department of Computer Science University of California Santa Barbara CA United States Received 11 November 2004 received in revised form 9 May 2005 accepted 21 June 2005 Available online 18 August 2005 Abstract This paper introduces the concept of distribution distance for the measurement of errors in exact and approximate methods for stochastic simulation of chemically reacting systems Two types of distance are discussed the Kolmogorov distance and the histogram distance The self distance an important property of Monte Carlo methods that quanti es the accuracy limitation at a given resolution for a given number of realizations is de ned and studied Estimation formulas are established for the histogram and the Kolmogorov self distance These formulas do not depend on the distribution of the samples and thus show a property of the Monte Carlo method itself Numerical results demonstrate that the formulas are very accurate Application of these results to two problems of current interest in the simulation of biochemical systems is discussed 2005 Elsevier Inc All rights reserved 1 Introduction In microscopic systems formed by living cells the small numbers of reactant molecules can result in dynamical behavior that is discrete and stochastic rather than continuous and deterministic 1 9 The stochasticity often called biochemical noise by biologists in microscopic systems has been implicated in the lysis lysogeny decision of the bacteria k phage 3 and the loss of synchrony of Circadian clocks 4 To study the stochasticity in microscopic systems engineered gene circuits have been designed and implemented in the laboratory The e ects of stochasticity have been observed in biological experiments 6 9 Tel 1 805 893 5362 fax 1 805 893 5435 E mail address petzold engineering ucsb edu L Petzold 1 Department of Mechanical and Environmental Engineering University of Santa Barbara Santa Barbara CA 93106 5070 United States 0021 9991 see front matter 2005 Elsevier Inc All rights reserved doi 10 1016 j jcp 2005 06 012 Y Cao L Petzold Journal of Computational Physics 212 2006 6 24 7 Detailed models 1 2 10 11 for the expression of a single gene and gene networks have been proposed to explain these experiments The comparison between models and experiments is veri ed through Monte Carlo simulations Such simulations are based on Gillespie s stochastic simulation algorithm SSA 12 13 which yields an exact stochastic simulation for well stirred chemically reacting systems However it can be prohibitively expensive for realistic biochemical simulation Approximate simulation methods have been proposed such as the s leaping method 14 implicit s method 15 and hybrid methods 16 17 These methods can achieve greater e ciency and give a close approximation to the SSA method Two natural questions are concerned here One is How should we measure the di erence between the experiments and the Monte Carlo simulations The other is How should we measure the accuracy of approximate methods We seek a quantitative measurement Typically what is of interest from an experiment or a simulation are the stochastic properties of the solution variables or of some function of the solution values as opposed to the values from one simulation One possibility for measuring the error is to compute the errors in solution moments such as the mean and variance However often the problems for which stochastic simulation makes a big di erence have a bistable distribution A simple example of this type is given by the Schlo gl 18 reaction Some well known problems from biology 3 7 19 also have this property For such problems the low order moments such as mean and variance do not have much relevance Rather we need to know how well the detailed model captures the probability distribution or how well approximate methods capture the exact probability distributions for the variables and properties of interest To describe this error in this paper we adopt the concept of distribution distance Two types of distribution distance are considered here One is the Kolmogorov distance 20 21 de ned to measure the distance between cumulative distribution functions cdf The other is the density distance area de ned as the L1 distance between the probability density functions pdf With the distribution distance we can measure the accuracy of the distributions given by di erent approximation formulas There are very few systems for which we can analytically solve for the distribution Thus for most problems we must collect a large number of samples by experiments or by simulation methods such as SSA The distribution is then estimated by the empirical distribution function 22 edf or the histogram of the samples For a su ciently large number of samples the edf is close to the cdf while the histogram is close to the pdf However in both experiment and computation the number of samples is always limited Thus such a process is subject to an inherent error due to the randomness of the variables of interest which is traditionally called statistical uctuation in the literature of Monte Carlo simulations In practice we can measure this statistical uctuation by the distance between two sets of independent samples with the same distribution We call it self distance This concept is similar to that of the round o error in classical numerical analysis In that context due to the limited length of the storage for each variable there is a round o error In stochastic simulation a simple limitation also exists on the number of samples For example if there are two sets of samples for which the distribution distance between them is smaller than their self distance we cannot tell whether or not these two sets of samples represent di erent distributions unless we increase the number of samples the analogous solution in the situation of round o error is to increase the resolution of the oating point number representation In numerical analysis the round o error is very small 2 22 10 16 in Matlab Thus in many cases we do not need to worry about it But in stochastic simulation the self distance is usually much larger so we must be aware of it For example in numerical studies of the convergence of s leaping methods 23 we have observed that regardless of how small the stepsize is we cannot