
University of Colorado at Denver and Health Sciences Center
A Brief Tutorial on the Ensemble Kalman Filter∗
Jan Mandel†
February 2007
UCDHSC/CCM Report No. 242
CENTER FOR COMPUTATIONAL MATHEMATICS REPORTS

Abstract. The ensemble Kalman filter (EnKF) is a recursive filter suitable for problems with a large number of variables, such as discretizations of partial differential equations in geophysical models. The EnKF originated as a version of the Kalman filter for large problems (essentially, the covariance matrix is replaced by the sample covariance), and it is now an important data assimilation component of ensemble forecasting. EnKF is related to the particle filter (in this context, a particle is the same thing as an ensemble member), but the EnKF makes the assumption that all probability distributions involved are Gaussian. This article briefly describes the derivation and practical implementation of the basic version of EnKF, and reviews several extensions.

1. Introduction. The Ensemble Kalman Filter (EnKF) is a Monte Carlo implementation of the Bayesian update problem: given a probability density function (pdf) of the state of the modeled system (the prior, often called the forecast in geosciences) and the data likelihood, Bayes' theorem is used to obtain the pdf after the data likelihood has been taken into account (the posterior, often called the analysis). This is called a Bayesian update. The Bayesian update is combined with advancing the model in time, incorporating new data from time to time. The original Kalman Filter [17] assumes that all pdfs are Gaussian (the Gaussian assumption) and provides algebraic formulas for the change of the mean and covariance by the Bayesian update, as well as a formula for advancing the covariance matrix in time, provided the system is linear. However, maintaining the covariance matrix is not computationally feasible for high-dimensional systems.
For this reason, EnKFs were developed [9, 15]. EnKFs represent the distribution of the system state using a random sample, called an ensemble, and replace the covariance matrix by the sample covariance computed from the ensemble. One advantage of EnKFs is that advancing the pdf in time is achieved by simply advancing each member of the ensemble. For a survey of EnKF and related data assimilation techniques, see [12].

∗This document is not copyrighted and its use is governed by the GNU Free Documentation License, available at http://www.gnu.org/copyleft/fdl.html. The LaTeX source of this document is available at http://www.math.cudenver.edu/~jmandel/papers/enkf_tutorial. The Wikipedia article "Ensemble Kalman Filter" at http://en.wikipedia.org/wiki/Ensemble_Kalman_filter as of 06:27, 23 February 2007 (UTC) was created by translating this document from LaTeX to Wiki. This work has been supported by the National Science Foundation under the grant CNS-0325314.

†Center for Computational Mathematics, University of Colorado at Denver and Health Sciences Center, Denver, CO 80217-3364.

2. A derivation of the EnKF.

2.1. The Kalman Filter. Let us first review the Kalman filter. Let $x$ denote the $n$-dimensional state vector of a model, and assume that it has a Gaussian probability distribution with mean $\mu$ and covariance $Q$, i.e., its pdf is

$$p(x) \propto \exp\left(-\frac{1}{2}(x-\mu)^T Q^{-1}(x-\mu)\right).$$

Here and below, $\propto$ means proportional; a pdf is always scaled so that its integral over the whole space is one. This probability distribution, called the prior, was evolved in time by running the model and now is to be updated to account for new data. It is natural to assume that the error distribution of the data is known; data have to come with an error estimate, otherwise they are meaningless. Here, the data $d$ is assumed to have a Gaussian pdf with covariance $R$ and mean $Hx$, where $H$ is the so-called observation matrix.
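The introduction's remark that the pdf is advanced in time by advancing each ensemble member can be illustrated with a short NumPy sketch. It assumes a hypothetical linear model step $x \mapsto Mx$; the matrix `M`, the prior `mu` and `Q`, and the ensemble size are made up for illustration, and `forecast` is a hypothetical helper name, not part of any library. For a linear model the Kalman filter would advance the covariance as $MQM^T$, so the sample covariance of the advanced ensemble should approximate that matrix.

```python
import numpy as np

def forecast(ensemble, model):
    """Advance every ensemble member (a column of `ensemble`) by one model step.

    `model` is a hypothetical stand-in for the model time step: any function
    mapping an n-vector to an n-vector, possibly nonlinear.
    """
    return np.column_stack([model(x) for x in ensemble.T])

rng = np.random.default_rng(0)
N = 20000                            # ensemble size (illustrative)
mu = np.array([1.0, 0.0, -1.0])      # prior mean
Q = np.diag([1.0, 0.5, 0.25])        # prior covariance
M = np.array([[0.9, 0.1, 0.0],       # hypothetical linear model matrix
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.7]])

# n x N prior ensemble: each column is one draw from N(mu, Q)
X = rng.multivariate_normal(mu, Q, size=N).T

# The filter itself only advances members; it never propagates an n x n covariance.
X_next = forecast(X, lambda x: M @ x)

# np.cov forms the n x n matrix here only to check the result against
# the Kalman covariance propagation M Q M^T.
C_next = np.cov(X_next)
print(np.round(C_next, 2))
print(np.round(M @ Q @ M.T, 2))
```

Nothing in `forecast` depends on the model being linear; the linear choice only makes the exact covariance available for comparison.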
The covariance matrix $R$ describes the estimate of the error of the data; if the random errors in the entries of the data vector $d$ are independent, $R$ is diagonal and its diagonal entries are the squares of the standard deviation ("error size") of the error of the corresponding entries of the data vector $d$. The value $Hx$ is what the value of the data would be for the state $x$ in the absence of data errors. Then the probability density $p(d|x)$ of the data $d$ conditional on the system state $x$, called the data likelihood, is

$$p(d|x) \propto \exp\left(-\frac{1}{2}(d-Hx)^T R^{-1}(d-Hx)\right).$$

The pdf of the state and the data likelihood are combined by Bayes' theorem to give the new probability density of the system state $x$ conditional on the value of the data $d$ (the posterior),

$$p(x|d) \propto p(d|x)\, p(x).$$

The data $d$ is fixed once it is received, so denote the posterior state by $\hat{x}$ instead of $x|d$ and the posterior pdf by $p(\hat{x})$. It can be shown by algebraic manipulations [1] that the posterior pdf is also Gaussian,

$$p(\hat{x}) \propto \exp\left(-\frac{1}{2}(\hat{x}-\hat{\mu})^T \hat{Q}^{-1}(\hat{x}-\hat{\mu})\right),$$

with the posterior mean $\hat{\mu}$ and covariance $\hat{Q}$ given by the Kalman update formulas

$$\hat{\mu} = \mu + K(d - H\mu), \qquad \hat{Q} = (I - KH)Q,$$

where

$$K = QH^T\left(HQH^T + R\right)^{-1}$$

is the so-called Kalman gain matrix.

2.2. The Ensemble Kalman Filter. The EnKF is a Monte Carlo approximation of the Kalman filter, which avoids evolving the covariance matrix of the pdf of the state vector $x$. Instead, the distribution is represented by a sample, called an ensemble. So, let

$$X = [x_1, \ldots, x_N] = [x_i]$$

be an $n \times N$ matrix whose columns are a sample from the prior distribution. The matrix $X$ is called the prior ensemble. Replicate the data $d$ into an $m \times N$ matrix

$$D = [d_1, \ldots, d_N] = [d_i]$$

so that each column $d_i$ consists of the data vector $d$ plus a random vector from the $m$-dimensional normal distribution $N(0, R)$. Then the columns of

$$\hat{X} = X + K(D - HX)$$

form a random sample from the posterior distribution.
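The Kalman update formulas and the perturbed-data ensemble construction can be checked against each other numerically. The sketch below assumes NumPy and a made-up two-dimensional example in which only the first state component is observed; `kalman_update` is a hypothetical helper name. It still uses the exact gain $K$, so it illustrates the claim about $\hat{X} = X + K(D - HX)$ rather than the EnKF itself: with a large ensemble, the sample mean and sample covariance of the columns of $\hat{X}$ should be close to $\hat{\mu}$ and $\hat{Q}$.

```python
import numpy as np

def kalman_update(mu, Q, d, H, R):
    """Kalman update of the mean and covariance, as in Section 2.1."""
    S = H @ Q @ H.T + R                  # H Q H^T + R
    K = np.linalg.solve(S, H @ Q).T      # K = Q H^T S^{-1}, using symmetry of Q and S
    mu_hat = mu + K @ (d - H @ mu)
    Q_hat = (np.eye(len(mu)) - K @ H) @ Q
    return mu_hat, Q_hat, K

rng = np.random.default_rng(1)
m, N = 1, 50000                          # data dimension and ensemble size (illustrative)
mu = np.array([0.0, 1.0])                # prior mean
Q = np.array([[1.0, 0.3], [0.3, 0.5]])   # prior covariance
H = np.array([[1.0, 0.0]])               # observe the first state component only
R = np.array([[0.25]])                   # data error covariance
d = np.array([0.8])                      # the data vector

mu_hat, Q_hat, K = kalman_update(mu, Q, d, H, R)

# Prior ensemble X (n x N) and perturbed data D (m x N): each column of D
# is d plus an independent draw from N(0, R).
X = rng.multivariate_normal(mu, Q, size=N).T
D = d[:, None] + rng.multivariate_normal(np.zeros(m), R, size=N).T

# Ensemble update with the exact gain K; the columns approximate a posterior sample.
X_hat = X + K @ (D - H @ X)

print(mu_hat, X_hat.mean(axis=1))
print(Q_hat)
print(np.cov(X_hat))
```

With these numbers, $K = (0.8, 0.24)^T$, $\hat{\mu} = (0.64, 1.192)^T$ and $\hat{Q} = \begin{pmatrix} 0.2 & 0.06 \\ 0.06 & 0.428 \end{pmatrix}$; the ensemble statistics printed next to them agree up to sampling error.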
The EnKF is now obtained [16] simply by replacing the state covariance $Q$ in the Kalman gain matrix $K = QH^T\left(HQH^T + R\right)^{-1}$ by the sample covariance $C$ computed from the ensemble members (called the ensemble covariance).

3. Implementation.

3.1. Basic formulation. Here we follow [7, 10, 18]. Suppose the ensemble matrix $X$ and the data matrix $D$ are as above. The ensemble mean and the covariance are

$$E(X) = \frac{1}{N}\sum_{k=1}^{N} x_k, \qquad C = \frac{AA^T}{N-1},$$

where

$$A = X - E(X) = X - \frac{1}{N}\left(X e_{N\times 1}\right) e_{1\times N},$$

and $e$ denotes the matrix of all ones of the indicated size. The posterior ensemble $X^p$ is then given by

$$\hat{X} \approx X^p = X + CH^T\left(HCH^T + R\right)^{-1}(D - HX),$$

where the perturbed data matrix $D$ is as above. It can be
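The basic formulation translates almost line by line into array code. The following is a minimal NumPy sketch, assuming dense matrices and a small state dimension $n$; `enkf_analysis` is a hypothetical function name, and the example data are made up. It forms the $n \times n$ ensemble covariance $C$ explicitly, exactly as written in the formula, which a practical implementation for large $n$ would avoid.

```python
import numpy as np

def enkf_analysis(X, D, H, R):
    """Basic EnKF analysis step: X^p = X + C H^T (H C H^T + R)^{-1} (D - H X).

    X: n x N prior ensemble, D: m x N perturbed data,
    H: m x n observation matrix, R: m x m data error covariance.
    """
    N = X.shape[1]
    e = np.ones((N, 1))                  # the N x 1 matrix of all ones
    A = X - (X @ e / N) @ e.T            # deviations from the ensemble mean E(X)
    C = A @ A.T / (N - 1)                # ensemble covariance
    S = H @ C @ H.T + R
    return X + C @ H.T @ np.linalg.solve(S, D - H @ X)

# Usage on a small made-up example with N = 100 members.
rng = np.random.default_rng(2)
N = 100
mu = np.array([0.0, 1.0])                # prior mean
Q = np.array([[1.0, 0.3], [0.3, 0.5]])   # prior covariance
H = np.array([[1.0, 0.0]])               # observe the first state component only
R = np.array([[0.25]])                   # data error covariance
d = np.array([0.8])                      # the data vector

X = rng.multivariate_normal(mu, Q, size=N).T
D = d[:, None] + rng.multivariate_normal(np.zeros(1), R, size=N).T
Xp = enkf_analysis(X, D, H, R)
print(Xp.mean(axis=1))
```

Because $C$ is only a sample estimate, the result for small $N$ is noisier than the exact Kalman update. Note also that $CH^T = A(HA)^T/(N-1)$, so the product needed in the formula can be computed without forming $C$.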
