Multivariate Gaussian Distribution
Leon Gu
CSD, CMU

Multivariate Gaussian

p(x | µ, Σ) = (2π)^(−n/2) |Σ|^(−1/2) exp{ −(1/2) (x − µ)^T Σ^(−1) (x − µ) }

- Moment parameterization: µ = E(X), Σ = Cov(X) = E[(X − µ)(X − µ)^T] (a symmetric, positive semi-definite matrix).
- Mahalanobis distance: Δ² = (x − µ)^T Σ^(−1) (x − µ).
- Canonical parameterization: p(x | η, Λ) = exp{ a + η^T x − (1/2) x^T Λ x }, where Λ = Σ^(−1), η = Σ^(−1) µ, and a = −(1/2) (n log 2π − log |Λ| + η^T Λ^(−1) η).
- Tons of applications (MoG, FA, PPCA, Kalman filter, ...).

Multivariate Gaussian P(X1, X2)

P(X1, X2) (joint Gaussian):
  µ = (µ1, µ2),  Σ = [ Σ11  Σ12 ; Σ21  Σ22 ]

P(X2) (marginal Gaussian):
  µ2^m = µ2,  Σ2^m = Σ22

P(X1 | X2 = x2) (conditional Gaussian):
  µ_{1|2} = µ1 + Σ12 Σ22^(−1) (x2 − µ2)
  Σ_{1|2} = Σ11 − Σ12 Σ22^(−1) Σ21

Operations on Gaussian R.V.

The linear transform of a Gaussian r.v. is a Gaussian. Remember that no matter how X is distributed,
  E(AX + b) = A E(X) + b
  Cov(AX + b) = A Cov(X) A^T.
This means that for Gaussian-distributed quantities:
  X ∼ N(µ, Σ)  ⇒  AX + b ∼ N(Aµ + b, A Σ A^T).

The sum of two independent Gaussian r.v.'s is a Gaussian:
  Y = X1 + X2, X1 ⊥ X2  ⇒  µ_Y = µ1 + µ2,  Σ_Y = Σ1 + Σ2.

The product of two Gaussian density functions is another Gaussian function (although no longer normalized):
  N(a, A) N(b, B) ∝ N(c, C),  where C = (A^(−1) + B^(−1))^(−1),  c = C A^(−1) a + C B^(−1) b.

Maximum Likelihood Estimate of µ and Σ

Given a set of i.i.d. data X = {x1, ..., xN} drawn from N(x; µ, Σ), we want to estimate (µ, Σ) by MLE. The log-likelihood function is

  ln p(X | µ, Σ) = −(N/2) ln |Σ| − (1/2) Σ_{n=1}^{N} (x_n − µ)^T Σ^(−1) (x_n − µ) + const.

Taking its derivative w.r.t. µ and setting it to zero, we have

  µ̂ = (1/N) Σ_{n=1}^{N} x_n.

Rewrite the log-likelihood using the "trace trick":

  ln p(X | µ, Σ) ∝ −(N/2) ln |Σ| − (1/2) Σ_{n=1}^{N} (x_n − µ)^T Σ^(−1) (x_n − µ)
                 = −(N/2) ln |Σ| − (1/2) Σ_{n=1}^{N} Tr[ Σ^(−1) (x_n − µ)(x_n − µ)^T ]
                 = −(N/2) ln |Σ| − (1/2) Tr[ Σ^(−1) Σ_{n=1}^{N} (x_n − µ)(x_n − µ)^T ].

Taking the derivative w.r.t. Σ^(−1), and using 1) ∂/∂A log |A| = A^(−T) and 2) ∂/∂A Tr[AB] = ∂/∂A Tr[BA] = B^T, we obtain

  Σ̂ = (1/N) Σ_{n=1}^{N} (x_n − µ̂)(x_n − µ̂)^T.
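The formulas above can be checked numerically. The NumPy sketch below is illustrative (the parameter values µ, Σ, a, A, b, B are arbitrary assumptions, not from the notes): it forms the MLE estimates µ̂ and Σ̂ from samples, evaluates the conditional-Gaussian formulas for a 2-D joint, and verifies that the product of two 1-D Gaussian densities matches N(c, C) up to a constant factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- MLE check: mu_hat and Sigma_hat from i.i.d. samples -------------------
# Illustrative parameter values (assumed, not from the notes).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

N = 200_000
X = rng.multivariate_normal(mu, Sigma, size=N)

mu_hat = X.mean(axis=0)        # (1/N) sum_n x_n
D = X - mu_hat
Sigma_hat = D.T @ D / N        # (1/N) sum_n (x_n - mu_hat)(x_n - mu_hat)^T

# --- Conditional Gaussian P(X1 | X2 = x2) ----------------------------------
x2 = 0.0
mu_1g2 = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])    # mu_{1|2}
Sigma_1g2 = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]     # Sigma_{1|2}

# --- Product of two 1-D Gaussian densities ---------------------------------
a_, A_ = 0.0, 1.5
b_, B_ = 2.0, 0.5
C_ = 1.0 / (1.0 / A_ + 1.0 / B_)   # C = (A^-1 + B^-1)^-1
c_ = C_ * (a_ / A_ + b_ / B_)      # c = C A^-1 a + C B^-1 b

def g(x, m, v):
    """Unnormalized Gaussian density exp(-(x - m)^2 / (2v))."""
    return np.exp(-0.5 * (x - m) ** 2 / v)

# If N(a,A)N(b,B) is proportional to N(c,C), then density ratios at any two
# points agree exactly (the normalizing constant cancels).
x_a, x_b = 0.3, 1.1
lhs = (g(x_a, a_, A_) * g(x_a, b_, B_)) / (g(x_b, a_, A_) * g(x_b, b_, B_))
rhs = g(x_a, c_, C_) / g(x_b, c_, C_)

print(mu_hat, mu_1g2, Sigma_1g2, np.isclose(lhs, rhs))
```

With 200,000 samples, µ̂ and Σ̂ should agree with the true parameters to a few hundredths, while the conditional and product-of-Gaussians identities hold up to floating-point error.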