EE363 Winter 2008-09
Lecture 7: Estimation

• Gaussian random vectors
• minimum mean-square estimation (MMSE)
• MMSE with linear measurements
• relation to least-squares, pseudo-inverse

Gaussian random vectors

a random vector x ∈ R^n is Gaussian if it has density

    p_x(v) = (2π)^(−n/2) (det Σ)^(−1/2) exp( −(1/2) (v − x̄)^T Σ^(−1) (v − x̄) ),

for some Σ = Σ^T > 0, x̄ ∈ R^n

• denoted x ∼ N(x̄, Σ)
• x̄ ∈ R^n is the mean or expected value of x, i.e.,
      x̄ = E x = ∫ v p_x(v) dv
• Σ = Σ^T > 0 is the covariance matrix of x, i.e.,
      Σ = E (x − x̄)(x − x̄)^T = E xx^T − x̄ x̄^T = ∫ (v − x̄)(v − x̄)^T p_x(v) dv

[figure: density of x ∼ N(0, 1), p_x(v) = (1/√(2π)) e^(−v²/2)]

• the mean and variance of the scalar random variable x_i are
      E x_i = x̄_i,    E (x_i − x̄_i)² = Σ_ii
  hence the standard deviation of x_i is √Σ_ii
• the covariance between x_i and x_j is E (x_i − x̄_i)(x_j − x̄_j) = Σ_ij
• the correlation coefficient between x_i and x_j is ρ_ij = Σ_ij / √(Σ_ii Σ_jj)
• the mean (norm) square deviation of x from x̄ is
      E ‖x − x̄‖² = E Tr (x − x̄)(x − x̄)^T = Tr Σ = Σ_{i=1}^{n} Σ_ii
  (using Tr AB = Tr BA)

example: x ∼ N(0, I) means the x_i are independent identically distributed (IID) N(0, 1) random variables

Confidence ellipsoids

• p_x(v) is constant for (v − x̄)^T Σ^(−1) (v − x̄) = α, i.e., on the surface of the ellipsoid
      E_α = { v | (v − x̄)^T Σ^(−1) (v − x̄) ≤ α }
  thus x̄ and Σ determine the shape of the density
• the η-confidence set for a random variable z is the smallest-volume set S with Prob(z ∈ S) ≥ η
  (in the general case the confidence set has the form { v | p_z(v) ≥ β })
• the sets E_α are the η-confidence sets for a Gaussian, called confidence ellipsoids
  (α determines the confidence level η)

Confidence levels

the nonnegative random variable (x − x̄)^T Σ^(−1) (x − x̄) has a χ²_n distribution, so Prob(x ∈ E_α) = F_{χ²_n}(α), where F_{χ²_n} is the CDF

some good approximations:
• E_n gives about 50% probability
• E_{n+2√n} gives about 90% probability

geometrically:
• the mean x̄ gives the center of the ellipsoid
• the semiaxes are √(α λ_i) u_i, where the u_i are (orthonormal) eigenvectors of Σ with eigenvalues λ_i

example: x ∼ N(x̄, Σ) with x̄ = (2, 1), Σ = [2 1; 1 1]
• x₁ has mean 2, std. dev. √2
• x₂ has mean 1, std. dev. 1
• the correlation coefficient between x₁ and x₂ is ρ = 1/√2
• E ‖x − x̄‖² = 3

the 90% confidence ellipsoid corresponds to α = 4.6

[figure: 100 samples of x in the (x₁, x₂) plane together with E_4.6; here, 91 out of 100 samples fall in E_4.6]

Affine transformation

suppose x ∼ N(x̄, Σ_x), and consider an affine transformation of x:
    z = Ax + b,
where A ∈ R^{m×n}, b ∈ R^m

then z is Gaussian, with mean
    E z = E (Ax + b) = A E x + b = A x̄ + b
and covariance
    Σ_z = E (z − z̄)(z − z̄)^T = E A (x − x̄)(x − x̄)^T A^T = A Σ_x A^T

examples:
• if w ∼ N(0, I) then x = Σ^{1/2} w + x̄ is N(x̄, Σ)
  (useful for simulating vectors with a given mean and covariance)
• conversely, if x ∼ N(x̄, Σ) then z = Σ^{−1/2} (x − x̄) is N(0, I)
  (normalizes & decorrelates; called whitening or normalizing)

suppose x ∼ N(x̄, Σ) and c ∈ R^n; the scalar c^T x has mean c^T x̄ and variance c^T Σ c

thus the (unit length) direction of minimum variability for x is u, where
    Σ u = λ_min u,    ‖u‖ = 1
the standard deviation of u^T x is √λ_min (and similarly for maximum variability)

Degenerate Gaussian vectors

• it is convenient to allow Σ to be singular (but still Σ = Σ^T ≥ 0)
  in this case the density formula obviously does not hold
  meaning: in some directions x is not random at all
  such an x is called a degenerate Gaussian random variable
• write Σ as
      Σ = [Q₊ Q₀] [Σ₊ 0; 0 0] [Q₊ Q₀]^T
  where Q = [Q₊ Q₀] is orthogonal and Σ₊ > 0
  the columns of Q₀ are an orthonormal basis for N(Σ)
  the columns of Q₊ are an orthonormal basis for range(Σ)
• then Q^T x = (z, w), i.e., x = Q₊ z + Q₀ w, where
  z ∼ N(Q₊^T x̄, Σ₊) is a (nondegenerate) Gaussian (hence the density formula holds for z)
  w = Q₀^T x̄ is not random; it is called the deterministic component of x

Linear measurements

linear measurements with noise:
    y = Ax + v
• x ∈ R^n is what we want to measure or estimate
• y ∈ R^m is the measurement
• A ∈ R^{m×n} characterizes the sensors or measurements
• v is the sensor noise

common assumptions:
• x ∼ N(x̄, Σ_x)
• v ∼ N(v̄, Σ_v)
• x and v are independent
• N(x̄, Σ_x) is the prior distribution of x (describes the initial uncertainty about x)
• v̄ is the noise bias or offset (and is usually 0)
• Σ_v is the noise covariance

thus
    [x; v] ∼ N( [x̄; v̄], [Σ_x 0; 0 Σ_v] )
using
    [x; y] = [I 0; A I] [x; v]
we can write
    E [x; y] = [x̄; A x̄ + v̄]
and
    E [x − x̄; y − ȳ] [x − x̄; y − ȳ]^T = [I 0; A I] [Σ_x 0; 0 Σ_v] [I 0; A I]^T
                                        = [Σ_x, Σ_x A^T; A Σ_x, A Σ_x A^T + Σ_v]

so the covariance of the measurement y is A Σ_x A^T + Σ_v
• A Σ_x A^T is the 'signal covariance'
• Σ_v is the 'noise covariance'

Minimum mean-square estimation

suppose x ∈ R^n and y ∈ R^m are random vectors (not necessarily Gaussian)

we seek to estimate x given y, i.e., we seek a function φ : R^m → R^n such that x̂ = φ(y) is near x

one common measure of nearness is the mean-square error, E ‖φ(y) − x‖²

the minimum mean-square estimator (MMSE) φ_mmse minimizes this quantity

general solution: φ_mmse(y) = E(x|y), i.e., the conditional expectation of x given y

MMSE for Gaussian vectors

now suppose x ∈ R^n and y ∈ R^m are jointly Gaussian:
    [x; y] ∼ N( [x̄; ȳ], [Σ_x, Σ_xy; Σ_xy^T, Σ_y] )

(after a lot of algebra) the conditional density is
    p_{x|y}(v|y) = (2π)^(−n/2) (det Λ)^(−1/2) exp( −(1/2) (v − w)^T Λ^(−1) (v − w) ),
where
    Λ = Σ_x − Σ_xy Σ_y^(−1) Σ_xy^T,    w = x̄ + Σ_xy Σ_y^(−1) (y − ȳ)

hence the MMSE estimator (i.e., the conditional expectation) is
    x̂ = φ_mmse(y) = E(x|y) = x̄ + Σ_xy Σ_y^(−1) (y − ȳ)

φ_mmse is an affine function

the MMSE estimation error, x̂ − x, is a Gaussian random vector:
    x̂ − x ∼ N(0, Σ_x − Σ_xy Σ_y^(−1) Σ_xy^T)

note that
    Σ_x − Σ_xy Σ_y^(−1) Σ_xy^T ≤ Σ_x
i.e., the covariance of the estimation error is always less than the prior covariance of x

Best linear unbiased estimator

the estimator
    x̂ = φ_blu(y) = x̄ + Σ_xy Σ_y^(−1) (y − ȳ)
makes sense even when x, y aren't jointly Gaussian

this estimator
• is unbiased, i.e., E x̂ = E x
• often works well
• is widely used
• has minimum mean-square error among all affine estimators

it is sometimes called the best linear unbiased estimator

MMSE with linear measurements

consider the specific case
    y = Ax + v,    x ∼ N(x̄, Σ_x),    v ∼ N(v̄, Σ_v)
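Combining the joint-covariance blocks derived above for linear measurements (Σ_xy = Σ_x A^T, Σ_y = A Σ_x A^T + Σ_v, ȳ = A x̄ + v̄) with the MMSE formula x̂ = x̄ + Σ_xy Σ_y^(−1)(y − ȳ) gives a closed-form estimator. A minimal numpy sketch; the particular A, Σ_v, sample size, and seed are made-up illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up problem data (for illustration only)
n, m = 2, 3
x_bar = np.array([2.0, 1.0])
Sigma_x = np.array([[2.0, 1.0], [1.0, 1.0]])
A = rng.standard_normal((m, n))
v_bar = np.zeros(m)
Sigma_v = 0.1 * np.eye(m)

# joint second moments from the slides:
#   Sigma_xy = Sigma_x A^T,  Sigma_y = A Sigma_x A^T + Sigma_v,  y_bar = A x_bar + v_bar
Sigma_xy = Sigma_x @ A.T
Sigma_y = A @ Sigma_x @ A.T + Sigma_v
y_bar = A @ x_bar + v_bar

# error covariance: Lambda = Sigma_x - Sigma_xy Sigma_y^{-1} Sigma_xy^T
Lambda = Sigma_x - Sigma_xy @ np.linalg.solve(Sigma_y, Sigma_xy.T)

# Monte Carlo check: draw (x, v), form y = A x + v, apply the MMSE estimator
# x_hat = x_bar + Sigma_xy Sigma_y^{-1} (y - y_bar)  (here row-wise, Sigma_y symmetric)
N = 20000
L = np.linalg.cholesky(Sigma_x)
x = x_bar + (L @ rng.standard_normal((n, N))).T          # x ~ N(x_bar, Sigma_x)
v = v_bar + rng.multivariate_normal(np.zeros(m), Sigma_v, N)
y = x @ A.T + v
x_hat = x_bar + (y - y_bar) @ np.linalg.solve(Sigma_y, Sigma_xy.T)

# empirical mean-square error should match Tr(Lambda), and beat the prior Tr(Sigma_x)
mse = np.mean(np.sum((x_hat - x) ** 2, axis=1))
print(mse, np.trace(Lambda), np.trace(Sigma_x))
```

Using `np.linalg.solve` instead of explicitly forming Σ_y^(−1) is the usual numerically safer choice; the empirical MSE comes out close to Tr Λ and well below the prior uncertainty Tr Σ_x, illustrating the error-covariance inequality from the MMSE slides.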
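The simulation recipe x = Σ^{1/2} w + x̄, the whitening map z = Σ^{−1/2}(x − x̄), and the confidence-ellipsoid example (α = 4.6 gives about 90% coverage in R²) can all be checked numerically. A short numpy sketch of those earlier slides; the sample size and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# example from the slides: x_bar = (2, 1), Sigma = [2 1; 1 1]
x_bar = np.array([2.0, 1.0])
Sigma = np.array([[2.0, 1.0], [1.0, 1.0]])

# simulate x ~ N(x_bar, Sigma) via x = L w + x_bar, w ~ N(0, I);
# a Cholesky factor works in place of the symmetric square root,
# since any L with L L^T = Sigma yields the required covariance
N = 100_000
L = np.linalg.cholesky(Sigma)
x = x_bar + rng.standard_normal((N, 2)) @ L.T

# fraction of samples inside the 90% confidence ellipsoid E_4.6:
# q = (x - x_bar)^T Sigma^{-1} (x - x_bar), coverage = Prob(q <= 4.6)
d = x - x_bar
q = np.sum(d * np.linalg.solve(Sigma, d.T).T, axis=1)
print(np.mean(q <= 4.6))          # roughly 0.90

# whitening: z = Sigma^{-1/2} (x - x_bar) should be N(0, I);
# build the symmetric inverse square root from the eigendecomposition
evals, U = np.linalg.eigh(Sigma)
Sigma_inv_half = U @ np.diag(evals ** -0.5) @ U.T
z = d @ Sigma_inv_half.T
print(np.cov(z.T))                # roughly the 2x2 identity
```

For n = 2 the χ²_2 CDF is F(α) = 1 − e^(−α/2), so α = 4.6 gives coverage 1 − e^(−2.3) ≈ 0.90, consistent with the slides' 91-out-of-100 figure.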