MIT OpenCourseWare
http://ocw.mit.edu

18.443 Statistics for Applications, Spring 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.


18.443 Bayes least-squares estimation

In the lecture on Monday the 9th I gave material on Bayes estimation that I haven't found in Rice, so here is a printed form of it. First, here is a very simple fact.

Proposition 1. For any random variable X with E(X²) < +∞, the unique constant c that minimizes E((X − c)²) is c = E(X).

Proof. E((X − c)²) = E(X²) − 2cE(X) + c² is a quadratic polynomial in c which goes to +∞ as c → ±∞, so it is minimized where its derivative with respect to c, namely 2c − 2E(X), equals 0, that is, at c = E(X). Q.E.D.

Now suppose we are given a prior density π(θ) for a continuous parameter θ, so that π(θ) > 0 and ∫_{−∞}^{∞} π(θ) dθ = 1. (If θ is m-dimensional we would need an m-fold multiple integral, but nothing essential in the following would be changed, so I'll write as if m = 1.) Let f(x, θ) be a likelihood function for one observation, which may be either a probability mass function if x is discrete or a density function if x is continuous. If we have i.i.d. observations X = (X_1, ..., X_n), we get a likelihood function f(X, θ) = ∏_{j=1}^{n} f(X_j, θ). The posterior density π_X(θ) is gotten by multiplying the likelihood function by the prior and then normalizing:

    π_X(θ) = f(X, θ) π(θ) / ∫_{−∞}^{∞} f(X, φ) π(φ) dφ.

Suppose we want to estimate a function g(θ). Then for an estimator V(X), the mean-square error (MSE) is E_θ[(V(X) − g(θ))²]. For a prior π, the risk is the expectation of the MSE with respect to that prior, namely

(1)    ∫_{−∞}^{∞} E_θ[(V(X) − g(θ))²] π(θ) dθ.

A Bayes estimator of g(θ) for the given prior is one that minimizes the risk, provided its risk is finite.

Theorem 2. For a given likelihood function f(X, θ) and prior density π, if there exists some estimator U(X) of the given g(θ) that has finite risk for the given π, then there exists a Bayes estimator, given by the expectation of g(θ) with respect to the posterior distribution,

(2)    T(X) = ∫_{−∞}^{∞} g(θ) π_X(θ) dθ.

The Bayes estimator is essentially unique, in the sense that any Bayes estimator must equal this T(X) with probability 1.

Proof. We would like to minimize (1). Let's write out the E_θ. I'll use ∫ ··· dX as a shorthand for

    ∫_{−∞}^{∞} ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} ··· dX_1 dX_2 ··· dX_n,

where the integrals are replaced by sums in case X is discrete; the method of proof is essentially the same. Then (1) becomes

(3)    ∫_{−∞}^{∞} [ ∫ (V(X) − g(θ))² f(X, θ) dX ] π(θ) dθ.

Since the integrand is nonnegative and the integrals are well-defined (possibly infinite), we can interchange the two integrals, and (3) becomes

(4)    ∫ [ ∫_{−∞}^{∞} (V(X) − g(θ))² f(X, θ) π(θ) dθ ] dX.

We can further write f(X, θ) π(θ) = π_X(θ) ∫_{−∞}^{∞} f(X, φ) π(φ) dφ by definition of the posterior density π_X, and the latter factor does not depend on θ, so we can take it outside the integral with respect to θ. Then (4) becomes

(5)    ∫ [ ∫_{−∞}^{∞} (V(X) − g(θ))² π_X(θ) dθ ] [ ∫_{−∞}^{∞} f(X, φ) π(φ) dφ ] dX.

In the inner integral with respect to θ in (5), X is fixed and g(θ) is a random variable with respect to the posterior density π_X(θ). To minimize this inner integral we need to choose V(X), which is a constant for fixed X. By Proposition 1, the correct constant is V(X) = T(X) as in (2). Since the risk is finite for some estimator by assumption, the minimum risk must be finite, so T(X) in (2) indeed gives a Bayes estimator. The essential uniqueness follows from the uniqueness in Proposition 1. Q.E.D.
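As a rough numerical check of Theorem 2, here is a short Python sketch. The particular model is an illustrative choice of mine, not taken from these notes: a Beta(2, 3) prior, a Binomial(10, θ) likelihood, and g(θ) = θ. The sketch estimates the risk (1) by Monte Carlo for the posterior-mean estimator T(X) from (2) and for the maximum-likelihood estimator X/n; the posterior mean should come out with the smaller risk.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative conjugate setup (my choice, not from the notes):
# theta ~ Beta(a, b) prior, X | theta ~ Binomial(n, theta), g(theta) = theta.
a, b, n = 2.0, 3.0, 10

def posterior_mean(x):
    # With a Beta(a, b) prior, the posterior is Beta(a + x, b + n - x),
    # so T(X) in (2) is its mean (a + x) / (a + b + n).
    return (a + x) / (a + b + n)

def mle(x):
    # A competing estimator: the maximum-likelihood estimate X / n.
    return x / n

def bayes_risk(estimator, reps=200_000):
    # Monte Carlo version of the risk (1): draw theta from the prior,
    # draw X from the model given theta, and average (V(X) - g(theta))^2.
    theta = rng.beta(a, b, size=reps)
    x = rng.binomial(n, theta)
    return np.mean((estimator(x) - theta) ** 2)

print("risk of posterior mean T(X):", bayes_risk(posterior_mean))
print("risk of MLE X/n:            ", bayes_risk(mle))

The posterior mean (a + X)/(a + b + n) shrinks X/n toward the prior mean a/(a + b), and Theorem 2 says that for this prior no estimator can have smaller risk.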
As mentioned in the lecture of 3/9, if θ = λ is a Poisson parameter and the prior π is a gamma density, then the posterior is also in the gamma family and its expectation is known in closed form (a numerical sketch of this case follows the references below). The same thing occurred using a beta prior for the binomial p in a problem set problem.

Some texts give a different formulation of Theorem 2, in which they say that the Bayes estimator is the conditional expectation of g(θ) given X, T(X) = E(g(θ) | X). That is correct in case ∫ |g(θ)| π(θ) dθ < +∞, but integrals with respect to the posterior distributions may be finite even if they are not with respect to the prior. There is more about that in Section 2.6 of the 18.466 OCW notes, but that section is far from self-contained, so otherwise it isn't recommended reading in this course.

REFERENCES

Dudley, R. M. (2003). Mathematical Statistics, 18.466 lecture notes, Spring 2003. On the MIT OpenCourseWare (OCW) website, 2004.
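To make the Poisson/gamma case mentioned above concrete, here is a small Python sketch. The shape and rate values and the sample size are made-up illustrative numbers, and parametrizing the gamma prior by a rate β is my choice: with a Gamma(α, β) prior on λ (shape α, rate β) and n i.i.d. Poisson(λ) observations, the posterior is Gamma(α + ΣX_j, β + n), so the Bayes estimator (2) is its mean (α + ΣX_j)/(β + n).

import numpy as np

rng = np.random.default_rng(1)

# Illustrative numbers (not from the notes): Gamma(alpha, beta) prior on lambda
# with shape alpha and rate beta, and n i.i.d. Poisson(lambda) observations.
alpha, beta, n = 3.0, 2.0, 25

lam = rng.gamma(alpha, 1.0 / beta)   # numpy's gamma takes a scale, so scale = 1 / rate
x = rng.poisson(lam, size=n)

# The posterior is Gamma(alpha + sum(x), beta + n); the Bayes estimator (2) is its mean.
bayes_estimate = (alpha + x.sum()) / (beta + n)

print("lambda drawn from the prior:", lam)
print("sample mean (MLE):          ", x.mean())
print("posterior mean T(X):        ", bayes_estimate)

Note that (α + ΣX_j)/(β + n) is a weighted average of the prior mean α/β and the sample mean, with weights β/(β + n) and n/(β + n), so the Bayes estimator moves toward the maximum-likelihood estimate as n grows.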

