Economics 520, Fall 2011
Lecture Note 13: Bayesian Point Estimation (CB 7.2.3)

We have already discussed two general approaches to constructing point estimators of parameters, the method of moments and the maximum likelihood method. A third important class of estimators are the Bayesian estimators, so called because they make use of Bayes' Theorem.

Suppose we are interested in the probability that a coin comes up heads. Let P be the probability of heads, and suppose that P is chosen by "Nature" according to a uniform distribution on [0, 1]. We do not observe P, but we get to toss the coin once and see whether it comes up heads. Before seeing the outcome of the coin flip, we know that the marginal distribution of P is uniform on the unit interval. If we in fact observe heads, how should we "update" this distribution to reflect the new information?

The marginal density of P is
\[ f_P(p) = 1, \qquad 0 \le p \le 1, \]
and the conditional density of X given P = p is
\[ f_{X|P}(x|p) = p^x (1-p)^{1-x}. \]
Therefore we can calculate the joint density:
\[ f_{X,P}(x,p) = f_{X|P}(x|p) \cdot f_P(p) = p^x (1-p)^{1-x}. \]
Note that X can only take on the values 0 or 1, so its marginal density (actually a PMF) is
\[ f_X(x) = \int_p f_{X,P}(x,p)\,dp = x \int_0^1 p\,dp + (1-x)\int_0^1 (1-p)\,dp = x\cdot\tfrac{1}{2} + (1-x)\cdot\tfrac{1}{2} = \tfrac{1}{2}, \]
and, by Bayes' Theorem, the conditional distribution of P given X is
\[ f_{P|X}(p|x) = \frac{f_{X|P}(x|p)\cdot f_P(p)}{\int_0^1 f_{X|P}(x|p)\cdot f_P(p)\,dp} = 2\,p^x (1-p)^{1-x}. \]
This conditional distribution is what we are after: given the data (X), we want to know what the conditional distribution of the parameter (P) looks like. Let us calculate it for this example, letting X = 1 denote heads:
\[ f_{P|X}(p|x=1) = \frac{f_{X|P}(1|p)\cdot f_P(p)}{\int_0^1 f_{X|P}(1|p)\cdot f_P(p)\,dp} = \frac{p\cdot 1}{\int_0^1 p\,dp} = 2p. \]
Thus the conditional density of P has a triangular shape, with more mass close to 1 than close to 0. We call the marginal distribution $f_P$ the "prior" distribution, to reflect the interpretation of P being chosen before X, and we call $f_{P|X}$ the "posterior" distribution. We see that the prior is modified based on the likelihood function $f_{X|P}$ to obtain the posterior.

Now let us look at a more general class of prior distributions. Suppose that the prior for P follows a Beta distribution,
\[ f_P(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1}, \]
with $\alpha$ and $\beta$ known numbers. Notice that the case $\alpha = \beta = 1$ gives the uniform distribution, so the previous analysis should be a special case. Recall that the mean and variance of the Beta distribution are
\[ E[P] = \frac{\alpha}{\alpha+\beta} \qquad\text{and}\qquad V(P) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}, \]
respectively. Suppose we want the prior distribution to have mean 1/4 and variance 1/100. There is a Beta distribution with exactly these moments, namely the one with
\[ \frac{\alpha}{\alpha+\beta} = \frac{1}{4} \qquad\text{and}\qquad \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = \frac{1}{100}, \]
which corresponds to $\alpha = 71/16 \approx 4.4$ and $\beta = 213/16 \approx 13.3$. (More realistic values might be a mean of 1/2 and a variance of 1/100, but we will work with these numbers in this example.)

Again the data consist of just a single observation with X = 1. The joint density of P and X, evaluated at X = 1, is
\[ f_P(p)\cdot f_{X|P}(1|p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1}\cdot p. \]
How do we figure out the conditional distribution of P given X = 1? We want to find the constant such that $f_P(p)\cdot f_{X|P}(1|p)$ integrates to one as a function of p. Strip away the part of the function that does not depend on p, and we are left with
\[ f_{P|X}(p|x=1) \propto p^{\alpha}(1-p)^{\beta-1}, \]
where the symbol $\propto$ means "is proportional to." This implies that the conditional distribution of P given X = 1 is a Beta distribution with parameters $\alpha+1$ and $\beta$. So a Beta prior leads to a (different) Beta posterior! This is an example of a conjugate prior for the Bernoulli likelihood.
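To make the calculations above concrete, here is a minimal numerical sketch in Python (assuming SciPy is available; the helper names `unnormalized_posterior` and `beta_params_from_moments` are illustrative labels, not part of the lecture note). It checks that the uniform-prior posterior after observing heads is 2p, and solves for the Beta prior parameters that match a mean of 1/4 and a variance of 1/100.

```python
from scipy import integrate
from scipy.stats import beta

# --- Uniform prior, one coin toss, X = 1 (heads) ---
# Unnormalized posterior: likelihood p^x (1-p)^(1-x) times the flat prior.
def unnormalized_posterior(p, x=1):
    return p**x * (1 - p)**(1 - x) * 1.0

# Normalizing constant = marginal f_X(1) = integral of the product over p.
const, _ = integrate.quad(unnormalized_posterior, 0.0, 1.0)
print(const)                                  # 0.5, so the posterior is p / 0.5 = 2p
print(unnormalized_posterior(0.7) / const)    # 1.4 = 2 * 0.7

# --- Matching a Beta prior to a given mean m and variance v ---
# With s = alpha + beta, the Beta variance is m(1-m)/(s+1), so s = m(1-m)/v - 1,
# alpha = m*s, beta = (1-m)*s.
def beta_params_from_moments(m, v):
    s = m * (1 - m) / v - 1
    return m * s, (1 - m) * s

a, b = beta_params_from_moments(1/4, 1/100)
print(a, b)                                   # 4.4375 = 71/16 and 13.3125 = 213/16
print(beta.mean(a, b), beta.var(a, b))        # check: 0.25 and 0.01
```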
The mean and variance of this posterior distribution are
\[ E[P|X=1] = \frac{\alpha+1}{\alpha+\beta+1} \qquad\text{and}\qquad V(P|X=1) = \frac{(\alpha+1)\,\beta}{(\alpha+\beta+1)^2(\alpha+\beta+2)}, \]
respectively. After observing X = 1, we update the distribution of P: the mean, $(\alpha+1)/(\alpha+\beta+1) = (71/16+1)/(71/16+213/16+1) = 0.29$, is slightly higher than the unconditional mean $\alpha/(\alpha+\beta) = 1/4$, and the variance, $(\alpha+1)\beta/\bigl((\alpha+\beta+1)^2(\alpha+\beta+2)\bigr) \approx 0.0104$, is slightly higher than the prior variance of 0.01. (This is somewhat unusual. Typically the posterior variance is lower than the prior variance, due to the extra information. Here the fact that the extra information is so far from the prior mean implies that the uncertainty is actually increased by the extra information.)

Now let us do this more systematically. There are two ingredients to a Bayesian analysis. First, a model for the data given some unknown parameters; in our example that model was $f_{X|P}(x|p) = p^x(1-p)^{1-x}$. Second, a prior distribution for the parameters; in our case that is the Beta distribution with parameters $\alpha$ and $\beta$. This prior distribution is known to the researcher. Then, using Bayes' theorem, we calculate the conditional distribution of the parameters given the data, also known as the posterior distribution,
\[ f_{P|X}(p|x) = \frac{f_{X,P}(x,p)}{f_X(x)} = \frac{f_{X|P}(x|p)\cdot f_P(p)}{\int f_{X|P}(x|p)\cdot f_P(p)\,dp}. \]
In this step we often use a shortcut. First note that, as a function of p, the conditional density of P given X can be written
\[ f_{P|X}(p|x) \propto f_{X|P}(x|p)\cdot f_P(p). \]
Once we calculate this product, we have to find the constant that makes this expression integrate to one as a function of the parameter. At that stage it is sometimes easy to recognize the distribution and figure out through that route what the constant is. Not always, though: sometimes we have to calculate the constant directly by integration, and in some cases numerical integration can be used.

Example: Let us look at a second example. Suppose the conditional distribution of X given the parameter $\mu$ is normal with mean $\mu$ and variance 1. The prior distribution for $\mu$ is normal with mean zero and variance 100. What is the posterior distribution of $\mu$ given X = x? The posterior density is proportional to
\[ f_{\mu|X}(\mu|x) \propto \exp\Bigl(-\tfrac12(x-\mu)^2\Bigr)\cdot\exp\Bigl(-\tfrac{1}{2\cdot 100}\mu^2\Bigr) = \exp\Bigl(-\tfrac12\bigl(x^2 - 2x\mu + \mu^2 + \mu^2/100\bigr)\Bigr) \propto \exp\Bigl(-\tfrac12\,\tfrac{101}{100}\bigl(\mu - \tfrac{100}{101}x\bigr)^2\Bigr). \]
This implies that the conditional distribution of $\mu$ given X = x is normal with mean $(100/101)x$ and variance $100/101$.

Point Estimates from Posterior Distributions

The method of moments and maximum likelihood estimators return a single point estimate for a given data set. In contrast, the Bayesian posterior is an entire distribution over the parameter space. We can turn this into a point estimate by taking some measure of central tendency, such as the conditional mean.
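As an illustration of turning posteriors into point estimates, the sketch below (plain Python; the helper name `normal_posterior` and the example value x = 2.0 are my own choices for illustration) computes the posterior mean and variance for the two examples above: the Beta($\alpha+1$, $\beta$) posterior after one head, and the normal posterior with prior N(0, 100) and a single observation x.

```python
# --- Beta-Bernoulli example: prior Beta(71/16, 213/16), observe X = 1 ---
alpha, beta_ = 71/16, 213/16
post_a, post_b = alpha + 1, beta_              # posterior is Beta(alpha + 1, beta)
post_mean = post_a / (post_a + post_b)
post_var = (post_a * post_b) / ((post_a + post_b)**2 * (post_a + post_b + 1))
print(post_mean, post_var)                     # 0.29 and roughly 0.0104, as in the text

# --- Normal-normal example: X | mu ~ N(mu, 1), prior mu ~ N(0, 100) ---
# Precisions add: 1/post_var = 1/prior_var + 1/lik_var, and the posterior mean
# is a precision-weighted average of the prior mean and the observation.
def normal_posterior(x, prior_mean=0.0, prior_var=100.0, lik_var=1.0):
    post_var = 1.0 / (1.0 / prior_var + 1.0 / lik_var)
    post_mean = post_var * (prior_mean / prior_var + x / lik_var)
    return post_mean, post_var

x = 2.0                                        # an arbitrary observed value
mean_mu, var_mu = normal_posterior(x)
print(mean_mu, var_mu)                         # (100/101) * x and 100/101

# A Bayesian point estimate: report the posterior mean.
print("point estimate of mu:", mean_mu)
```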

