Economics 520, Fall 2009
Lecture Note 12: Bayesian Point Estimation (CB 7.2.3)

We have already discussed two general approaches to constructing point estimators of parameters, the method of moments and the maximum likelihood method. A third important class of estimators are the Bayesian estimators, so called because they make use of Bayes' Theorem.

Suppose we are interested in the probability that a coin comes up heads. Let P be the probability of heads, and suppose that P is chosen by "Nature" according to a uniform distribution on [0, 1]. We do not observe P, but we get to toss the coin once and see whether it comes up heads. Before seeing the outcome of the coin flip, we know that the marginal distribution of P is uniform on the unit interval. If we in fact observe heads, how should we "update" this distribution to reflect the new information?

The marginal density of P is

    f_P(p) = 1,  0 \le p \le 1,

and the conditional density of X given P = p is

    f_{X|P}(x|p) = p^x (1 - p)^{1-x}.

Therefore we can calculate the joint density:

    f_{X,P}(x, p) = f_{X|P}(x|p) \cdot f_P(p) = p^x (1 - p)^{1-x}.

Note that X can only take on the values 0 or 1, so its marginal density (actually PMF) is

    f_X(x) = \int_0^1 f_{X,P}(x, p)\,dp = x \cdot \int_0^1 p\,dp + (1 - x) \cdot \int_0^1 (1 - p)\,dp = x \cdot \frac{1}{2} + (1 - x) \cdot \frac{1}{2} = \frac{1}{2},

and, by Bayes' Theorem, the conditional distribution of P given X is

    f_{P|X}(p|x) = \frac{f_{X|P}(x|p) \cdot f_P(p)}{\int_0^1 f_{X|P}(x|p) \cdot f_P(p)\,dp} = 2 p^x (1 - p)^{1-x}.

This conditional distribution is what we are after: given the data (X), we want to know what the conditional distribution of the parameter (P) looks like. Let's calculate it for this example. Let X = 1 denote heads:

    f_{P|X}(p|x = 1) = \frac{f_{X|P}(1|p) \cdot f_P(p)}{\int_0^1 f_{X|P}(1|p) \cdot f_P(p)\,dp} = \frac{p \cdot 1}{\int_0^1 p\,dp} = 2p.

Thus the conditional density of P has a triangular shape, with more mass close to 1 than close to 0. We call the marginal distribution f_P the "prior" distribution, to reflect the interpretation of P being chosen before X, and we call f_{P|X} the "posterior" distribution. We see that the prior is modified based on the likelihood function f_{X|P} to obtain the posterior.

Now let us look at a more general class of prior distributions. Suppose that the prior for P follows a Beta distribution:

    f_P(p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \cdot \Gamma(\beta)} p^{\alpha - 1} (1 - p)^{\beta - 1},

with α and β known numbers. Notice that the case α = β = 1 gives the uniform distribution, so that the previous analysis is a special case. Recall that the mean and variance of the Beta distribution are

    E[P] = \frac{\alpha}{\alpha + \beta}  and  V(P) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)},

respectively. Suppose we want the prior distribution to have mean 1/4 and variance 1/100. Then there is a Beta distribution corresponding to that, namely the Beta distribution with

    \frac{\alpha}{\alpha + \beta} = \frac{1}{4}  and  \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} = \frac{1}{100},

which corresponds to α = 71/16 ≈ 4 and β = 213/16 ≈ 13. (More realistic values might be a mean of 1/2 and a variance of 1/100, but we will work with these numbers in this example.)

Again the data consist of just a single observation with X = 1. The joint distribution of P and X, at X = 1, is

    f_P(p) \cdot f_{X|P}(1|p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \cdot \Gamma(\beta)} p^{\alpha - 1} (1 - p)^{\beta - 1} \cdot p.

How do we figure out the conditional distribution of P given X = 1? We want to find the constant such that f_P(p) \cdot f_{X|P}(1|p), as a function of p, integrates out to one. Strip away the part of the function that does not depend on p, and we are left with the kernel of the conditional density:

    f_{P|X}(p|x = 1) \propto p^{\alpha} (1 - p)^{\beta - 1}.

This implies that the conditional distribution of P given X = 1 is a Beta distribution with parameters α + 1 and β. So a Beta prior leads to a (different) Beta posterior! This is an example of a conjugate prior for the Bernoulli likelihood.
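As a quick numerical illustration of this kernel argument, here is a minimal sketch in Python using numpy and scipy (the code and variable names are my own illustration, not part of the original note). It checks that the prior with mean 1/4 and variance 1/100 is Beta(71/16, 213/16), normalizes the kernel p^α(1 − p)^{β−1} by direct integration, and confirms that the result coincides with the Beta(α + 1, β) density; setting α = β = 1 reproduces the triangular posterior 2p from the uniform-prior example.

```python
import numpy as np
from scipy import integrate, stats

# Beta prior with mean 1/4 and variance 1/100: the mean condition gives
# beta = 3*alpha, and the variance condition then gives alpha = 71/16.
alpha, beta = 71 / 16, 213 / 16
prior = stats.beta(alpha, beta)
print(prior.mean(), prior.var())          # 0.25 and 0.01

# After observing X = 1, the posterior kernel is the prior density times p.
kernel = lambda p: prior.pdf(p) * p

# Find the normalizing constant by direct numerical integration ...
constant, _ = integrate.quad(kernel, 0.0, 1.0)

# ... and compare the normalized kernel with the conjugate answer Beta(alpha + 1, beta).
posterior = stats.beta(alpha + 1, beta)
grid = np.linspace(0.001, 0.999, 999)
print(np.max(np.abs(kernel(grid) / constant - posterior.pdf(grid))))  # ~0 (rounding error)

print(posterior.mean(), posterior.var())  # 0.29 and roughly 0.0104
```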
The mean and variance of this distribution are

    E[P|X = 1] = \frac{\alpha + 1}{\alpha + \beta + 1}  and  V(P|X = 1) = \frac{(\alpha + 1) \cdot \beta}{(\alpha + \beta + 1)^2 (\alpha + \beta + 2)},

respectively. After observing X = 1, we update the distribution of P: the mean moves upwards, with (α + 1)/(α + β + 1) = (71/16 + 1)/(71/16 + 213/16 + 1) = 0.29 slightly higher than the unconditional mean α/(α + β) = 1/4, and the posterior variance (α + 1)β/((α + β + 1)²(α + β + 2)) = 0.0104 is slightly higher than the prior variance of 0.01. (This is somewhat unusual. Typically the posterior variance is lower than the prior variance, due to the extra information. Here the fact that the extra information is so far from the prior mean implies that the uncertainty is actually increased by the extra information.)

Now let us do this more systematically. There are two ingredients to a Bayesian analysis. First, a model for the data given some unknown parameters; in our example that model was f_{X|P}(x|p) = p^x (1 − p)^{1−x}. Second, a prior distribution for the parameters; in our case that is the Beta distribution with parameters α and β. This prior distribution is known to the researcher. Then, using Bayes' theorem, we calculate the conditional distribution of the parameters given the data, also known as the posterior distribution,

    f_{P|X}(p|x) = \frac{f_{X,P}(x, p)}{f_X(x)} = \frac{f_{X|P}(x|p) \cdot f_P(p)}{\int f_{X|P}(x|p) \cdot f_P(p)\,dp}.

In this step we often use a shortcut. First note that, as a function of p, the conditional density of P given X is proportional to

    f_{P|X}(p|x) \propto f_{X|P}(x|p) \cdot f_P(p).

Once we calculate this product, we have to find the constant that makes this expression integrate out to one as a function of the parameter. At that stage it is sometimes easy to recognize the distribution and figure out through that route what the constant is. (Not always, though: sometimes we have to calculate the constant directly by integration, and in some cases numerical integration can be used.)

Example: Let us look at a second example. Suppose the conditional distribution of X given the parameter μ is normal with mean μ and variance 1. The prior distribution for μ is normal with mean zero and variance 100. What is the posterior distribution of μ given X = x? The posterior distribution is proportional to

    f_{\mu|X}(\mu|x) \propto \exp\left(-\frac{1}{2}(x - \mu)^2\right) \cdot \exp\left(-\frac{1}{2 \cdot 100}\mu^2\right)
                     = \exp\left(-\frac{1}{2}\left(x^2 - 2x\mu + \mu^2 + \mu^2/100\right)\right)
                     \propto \exp\left(-\frac{1}{2 \cdot (100/101)}\left(\mu - \frac{100}{101}x\right)^2\right).

This implies that the conditional distribution of μ given X = x is normal with mean (100/101)x and variance 100/101.
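A minimal numerical check of this normal example (again a Python sketch that is not part of the original note; the observed value x = 2 and the grid are illustrative choices): it evaluates the unnormalized posterior, the N(μ, 1) likelihood at x times the N(0, 100) prior, on a fine grid, normalizes it by direct integration, and compares the resulting mean and variance with the conjugate answer, mean (100/101)x and variance 100/101.

```python
import numpy as np
from scipy import stats

x = 2.0                                   # single observed data point (illustrative value)
mu = np.linspace(-40.0, 40.0, 200001)     # fine grid over the parameter space
dmu = mu[1] - mu[0]

# Unnormalized posterior: N(mu, 1) likelihood at x times the N(0, 100) prior
# (standard deviation 10, hence variance 100).
kernel = stats.norm.pdf(x, loc=mu, scale=1.0) * stats.norm.pdf(mu, loc=0.0, scale=10.0)

# Normalize by direct (Riemann-sum) integration over the grid.
posterior = kernel / (kernel.sum() * dmu)

# Posterior mean and variance computed from the grid.
post_mean = np.sum(posterior * mu) * dmu
post_var = np.sum(posterior * (mu - post_mean) ** 2) * dmu

print(post_mean, (100 / 101) * x)   # both approximately 1.980
print(post_var, 100 / 101)          # both approximately 0.990
```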
Point Estimates from Posterior Distributions

The method of moments and maximum likelihood estimators return a single point estimate for a given data set. In contrast, the
