Economics 520, Fall 2009
Lecture Note 12: Bayesian Point Estimation (CB 7.2.3)

We have already discussed two general approaches to constructing point estimators of parameters, the method of moments and the maximum likelihood method. A third important class of estimators are the Bayesian estimators, so called because they make use of Bayes' Theorem.

Suppose we are interested in the probability that a coin comes up heads. Let P be the probability of heads, and suppose that P is chosen by "Nature" according to a uniform distribution on [0, 1]. We do not observe P, but we get to toss the coin once and see whether it comes up heads. Before seeing the outcome of the coin flip, we know that the marginal distribution of P is uniform on the unit interval. If we in fact observe heads, how should we "update" this distribution to reflect the new information?

The marginal density of P is

    f_P(p) = 1,  0 \le p \le 1,

and the conditional density of X given P = p is

    f_{X|P}(x|p) = p^x (1 - p)^{1-x}.

Therefore we can calculate the joint density:

    f_{X,P}(x, p) = f_{X|P}(x|p) \cdot f_P(p) = p^x (1 - p)^{1-x}.

Note that X can only take on the values 0 or 1, so its marginal density (actually PMF) is

    f_X(x) = \int_0^1 f_{X,P}(x, p)\,dp = x \cdot \int_0^1 p\,dp + (1 - x) \cdot \int_0^1 (1 - p)\,dp = x \cdot \frac{1}{2} + (1 - x) \cdot \frac{1}{2} = \frac{1}{2},

and, by Bayes' Theorem, the conditional distribution of P given X is

    f_{P|X}(p|x) = \frac{f_{X|P}(x|p) \cdot f_P(p)}{\int_0^1 f_{X|P}(x|p) \cdot f_P(p)\,dp} = 2 p^x (1 - p)^{1-x}.

This conditional distribution is what we are after: given the data (X), we want to know what the conditional distribution of the parameter (P) looks like. Let's calculate it for this example. Let X = 1 denote heads:

    f_{P|X}(p|x = 1) = \frac{f_{X|P}(1|p) \cdot f_P(p)}{\int_0^1 f_{X|P}(1|p) \cdot f_P(p)\,dp} = \frac{p \cdot 1}{\int_0^1 p\,dp} = 2p.

Thus the conditional density of P has a triangular shape, with more mass close to 1 than close to 0. We call the marginal distribution f_P the "prior" distribution, to reflect the interpretation of P being chosen before X, and we call f_{P|X} the "posterior" distribution. We see that the prior is modified based on the likelihood function f_{X|P} to obtain the posterior.

Now let us look at a more general class of prior distributions. Suppose that the prior for P follows a Beta distribution:

    f_P(p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \cdot \Gamma(\beta)} p^{\alpha - 1} (1 - p)^{\beta - 1},

with α and β known numbers. Notice that the case α = β = 1 gives the uniform distribution, so that the previous analysis is a special case. Recall that the mean and variance of the Beta distribution are

    E[P] = \frac{\alpha}{\alpha + \beta}  and  V(P) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)},

respectively. Suppose we want the prior distribution to have mean 1/4 and variance 1/100. Then there is a Beta distribution corresponding to that, namely the Beta distribution with

    \frac{\alpha}{\alpha + \beta} = \frac{1}{4}  and  \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} = \frac{1}{100},

which corresponds to α = 71/16 ≈ 4 and β = 213/16 ≈ 13. (More realistic values might be a mean of 1/2 and a variance of 1/100, but we will work with these numbers in this example.)

Again the data consist of just a single observation with X = 1. The joint distribution of P and X, at X = 1, is

    f_P(p) \cdot f_{X|P}(1|p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \cdot \Gamma(\beta)} p^{\alpha - 1} (1 - p)^{\beta - 1} \cdot p.

How do we figure out the conditional distribution of P given X = 1? We want to find the constant such that f_P(p) \cdot f_{X|P}(1|p), as a function of p, integrates out to one. Strip away the part of the function that does not depend on p, and we are left with the kernel of the conditional density:

    f_{P|X}(p|x = 1) \propto p^{\alpha} (1 - p)^{\beta - 1}.

This implies that the conditional distribution of P given X = 1 is a Beta distribution with parameters α + 1 and β. So a Beta prior leads to a (different) Beta posterior! This is an example of a conjugate prior for the Bernoulli likelihood.
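As a quick numerical illustration of this kernel argument, here is a minimal sketch in Python using numpy and scipy (the code and variable names are my own illustration, not part of the original note). It checks that the prior with mean 1/4 and variance 1/100 is Beta(71/16, 213/16), normalizes the kernel p^α(1 − p)^{β−1} by direct integration, and confirms that the result coincides with the Beta(α + 1, β) density; setting α = β = 1 reproduces the triangular posterior 2p from the uniform-prior example.

```python
import numpy as np
from scipy import integrate, stats

# Beta prior with mean 1/4 and variance 1/100: the mean condition gives
# beta = 3*alpha, and the variance condition then gives alpha = 71/16.
alpha, beta = 71 / 16, 213 / 16
prior = stats.beta(alpha, beta)
print(prior.mean(), prior.var())          # 0.25 and 0.01

# After observing X = 1, the posterior kernel is the prior density times p.
kernel = lambda p: prior.pdf(p) * p

# Find the normalizing constant by direct numerical integration ...
constant, _ = integrate.quad(kernel, 0.0, 1.0)

# ... and compare the normalized kernel with the conjugate answer Beta(alpha + 1, beta).
posterior = stats.beta(alpha + 1, beta)
grid = np.linspace(0.001, 0.999, 999)
print(np.max(np.abs(kernel(grid) / constant - posterior.pdf(grid))))  # ~0 (rounding error)

print(posterior.mean(), posterior.var())  # 0.29 and roughly 0.0104
```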
The mean and variance of this distribution are

    E[P|X = 1] = \frac{\alpha + 1}{\alpha + \beta + 1}  and  V(P|X = 1) = \frac{(\alpha + 1) \cdot \beta}{(\alpha + \beta + 1)^2 (\alpha + \beta + 2)},

respectively. After observing X = 1, we update the distribution of P: the mean moves upwards, with (α + 1)/(α + β + 1) = (71/16 + 1)/(71/16 + 213/16 + 1) = 0.29 slightly higher than the unconditional mean α/(α + β) = 1/4, and the posterior variance (α + 1)β/((α + β + 1)²(α + β + 2)) = 0.0104 is slightly higher than the prior variance of 0.01. (This is somewhat unusual. Typically the posterior variance is lower than the prior variance, due to the extra information. Here the fact that the extra information is so far from the prior mean implies that the uncertainty is actually increased by the extra information.)

Now let us do this more systematically. There are two ingredients to a Bayesian analysis. First, a model for the data given some unknown parameters; in our example that model was f_{X|P}(x|p) = p^x (1 − p)^{1−x}. Second, a prior distribution for the parameters; in our case that is the Beta distribution with parameters α and β. This prior distribution is known to the researcher. Then, using Bayes' theorem, we calculate the conditional distribution of the parameters given the data, also known as the posterior distribution,

    f_{P|X}(p|x) = \frac{f_{X,P}(x, p)}{f_X(x)} = \frac{f_{X|P}(x|p) \cdot f_P(p)}{\int f_{X|P}(x|p) \cdot f_P(p)\,dp}.

In this step we often use a shortcut. First note that, as a function of p, the conditional density of P given X is proportional to

    f_{P|X}(p|x) \propto f_{X|P}(x|p) \cdot f_P(p).

Once we calculate this product, we have to find the constant that makes this expression integrate out to one as a function of the parameter. At that stage it is sometimes easy to recognize the distribution and figure out through that route what the constant is. (Not always, though: sometimes we have to calculate the constant directly by integration, and in some cases numerical integration can be used.)

Example: Let us look at a second example. Suppose the conditional distribution of X given the parameter μ is normal with mean μ and variance 1. The prior distribution for μ is normal with mean zero and variance 100. What is the posterior distribution of μ given X = x? The posterior distribution is proportional to

    f_{\mu|X}(\mu|x) \propto \exp\left(-\frac{1}{2}(x - \mu)^2\right) \cdot \exp\left(-\frac{1}{2 \cdot 100}\mu^2\right)
                     = \exp\left(-\frac{1}{2}\left(x^2 - 2x\mu + \mu^2 + \mu^2/100\right)\right)
                     \propto \exp\left(-\frac{1}{2 \cdot (100/101)}\left(\mu - \frac{100}{101}x\right)^2\right).

This implies that the conditional distribution of μ given X = x is normal with mean (100/101)x and variance 100/101.
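A minimal numerical check of this normal example (again a Python sketch that is not part of the original note; the observed value x = 2 and the grid are illustrative choices): it evaluates the unnormalized posterior, the N(μ, 1) likelihood at x times the N(0, 100) prior, on a fine grid, normalizes it by direct integration, and compares the resulting mean and variance with the conjugate answer, mean (100/101)x and variance 100/101.

```python
import numpy as np
from scipy import stats

x = 2.0                                   # single observed data point (illustrative value)
mu = np.linspace(-40.0, 40.0, 200001)     # fine grid over the parameter space
dmu = mu[1] - mu[0]

# Unnormalized posterior: N(mu, 1) likelihood at x times the N(0, 100) prior
# (standard deviation 10, hence variance 100).
kernel = stats.norm.pdf(x, loc=mu, scale=1.0) * stats.norm.pdf(mu, loc=0.0, scale=10.0)

# Normalize by direct (Riemann-sum) integration over the grid.
posterior = kernel / (kernel.sum() * dmu)

# Posterior mean and variance computed from the grid.
post_mean = np.sum(posterior * mu) * dmu
post_var = np.sum(posterior * (mu - post_mean) ** 2) * dmu

print(post_mean, (100 / 101) * x)   # both approximately 1.980
print(post_var, 100 / 101)          # both approximately 0.990
```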
Point Estimates from Posterior Distributions

The method of moments and maximum likelihood estimators return a single point estimate for a given data set. In contrast, the
