
Basics of Bayesian Inference
Chapter 2: Basics of Point-Referenced Data Models (U of M PUBH 8472)

Contents:
- Basics of Bayesian Inference
- Illustration of Bayes' Theorem
- Notes on priors
- A more general example
- Bayesian estimation
- Ex: $Y \sim Bin(10, \theta)$, $\theta \sim U(0,1)$, $y_{obs} = 7$
- Bayesian hypothesis testing
- Bayesian hypothesis testing via DIC
- Bayesian computation
- Gibbs sampling
- Metropolis algorithm
- Convergence diagnosis
- Variance estimation

Basics of Bayesian Inference

- A frequentist thinks of unknown parameters as fixed.
- A Bayesian thinks of parameters as random, and thus having distributions (just like the data).
- A Bayesian writes down a prior guess for $\theta$, and combines it with the likelihood for the observed data $Y$ to obtain the posterior distribution of $\theta$. All statistical inferences then follow from summarizing the posterior.
- This approach expands the class of candidate models, and facilitates hierarchical modeling, where it is important to properly account for various sources of uncertainty (e.g., spatial vs. nonspatial heterogeneity).
- The classical (frequentist) approach to estimation is not "wrong", but it is "limited in scope"!

Basics of Bayesian Inference

- As usual, we start with a model $f(y \mid \theta)$ for the observed data $y = (y_1, \ldots, y_n)$ given the unknown parameters $\theta = (\theta_1, \ldots, \theta_K)$.
- Add a prior distribution $\pi(\theta \mid \lambda)$, where $\lambda$ is a vector of hyperparameters.
- The posterior distribution for $\theta$ is given by
  $$p(\theta \mid y, \lambda)
    = \frac{p(y, \theta \mid \lambda)}{p(y \mid \lambda)}
    = \frac{p(y, \theta \mid \lambda)}{\int p(y, \theta \mid \lambda)\, d\theta}
    = \frac{f(y \mid \theta)\, \pi(\theta \mid \lambda)}{\int f(y \mid \theta)\, \pi(\theta \mid \lambda)\, d\theta}
    = \frac{f(y \mid \theta)\, \pi(\theta \mid \lambda)}{m(y \mid \lambda)} .$$
- We refer to this formula as Bayes' Theorem.
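To make the formula concrete, here is a minimal grid-approximation sketch in Python (added here for illustration, not from the slides): it evaluates $f(y \mid \theta)\,\pi(\theta \mid \lambda)$ on a grid and divides by a quadrature estimate of $m(y \mid \lambda)$. The normal likelihood and prior, and the assumed values $y = 6$, $\sigma = 1$, $\mu = 2$, $\tau = 1$, are chosen to match the illustration that follows.

```python
import numpy as np
from scipy import stats

# Illustrative setup (assumed values, matching the normal-normal example below):
y, sigma = 6.0, 1.0      # one observation from N(theta, sigma^2)
mu, tau  = 2.0, 1.0      # N(mu, tau^2) prior on theta

theta = np.linspace(-5.0, 15.0, 2001)   # grid over the parameter space
dtheta = theta[1] - theta[0]

like  = stats.norm.pdf(y, loc=theta, scale=sigma)   # f(y | theta)
prior = stats.norm.pdf(theta, loc=mu, scale=tau)    # pi(theta | lambda)

unnorm = like * prior                    # numerator f(y|theta) * pi(theta|lambda)
m_y = unnorm.sum() * dtheta              # marginal m(y|lambda), by quadrature
post = unnorm / m_y                      # posterior p(theta | y, lambda) on the grid

print("posterior mean ~", (theta * post).sum() * dtheta)   # ~ 4.0 for these values
```

For a conjugate model like this one the grid is unnecessary (the posterior is available in closed form, as the next slides show), but the same few lines work for any one-dimensional likelihood-prior pair.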
Basics of Bayesian Inference

- When $\lambda$ is not known, a second-stage (hyperprior) distribution $h(\lambda)$ will often be required, so that
  $$p(\theta \mid y) = \frac{p(y, \theta)}{p(y)}
    = \frac{\int f(y \mid \theta)\, \pi(\theta \mid \lambda)\, h(\lambda)\, d\lambda}
           {\int\!\int f(y \mid \theta)\, \pi(\theta \mid \lambda)\, h(\lambda)\, d\theta\, d\lambda} .$$
- Alternatively, we might replace $\lambda$ in $p(\theta \mid y, \lambda)$ by an estimate $\hat{\lambda}$; this is called empirical Bayes analysis.
- Note that posterior information ≥ prior information ≥ 0, with the second "≥" replaced by "=" only if the prior is noninformative (which is often uniform, or "flat").

Illustration of Bayes' Theorem

- Suppose $f(y \mid \theta) = N(y \mid \theta, \sigma^2)$, with $\theta \in \Re$ and $\sigma > 0$ known.
- If we take $\pi(\theta \mid \lambda) = N(\theta \mid \mu, \tau^2)$, where $\lambda = (\mu, \tau)'$ is fixed and known, then it is easy to show that
  $$p(\theta \mid y) = N\!\left(\theta \;\Big|\; \frac{\sigma^2}{\sigma^2 + \tau^2}\,\mu + \frac{\tau^2}{\sigma^2 + \tau^2}\,y \;,\; \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}\right) .$$
- Note that:
  - The posterior mean $E(\theta \mid y)$ is a weighted average of the prior mean $\mu$ and the data value $y$, with weights depending on our relative uncertainty.
  - The posterior precision (reciprocal of the variance) is equal to $1/\sigma^2 + 1/\tau^2$, which is the sum of the likelihood and prior precisions.

Illustration (continued)

- As a concrete example, let $\mu = 2$, $\tau = 1$, $\bar{y} = 6$, and $\sigma = 1$.
- [Figure: densities over $\theta$ of the prior, the posterior with $n = 1$, and the posterior with $n = 10$.]
- When $n = 1$, the prior and likelihood receive equal weight.
- When $n = 10$, the data dominate the prior.
- The posterior variance goes to zero as $n \to \infty$.

Notes on priors

- The prior here is conjugate: it leads to a posterior distribution for $\theta$ that is available in closed form, and is a member of the same distributional family as the prior.
- Note that setting $\tau^2 = \infty$ corresponds to an arbitrarily vague (or noninformative) prior. The posterior is then
  $$p(\theta \mid y) = N(\theta \mid \bar{y}, \sigma^2/n),$$
  the same as the likelihood! The limit of the conjugate (normal) prior here is a uniform (or "flat") prior, and thus the posterior is the renormalized likelihood.
- The flat prior is appealing but improper here, since $\int p(\theta)\, d\theta = +\infty$. However, the posterior is still well defined, and so improper priors are often used!

A more general example

- Let $Y$ be an $n \times 1$ data vector, $X$ an $n \times p$ matrix of covariates, and adopt the likelihood and prior structure
  $$Y \mid \beta \sim N_n(X\beta, \Sigma) \quad \text{and} \quad \beta \sim N_p(A\alpha, V) .$$
- Then the posterior distribution of $\beta \mid Y$ is
  $$\beta \mid Y \sim N(Dd, D), \quad \text{where} \quad
    D^{-1} = X^T \Sigma^{-1} X + V^{-1} \quad \text{and} \quad
    d = X^T \Sigma^{-1} Y + V^{-1} A\alpha .$$
- $V^{-1} = 0$ delivers a "flat" prior; if $\Sigma = \sigma^2 I_n$, we get
  $$\beta \mid Y \sim N\!\left(\hat{\beta},\, \sigma^2 (X'X)^{-1}\right), \quad \text{where} \quad \hat{\beta} = (X'X)^{-1} X'y$$
  $\Longleftrightarrow$ the usual likelihood approach!

Bayesian estimation

- Point estimation: choose an appropriate measure of centrality, such as the posterior mean, median, or mode.
- Interval estimation: consider $q_L$ and $q_U$, the $\alpha/2$- and $(1 - \alpha/2)$-quantiles of $p(\theta \mid y)$:
  $$\int_{-\infty}^{q_L} p(\theta \mid y)\, d\theta = \alpha/2 \quad \text{and} \quad \int_{-\infty}^{q_U} p(\theta \mid y)\, d\theta = 1 - \alpha/2 .$$
- Then clearly $P(q_L < \theta < q_U \mid y) = 1 - \alpha$; our confidence that $\theta$ lies in $(q_L, q_U)$ is $100 \times (1 - \alpha)\%$. Thus this interval is a $100 \times (1 - \alpha)\%$ credible set ("Bayesian CI") for $\theta$.
- This interval is relatively easy to compute, and enjoys a direct interpretation.
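As a sketch of how such a credible set might be computed (code added for illustration, not from the slides): for the normal-normal example above with $\mu = 2$, $\tau = 1$, $\bar{y} = 6$, $\sigma = 1$, and $n = 10$, the posterior is normal with precision $n/\sigma^2 + 1/\tau^2$, so the equal-tail interval comes directly from its $\alpha/2$- and $(1 - \alpha/2)$-quantiles.

```python
import numpy as np
from scipy import stats

# Values from the illustration; alpha = 0.05 gives a 95% credible set.
mu, tau, ybar, sigma, n, alpha = 2.0, 1.0, 6.0, 1.0, 10, 0.05

# Conjugate normal update with n observations: precisions add.
post_prec = n / sigma**2 + 1.0 / tau**2
post_var  = 1.0 / post_prec
post_mean = post_var * (n * ybar / sigma**2 + mu / tau**2)

# q_L and q_U are the alpha/2- and (1 - alpha/2)-quantiles of p(theta | y).
q_L = stats.norm.ppf(alpha / 2, loc=post_mean, scale=np.sqrt(post_var))
q_U = stats.norm.ppf(1 - alpha / 2, loc=post_mean, scale=np.sqrt(post_var))
print(f"95% credible set: ({q_L:.2f}, {q_U:.2f})")   # roughly (5.05, 6.23)
```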
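The general linear-model posterior $\beta \mid Y \sim N(Dd, D)$ from the earlier slide is likewise only a few lines of linear algebra. The sketch below is illustrative rather than from the slides; the simulated design, the vague $N(0, 100\, I_p)$ prior, and the seed are all arbitrary assumptions.

```python
import numpy as np

# beta | Y ~ N(Dd, D), with D^{-1} = X' Sigma^{-1} X + V^{-1}
# and d = X' Sigma^{-1} Y + V^{-1} A alpha.
rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
sigma2 = 1.0
Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(sigma2), size=n)

Sigma_inv = np.eye(n) / sigma2     # Sigma = sigma^2 I_n
V_inv = np.eye(p) / 100.0          # vague prior; V_inv -> 0 recovers the flat prior
A_alpha = np.zeros(p)              # prior mean A*alpha = 0

D = np.linalg.inv(X.T @ Sigma_inv @ X + V_inv)
d = X.T @ Sigma_inv @ Y + V_inv @ A_alpha
print(D @ d)    # posterior mean; close to the OLS estimate (X'X)^{-1} X'Y here
```

With $V^{-1}$ set exactly to zero, $Dd$ reduces to $\hat{\beta} = (X'X)^{-1}X'Y$, the usual likelihood answer noted above.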

