UI STAT 5400 - Review of Bayesian Concepts and Intro to MCMC

22S:166 Computing in Statistics
Review of Bayesian Concepts and Intro to MCMC
Lecture 14, October 15, 2006
Kate Cowles
374 SH, [email protected]

Notes on notation

• I will use f(·), p(·), and π(·) to refer to distributions that may be continuous, discrete, or mixed.
• I will frequently use an integral. When the argument may be a discrete distribution, think summation.
• y will refer to observed, or potentially observable, quantities.
• θ will refer to unobservable quantities.

Bayesian Basics

• Unknown model parameters are random variables.
• Our knowledge or uncertainty about unknown model parameters is appropriately expressed through probability distributions.
• Whenever we observe data, these probability distributions are updated.

Steps in Bayesian data analysis

1. Specify a full probability model
   • a joint probability distribution for all observable and unobservable quantities in the problem
2. Calculate and interpret the posterior distribution
   • the conditional probability distribution of the unobserved quantities of interest, given the observed data
3. Evaluate the model
   • fit to the observed data
   • consistency of posterior inference with substantive knowledge
   • sensitivity to the model specification

Components of a Bayesian model

• The first stage: the likelihood
  – the probability distribution for the observed data y, conditional on a vector of unknown parameters:

        f(y | θ)

• The second stage: the priors
  – probability distribution(s) that express our knowledge or uncertainty about the model parameters before the data are observed:

        π(θ | η)

    where η is a vector of "hyperparameters"

Bayes' theorem and the posterior distribution

• Bayes' theorem is the recipe for using the data to update the prior and produce the posterior distribution p(θ | y):

      p(θ | y, η) = p(y, θ | η) / p(y | η)
                  = p(y, θ | η) / ∫ p(y, θ* | η) dθ*
                  = f(y | θ) π(θ | η) / ∫ f(y | θ*) π(θ* | η) dθ*

• Remarks:
  – Dependence on η in posterior distributions is often suppressed when the values of the hyperparameters are known constants.
  – In principle, this computation can be done for any valid prior and any well-defined likelihood.
  – The challenge is the integral in the denominator.

The marginal likelihood

• the marginal distribution of the data y given the complete model:

      m(y) = ∫ f(y | θ*) π(θ* | η) dθ*

• also called
  – the prior predictive distribution
  – the normalizing constant
• useful in model comparison employing Bayes factors
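The "challenging" denominator integral can be seen concretely in one dimension, where it can simply be approximated on a grid. The sketch below is not from the lecture: the data values, the N(θ, 1) likelihood, the N(0, 2²) prior, and the grid range are all made-up illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical data, assumed y_i ~ N(theta, 1) with theta unknown
y = np.array([1.2, 0.8, 2.1, 1.6, 0.9])

theta = np.linspace(-5.0, 5.0, 2001)        # grid over the parameter space

# Likelihood f(y | theta): product of normal densities over the observations,
# evaluated at every grid value of theta
like = np.prod(stats.norm.pdf(y[:, None], loc=theta, scale=1.0), axis=0)

# Prior pi(theta): N(0, 2^2), a made-up vague-ish choice
prior = stats.norm.pdf(theta, loc=0.0, scale=2.0)

unnorm = like * prior                        # numerator of Bayes' theorem
m_y = np.trapz(unnorm, theta)                # marginal likelihood m(y)
post = unnorm / m_y                          # posterior density p(theta | y) on the grid

post_mean = np.trapz(theta * post, theta)    # a point estimate from the grid
```

Grid evaluation like this only scales to one or two parameters; the point of the MCMC methods introduced at the end of these notes is to avoid the integral entirely.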
Classifications of priors

• informative and noninformative
• proper and improper
  – In general, posterior inference is impossible if the posterior is improper (i.e., if the unnormalized posterior does not have a finite integral).
    ∗ The use of proper priors guarantees a proper posterior.
    ∗ If improper priors are used, it is important to verify analytically that the posterior is proper.
    ∗ If prior information is minimal and the likelihood is complicated, vague but proper priors are advisable.
• conjugate and nonconjugate
  – A class of prior distributions is said to be conjugate for a parameter in a likelihood if the resulting posterior distribution is in the same family as the prior.
  – Conjugate priors make computation (analytic or by computer) easier, but may not reflect actual prior information.
  – For many likelihood functions, there is no conjugate prior.
  – Examples: what are the conjugate priors for
    ∗ a normal mean (variance assumed known)?
    ∗ a normal variance (mean assumed known)?

Summarizing Bayesian estimation and inference

• All inference is based on the posterior distribution.
• More integration is required to obtain marginal posterior distributions of parameters of interest (i.e., to integrate out nuisance parameters):

      p(θ_i | y) = ∫ p(θ | y) dθ_(−i)

• Point estimates of parameters
  – means and medians of the posterior marginals
  – in conjugate one-parameter models:
    ∗ the posterior mean is a weighted average of the prior mean and the mean coming from the likelihood
    ∗ the posterior variance is smaller than either the prior variance or the variance of the parameter in the likelihood function
    ∗ example: the normal-normal model (see the sketch following this section)
• intervals: credible sets
  – Definition: a 100 × (1 − α)% credible set for a parameter θ is a subset C of the parameter space Θ such that

        1 − α ≤ P(C | y) = ∫_C p(θ | y) dθ

  – Interpretation: the probability that θ lies in C given the observed data is (at least) 1 − α.
  – Equal-tail credible set: the interval between the α/2 and (1 − α/2) quantiles of p(θ | y).
  – Highest posterior density (HPD) credible set: the subset C of Θ such that

        C = {θ ∈ Θ : p(θ | y) ≥ k(α)}

    where k(α) is the largest constant satisfying P(C | y) ≥ 1 − α.

The posterior predictive distribution

• Used to make inferences about a potentially observable but unobserved model quantity ỹ, conditional on data y that have already been observed:

      p(ỹ | y) = ∫ p(ỹ, θ | y) dθ
               = ∫ p(ỹ | θ, y) p(θ | y) dθ
               = ∫ p(ỹ | θ) p(θ | y) dθ

• The last equality follows if ỹ and y are conditionally independent given θ in the model.

Example of a simple Bayesian model

• Normal likelihood, mean and variance both unknown (the precision is the inverse of the variance):

      y_i | µ, σ² ∼ N(µ, σ²),  i = 1, ..., n

• Semiconjugate priors on µ and σ², with the Gamma prior placed on the precision τ = 1/σ²:

      µ ∼ N(µ₀, σ₀²)
      τ ∼ Gamma(a, b)
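A minimal numerical sketch of the conjugate piece of this model: if σ² is momentarily treated as known, the normal prior on µ is fully conjugate and the update takes a few lines. All numbers below (data and prior settings) are made-up illustrations, not values from the lecture.

```python
import numpy as np
from scipy import stats

# Hypothetical data, assumed y_i ~ N(mu, sigma2) with sigma2 treated as known
y = np.array([4.8, 5.6, 5.1, 4.3, 5.9])
sigma2 = 1.0

# Prior mu ~ N(mu0, tau20); vague-ish made-up settings
mu0, tau20 = 0.0, 10.0

n, ybar = len(y), y.mean()

# Conjugate update: precisions (inverse variances) add, and the posterior
# mean is the precision-weighted average of the prior mean and sample mean
post_var = 1.0 / (1.0 / tau20 + n / sigma2)
post_mean = post_var * (mu0 / tau20 + n * ybar / sigma2)

# The posterior variance is smaller than both the prior variance and sigma2/n
assert post_var < tau20 and post_var < sigma2 / n

# 95% equal-tail credible set: the 2.5% and 97.5% posterior quantiles
lo, hi = stats.norm.ppf([0.025, 0.975], loc=post_mean, scale=np.sqrt(post_var))
```

With both µ and σ² unknown, as in the two-stage model above, no such closed form exists for the joint posterior; that is where the sampling methods below come in.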
Hierarchical models

• arise if
  – we don't know the values of the hyperparameters in the 2nd stage, or
  – we wish to express relationships among parameters
• The likelihood is the first "stage" of the model.
• The 2nd stage consists of priors on the parameters that appear in the likelihood.
• Subsequent stages are priors on hyperparameters from the previous stages.

Example of hierarchical model

• A hierarchical model is fit to data on failure rates of the pump at each of 10 power plants. The number of failures for the i-th pump is assumed to follow a Poisson distribution:

      x_i ∼ Poisson(θ_i t_i),  i = 1, ..., 10

  where θ_i is the failure rate for pump i and t_i is the length of operation time of the pump (in 1000s of hours).
• A conjugate Gamma prior distribution is adopted for the failure rates:

      θ_i ∼ Gamma(α, β),  i = 1, ..., 10

• The following priors are specified for the hyperparameters α and β:

      α ∼ Exponential(1.0)
      β ∼ Gamma(0.10, 1.0)

Markov chain Monte Carlo: one method of Bayesian computation

• If we can't analytically do the integration to get the needed joint and marginal posterior distributions, generate random samples from the joint posterior instead.
• MCMC: do this by constructing a Markov chain that has the joint posterior distribution as its stationary distribution. (A sketch of such a sampler for the pump model follows.)
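As a sketch of where this is headed, the pump model above can be sampled with a Metropolis-within-Gibbs scheme (a standard approach, though the lecture does not prescribe one): by conjugacy, the full conditionals of each θ_i and of β are Gamma distributions, while α has no standard full conditional and gets a random-walk Metropolis step on log α. The failure counts and operating times below are made-up placeholders, not the real pump data; the full conditionals are derived from the stated model.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)

# Made-up placeholder data: failure counts x_i and operating times t_i
# (in 1000s of hours) for 10 pumps
x = np.array([3, 1, 4, 9, 2, 11, 0, 1, 3, 8])
t = np.array([90.0, 15.0, 60.0, 120.0, 5.0, 30.0, 1.0, 1.0, 2.0, 10.0])
n = len(x)

# Hyperpriors from the notes: alpha ~ Exponential(1.0), beta ~ Gamma(0.10, 1.0)
c, d = 0.10, 1.0

def log_cond_alpha(alpha, beta, theta):
    """Log full conditional of alpha (up to a constant): the Exponential(1)
    prior times the Gamma(alpha, beta) densities of the theta_i."""
    return (-alpha + n * (alpha * np.log(beta) - gammaln(alpha))
            + (alpha - 1.0) * np.log(theta).sum())

# Initial values and storage
alpha, beta = 1.0, 1.0
theta = rng.gamma(1.0, 1.0, size=n)
draws = []

for it in range(5000):
    # Gibbs step: theta_i | rest ~ Gamma(alpha + x_i, rate = beta + t_i)
    theta = rng.gamma(alpha + x, 1.0 / (beta + t))

    # Gibbs step: beta | rest ~ Gamma(c + n*alpha, rate = d + sum(theta))
    beta = rng.gamma(c + n * alpha, 1.0 / (d + theta.sum()))

    # Metropolis step for alpha on the log scale (nonstandard full conditional)
    prop = alpha * np.exp(0.5 * rng.normal())
    log_ratio = (log_cond_alpha(prop, beta, theta)
                 - log_cond_alpha(alpha, beta, theta)
                 + np.log(prop) - np.log(alpha))  # Jacobian of the log transform
    if np.log(rng.uniform()) < log_ratio:
        alpha = prop

    draws.append((alpha, beta, theta.copy()))

# Posterior means of the per-pump failure rates, discarding burn-in
theta_draws = np.array([th for _, _, th in draws[1000:]])
print("posterior mean failure rates:", theta_draws.mean(axis=0).round(3))
```

After burn-in, the draws behave like (dependent) samples from the joint posterior, so averaging them replaces the intractable integrals discussed above.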

