22S:166 Computing in Statistics
Review of Bayesian Concepts and Intro to MCMC
Lecture 14
October 15, 2006
Kate Cowles
374 SH, [email protected]

Notes on notation
• I will use f(·), p(·), and π(·) to refer to distributions that may be continuous, discrete, or mixed.
• I will frequently use an integral. When the argument may be a discrete distribution, think summation.
• y will refer to observed, or potentially observable, quantities.
• θ will refer to unobservable quantities.

Bayesian Basics
• Unknown model parameters are random variables.
• Our knowledge / uncertainty about unknown model parameters is appropriately expressed through probability distributions.
• Whenever we observe data, these probability distributions are updated.

Steps in Bayesian data analysis
1. Specify a full probability model
• joint probability distribution for all observable and unobservable quantities in a problem
2. Calculate and interpret the posterior distribution
• the conditional probability distribution of the unobserved quantities of interest given the observed data
3.
Evaluate the model
• fit to observed data
• consistency of posterior inference with substantive knowledge
• sensitivity to model specification

Components of a Bayesian model
• The first stage: the likelihood
– the probability distribution for the observed data y conditional on a vector of unknown parameters:
  f(y|θ)
• The second stage: the priors
– probability distribution(s) that express our knowledge or uncertainty about the model parameters before the data are observed:
  π(θ|η)
  where η is a vector of "hyperparameters"

Bayes theorem and the posterior distribution
• Bayes theorem is the recipe for using the data to update the prior and produce the posterior distribution p(θ|y):

  p(θ|y, η) = p(y, θ|η) / p(y|η)
            = p(y, θ|η) / ∫ p(y, θ*|η) dθ*
            = f(y|θ) π(θ|η) / ∫ f(y|θ*) π(θ*|η) dθ*

• Remarks:
– Dependence on η in posterior distributions is often suppressed when the values of the hyperparameters are known constants.
– In principle, this computation can be done for any valid prior and any well-defined likelihood.
– The challenge is the integral in the denominator.

The marginal likelihood
• the marginal distribution of the data y given the complete model:

  m(y) = ∫ f(y|θ*) π(θ*|η) dθ*

• also called
– the prior predictive distribution
– the normalizing constant
• useful in model comparison employing Bayes factors

Classifications of priors
• informative and noninformative
• proper and improper
– In general, posterior inference is impossible if the posterior is improper (i.e.
if the unnormalized posterior does not have a finite integral).
∗ The use of proper priors guarantees a proper posterior.
∗ If improper priors are used, it is important to verify analytically that the posterior is proper.
∗ If prior information is minimal and the likelihood is complicated, vague but proper priors are advisable.
• conjugate and nonconjugate
– a class of prior distributions is said to be conjugate for a parameter in a likelihood if the resulting posterior distribution is in the same family as the prior
– conjugate priors make computation (analytic or by computer) easier, but may not reflect actual prior information
– for many likelihood functions, there is no conjugate prior
– examples: what are the conjugate priors for
∗ a normal mean (variance assumed known)
∗ a normal variance (mean assumed known)

Summarizing Bayesian estimation and inference
• All inference is based on the posterior distribution.
• More integration is required to obtain marginal posterior distributions of parameters of interest (i.e., to integrate out nuisance parameters):

  p(θ_i|y) = ∫ p(θ|y) dθ_(−i)

• Point estimates of parameters
– means and medians of posterior marginals
– in conjugate one-parameter models
∗ the posterior mean is a weighted average of the prior mean and the mean coming from the likelihood
∗ the posterior variance is smaller than either the prior variance or the variance of the parameter in the likelihood function
∗ example: the normal–normal model
• Intervals: credible sets
– definition: a 100 × (1 − α)% credible set for a parameter θ is a subset C of the parameter space Θ such that

  1 − α ≤ P(C|y) = ∫_C p(θ|y) dθ

– interpretation: the probability that θ lies in C given the observed data is (at least) 1 − α
– equal-tail credible set: the interval between the α/2 and (1 − α/2) quantiles of p(θ|y)
– highest posterior density (HPD) credible set: the subset C of Θ such that

  C = {θ ∈ Θ : p(θ|y) ≥ k(α)}

  where k(α) is the largest constant satisfying P(C|y) ≥ 1 − α

The posterior predictive distribution
• Used to make inferences about a
potentially observable but unobserved model quantity ỹ, conditional on data y that have already been observed:

  p(ỹ|y) = ∫ p(ỹ, θ|y) dθ
         = ∫ p(ỹ|θ, y) p(θ|y) dθ
         = ∫ p(ỹ|θ) p(θ|y) dθ

• the last equality follows if ỹ and y are conditionally independent given θ in the model

Example of a simple Bayesian model
• Normal likelihood, mean and variance both unknown
– the precision τ = 1/σ² is the inverse of the variance

  y_i | μ, σ² ∼ N(μ, σ²), i = 1, . . . , n

• Semiconjugate priors on μ and σ²:

  μ ∼ N(μ0, σ0²)   (1)
  τ ∼ G(a, b)      (2)

Hierarchical models
• arise if
– we don't know the values of the hyperparameters in the 2nd stage
– or we wish to express relationships among parameters
• the likelihood is the first "stage" of the model
• the 2nd stage consists of priors on the parameters that appear in the likelihood
• subsequent stages are priors on hyperparameters from the previous stages

Example of a hierarchical model
• A hierarchical model is fit to data on failure rates of the pumps at each of 10 power plants. The number of failures for the i-th pump is assumed to follow a Poisson distribution:

  x_i ∼ Poisson(θ_i t_i), i = 1, . . . , 10

  where θ_i is the failure rate for pump i and t_i is the length of operation time of the pump (in 1000s of hours).
• A conjugate gamma prior distribution is adopted for the failure rates:

  θ_i ∼ Gamma(α, β), i = 1, . . . , 10

• The following priors are specified for the hyperparameters α and β:

  α ∼ Exponential(1.0)
  β ∼ Gamma(0.10, 1.0)

Markov chain Monte Carlo: one method of Bayesian computation
• If we can't analytically do the integration to get the needed joint and marginal posterior distributions, generate random samples from the joint posterior.
• MCMC: do this by constructing a Markov chain with the joint posterior distribution as its stationary distribution.
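The MCMC idea can be sketched for the semiconjugate normal model above. Because both full conditionals, μ | τ, y and τ | μ, y, are available in closed form, a Gibbs sampler alternates draws from them. The full-conditional formulas are the standard ones for this model; the data and hyperparameter values below are invented purely for illustration.

```python
import random
import statistics

# Gibbs sampler sketch for the semiconjugate normal model:
#   y_i | mu, tau ~ N(mu, 1/tau),   mu ~ N(mu0, sig0sq),   tau ~ Gamma(a, b)
# (tau is the precision; hyperparameter values are illustrative, not prescriptive)

random.seed(1)
y = [random.gauss(5.0, 2.0) for _ in range(50)]   # made-up data
n, ybar = len(y), statistics.mean(y)

mu0, sig0sq = 0.0, 100.0   # vague prior on mu
a, b = 0.1, 0.1            # vague prior on tau (shape, rate)

mu, tau = 0.0, 1.0         # initial values
draws = []
for it in range(5000):
    # mu | tau, y ~ N(m, v): precision-weighted combination of prior and data
    v = 1.0 / (1.0 / sig0sq + n * tau)
    m = v * (mu0 / sig0sq + n * tau * ybar)
    mu = random.gauss(m, v ** 0.5)
    # tau | mu, y ~ Gamma(a + n/2, rate = b + 0.5 * sum of squared deviations)
    rate = b + 0.5 * sum((yi - mu) ** 2 for yi in y)
    tau = random.gammavariate(a + n / 2.0, 1.0 / rate)  # gammavariate takes scale
    if it >= 1000:          # discard burn-in draws
        draws.append((mu, tau))

post_mu = statistics.mean(d[0] for d in draws)
post_tau = statistics.mean(d[1] for d in draws)
```

The retained draws approximate the joint posterior; posterior means, medians, and quantile-based credible sets can then be read off the sampled values. The burn-in length here is an arbitrary choice; assessing convergence is a separate topic.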
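The conjugate claims above (posterior mean as a weighted average, posterior variance smaller than both the prior variance and the likelihood variance, equal-tail credible set, posterior predictive) can be checked numerically for the known-variance normal–normal model. The numbers below are made up for illustration.

```python
from statistics import NormalDist

# Known-variance normal-normal model:
#   y_i | mu ~ N(mu, s2) with s2 known,   prior mu ~ N(mu0, t2)
# The posterior for mu is normal, with precisions adding and a
# precision-weighted mean.  All numeric values are illustrative only.

mu0, t2 = 0.0, 4.0        # prior mean and prior variance
s2 = 1.0                  # known data variance
n, ybar = 10, 2.5         # sample size and sample mean (made up)

post_prec = 1.0 / t2 + n / s2             # precisions add
post_var = 1.0 / post_prec
post_mean = post_var * (mu0 / t2 + n * ybar / s2)

# posterior mean is a weighted average of the prior mean and ybar
w = (n / s2) / post_prec                  # weight on the data mean
assert abs(post_mean - (w * ybar + (1 - w) * mu0)) < 1e-12
# posterior variance is smaller than both the prior variance and s2/n
assert post_var < min(t2, s2 / n)

# 95% equal-tail credible set: alpha/2 and 1 - alpha/2 posterior quantiles
post = NormalDist(post_mean, post_var ** 0.5)
lo, hi = post.inv_cdf(0.025), post.inv_cdf(0.975)

# posterior predictive for a new observation:  N(post_mean, s2 + post_var)
pred = NormalDist(post_mean, (s2 + post_var) ** 0.5)
```

Because the normal posterior is symmetric and unimodal, the equal-tail interval here coincides with the HPD credible set; for skewed posteriors the two generally differ.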