
Markov Chain Monte Carlo
22S:138, Bayesian Statistics
Lecture 10, Sept. 25, 2006
Kate Cowles
374 SH, 335-0727

The Poisson distribution (one more one-parameter distribution)

• The Poisson distribution may be appropriate when the data are counts of rare events.
• events occurring at random at a constant rate per unit time, distance, volume, or whatever
• assumption that the number of events that occur in any interval is independent of the number of events occurring in a disjoint interval
• examples:
  – the number of cases of a rare form of cancer occurring in Johnson County in each calendar year
  – the number of flaws occurring in each 100-foot length of yarn produced by a spinning machine
  – the number of particles of pollen per cubic foot of air in this room
• Since the values of a random variable following a Poisson distribution are counts, what are the possible values?
• probability mass function for a Poisson random variable:

    p(y|λ) = e^(−λ) λ^y / y!,   y = 0, 1, . . .

• the count of the number of events occurring in m time units also follows a Poisson distribution, but with parameter mλ
• The conjugate prior distribution for the Poisson rate parameter is the gamma family.

Markov Chain Monte Carlo Methods

• Goals
  – to make inference about model parameters
  – to make predictions
• Requires
  – integration over a possibly high-dimensional integrand
  – and we may not know the integrating constant

Monte Carlo integration and MCMC

• Monte Carlo integration
  – draw independent samples from the required distribution
  – use sample averages to approximate expectations
• Markov chain Monte Carlo (MCMC)
  – draws samples by running a Markov chain that is constructed so that its limiting distribution is the joint distribution of interest

Markov chains

• A Markov chain is a sequence of random variables X_0, X_1, X_2, . . .
• At each time t ≥ 0 the next state X_{t+1} is sampled from a distribution P(X_{t+1} | X_t) that depends only on the state at time t
  – P(X_{t+1} | X_t) is called the “transition kernel”
• Under certain regularity conditions, the iterates from a Markov chain will gradually converge to draws from a unique stationary or invariant distribution
  – i.e., the chain will “forget” its initial state
  – as t increases, sampled points X_t will look increasingly like (correlated) samples from the stationary distribution
• Suppose:
  – the chain is run for N (a large number of) iterations
  – we throw away the output from the first m iterations
  – the regularity conditions are met
• then by the ergodic theorem we can use averages of the remaining samples to estimate means:

    E[f(X)] ≈ (1 / (N − m)) Σ_{t=m+1}^{N} f(X_t)

Gibbs sampling: one way to construct the transition kernel

• seminal references
  – Geman and Geman (IEEE Trans. Pattn. Anal. Mach. Intel., 1984)
  – Gelfand and Smith (JASA, 1990)
  – Hastings (Biometrika, 1970)
  – Metropolis, Rosenbluth, et al. (J. Chem. Phys., 1953)
• subject to regularity conditions, the joint distribution is uniquely determined by the “full conditional distributions”
  – the full conditional distribution for a model quantity is the distribution of that quantity conditional on assumed known values of all the other quantities in the model
• breaks a complicated, high-dimensional problem into a large number of simpler, low-dimensional problems

Example: Inference about normal mean and variance, both unknown

• model
    y_i | µ, σ² ∼ N(µ, σ²),   i = 1, . . . , N
• priors
    µ ∼ N(µ_0, σ_0²)
    σ² ∼ IG(a_1, b_1)
• We want posterior means, posterior medians, and posterior credible sets for µ and σ².

Full Conditional Distributions for Normal Model

• to extract the mathematical form of the full conditional for a parameter:
  – write out the expression to which the joint posterior is proportional
  – pull out all terms containing the parameter of interest

Gibbs Sampler algorithm for Normal

1. choose initial values µ(0), σ²(0)
2. at each iteration t, generate a new value for each parameter, conditional on the most recent values of all the other parameters

What are BUGS and WinBUGS?

• “Bayesian inference Using Gibbs Sampling”
• general-purpose program for fitting Bayesian models
• developed by David Spiegelhalter, Andrew Thomas, Nicky Best, and Wally Gilks at the Medical Research Council Biostatistics Unit, Institute of Public Health, in Cambridge, UK
• BUGS
  – for Unix and DOS platforms
  – written in Modula-2; distributed in compiled form only
• WinBUGS
  – for Windows
  – written in Component Pascal running in Oberon Microsystems’ BlackBox environment
  – able to fit a wider variety of models than BUGS can handle
  – undergoing continuing development
• excellent documentation, including two volumes of examples
• Web page: http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml
• OpenBUGS
  – open-source version of WinBUGS
  – interfaces easily with R
  – Web page: http://mathstat.helsinki.fi/openbugs/

What do BUGS and WinBUGS do?

• enable the user to specify a model in a simple Splus-like language
• construct the transition kernel for a Markov chain with the joint posterior as its stationary distribution, and simulate a sample path of the resulting chain
  – determine whether or not the full conditional for each unknown quantity (parameter or missing data) in the model is a standard density
  – generate random variates from standard densities using standard algorithms
  – BUGS uses the adaptive rejection algorithm (Gilks and Wild, Applied Statistics, 1992) to generate from nonstandard full conditionals; consequently it can handle only log-concave or discrete full conditionals
  – WinBUGS uses the Metropolis algorithm to generate from nonstandard full conditionals and is not subject to this limitation

The Art and Science of MCMC Use

• Deciding how many chains to run
• Choosing initial values
  – Do not confuse initial values with priors!
  – Priors are part of the model.
Initial values are part of the computing strategy used to fit the model.
  – Priors must not be based on the current data.
  – The best choices of initial values are values that lie in a high-posterior-density region of the parameter space. If the prior is not very strong, then maximum likelihood estimates (from the current data) are excellent choices of initial values if they can be calculated.
  – In the simple models we have encountered so far, the MCMC sampler will converge quickly even with a poor choice of initial values.
  – In more complicated models, choosing initial values in low-posterior-density regions may make the sampler take a huge number of iterations to finally start drawing from a good approximation to the true posterior.
• Assessing whether the sampler has “converged”
  – How many initial iterations need to be discarded in order that the remaining draws can be treated as samples from the true posterior?
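Worked example (gamma-Poisson conjugacy). The notes state that the gamma family is the conjugate prior for the Poisson rate. A minimal sketch of that update: if y_1, . . . , y_n are iid Poisson(λ) and λ ∼ Gamma(a, b) (shape a, rate b), the posterior is Gamma(a + Σ y_i, b + n). The hyperparameter values and counts below are invented purely for illustration.

```python
# Conjugate gamma-Poisson update: with counts y_1..y_n ~ Poisson(lambda)
# and prior lambda ~ Gamma(a, b) (shape a, rate b), the posterior is
# Gamma(a + sum(y), b + n).  Prior values and data here are hypothetical.

def gamma_poisson_update(a, b, counts):
    """Return posterior (shape, rate) after observing Poisson counts."""
    return a + sum(counts), b + len(counts)

a0, b0 = 2.0, 1.0            # hypothetical prior: prior mean a0/b0 = 2 events per interval
counts = [1, 0, 2, 3, 1]     # hypothetical observed counts in 5 intervals
a1, b1 = gamma_poisson_update(a0, b0, counts)
print(a1, b1)                # posterior shape 9.0, rate 6.0
print(a1 / b1)               # posterior mean of lambda: 1.5
```

Because the posterior is again a gamma distribution, no simulation is needed for this model; that is what makes the gamma prior "conjugate."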
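Worked example (Monte Carlo integration). The Monte Carlo integration idea above (draw independent samples from the required distribution, then use sample averages to approximate expectations) can be sketched in a few lines. Here we approximate E[X²] = 1 for X ∼ N(0, 1); the sample size and seed are arbitrary choices for the illustration.

```python
import random

# Monte Carlo integration: approximate E[f(X)] by the sample average of
# f over independent draws from the distribution of X.
# Here f(x) = x^2 and X ~ N(0, 1), so the true value is Var(X) = 1.
random.seed(1)
n = 100_000
draws = [random.gauss(0.0, 1.0) for _ in range(n)]
estimate = sum(x * x for x in draws) / n   # sample average approximating E[X^2]
print(estimate)                            # close to 1
```

MCMC uses the same averaging idea, except the draws come from a Markov chain and are correlated rather than independent.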
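Worked example (Gibbs sampler for the normal model). The two-step Gibbs algorithm for the normal model with unknown mean and variance can be sketched in plain Python (BUGS/WinBUGS would construct this sampler automatically from the model specification). The full conditionals used are the standard conjugate results: µ | σ², y is normal with precision-weighted mean, and σ² | µ, y is IG(a₁ + n/2, b₁ + Σ(yᵢ − µ)²/2). The data, hyperparameters, run length, and burn-in below are invented for illustration.

```python
import math
import random

# Gibbs sampler for y_i ~ N(mu, sigma^2) with priors
# mu ~ N(mu0, sigma0^2) and sigma^2 ~ IG(a1, b1).
# Data and hyperparameter values are hypothetical.

random.seed(42)
y = [random.gauss(5.0, 2.0) for _ in range(50)]   # simulated data: true mu = 5, sigma^2 = 4
n = len(y)
mu0, sigma0_sq = 0.0, 100.0                       # vague normal prior on mu
a1, b1 = 0.01, 0.01                               # vague inverse-gamma prior on sigma^2

mu, sigma_sq = 0.0, 1.0                           # step 1: initial values mu(0), sigma^2(0)
N, m = 5000, 1000                                 # iterations and burn-in
mu_draws, sig_draws = [], []

for t in range(N):                                # step 2: update each parameter in turn
    # full conditional for mu given sigma^2: normal, precision-weighted
    prec = 1.0 / sigma0_sq + n / sigma_sq
    mean = (mu0 / sigma0_sq + sum(y) / sigma_sq) / prec
    mu = random.gauss(mean, math.sqrt(1.0 / prec))
    # full conditional for sigma^2 given mu: IG(a1 + n/2, b1 + SS/2).
    # Draw Gamma(shape, rate) and invert: if X ~ Gamma(a, rate b), then 1/X ~ IG(a, b).
    ss = sum((yi - mu) ** 2 for yi in y)
    sigma_sq = 1.0 / random.gammavariate(a1 + n / 2, 1.0 / (b1 + ss / 2))
    if t >= m:                                    # discard the first m iterations (burn-in)
        mu_draws.append(mu)
        sig_draws.append(sigma_sq)

# ergodic averages over the retained draws estimate the posterior means
print(sum(mu_draws) / len(mu_draws))              # near the true mean 5
print(sum(sig_draws) / len(sig_draws))            # near the true variance 4
```

Each update draws from a simple one-dimensional distribution, which is exactly the point of Gibbs sampling: the complicated joint posterior is never sampled directly.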

