UI STAT 4520 - Posterior predictive checking

22S:138 Posterior predictive checking
Lecture 20, Nov. 9, 2009
Kate Cowles, 374 SH, [email protected]

Model checking and sensitivity analysis
• goal: assess fit of the model to
  – the data
  – our substantive knowledge
• must check the effects of
  – the prior
  – the likelihood specification
  – the hierarchical structure
  – any other application-specific issues
    ∗ e.g. which predictor variables
• theoretically possible to set up and fit a "super model" including all possibly true models
  – but computationally infeasible
  – and really conceptually impossible
• instead we fit a feasible number of models and examine the posterior distributions that result
  – cast models as broadly as possible
  – do they fail to fit reality?
  – are they sensitive to arbitrary specifications?

Principles and methods of model checking
• "do the model's deficiencies have a noticeable effect on substantive inferences?"
• how to judge when assumptions of convenience can be made safely

Using the posterior distribution to check a statistical model
• compare the posterior distribution of the parameters to
  – substantive knowledge
  – other data
• compare the posterior predictive distribution of future observations to substantive knowledge
  – e.g. compare election predictions from a model to substantive knowledge
• compare the posterior predictive distribution of future observations to the data that have actually occurred

Using the posterior predictive distribution to check a statistical model
• recall:
  – posterior: conditional on the observed data y
  – predictive: prediction of an observable but unobserved ỹ
  – p(ỹ | y) = ∫ p(ỹ, θ | y) dθ = ∫ p(ỹ | θ, y) p(θ | y) dθ = ∫ p(ỹ | θ) p(θ | y) dθ
  – the last equality holds if the new data are conditionally independent of the old data given the model parameters

Checking a model by comparing the data that we have to the posterior predictive distribution
• enables checking the fit of a model without any more substantive knowledge than is in the existing data and model
• do datasets simulated from the model we fit "look like" the real data in ways relevant to our inference?
• requires drawing "replicated data"

Procedure to draw a "replicated dataset" from the posterior predictive distribution
• notation
  – y: observed data
  – y^rep: a complete simulated dataset
    ∗ same number of observations as in y
    ∗ same values of the explanatory variables (if any)
    ∗ response variables simulated from the posterior predictive distribution
  – θ: vector of all unknown model parameters, including parameters of upper-stage priors if the model is hierarchical
• Step 1: draw θ* from p(θ | y), i.e. from the posterior distribution of θ
• Step 2: draw y^rep from p(y^rep | θ*)
• repeat steps 1 and 2 a large number of times
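The two-step simulation above is easy to carry out once posterior draws of θ are available. What follows is a minimal sketch, not from the lecture: it assumes a simple normal model in which each posterior draw is a hypothetical (µ, σ) pair (e.g. MCMC output), and each pass through the loop produces one replicated dataset y^rep.

import numpy as np

rng = np.random.default_rng(0)

def draw_replicates(posterior_draws, n_obs):
    """One replicated dataset per posterior draw.

    posterior_draws: array of shape (S, 2) whose rows are (mu, sigma) pairs,
    assumed to be draws from p(theta | y).  Returns an (S, n_obs) array,
    one y^rep per row, with the same number of observations as y.
    """
    reps = np.empty((len(posterior_draws), n_obs))
    for s, (mu, sigma) in enumerate(posterior_draws):
        # Step 1 is already done: (mu, sigma) is a draw theta* from p(theta | y)
        # Step 2: draw y^rep from p(y^rep | theta*)
        reps[s] = rng.normal(mu, sigma, size=n_obs)
    return reps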
Discrepancy measures or test quantities for posterior predictive checks
• intended to measure the discrepancy between the model and the real data
• T(y, θ): a scalar summary of the data (and possibly the parameters) used as a standard when comparing the real data to data simulated from the posterior predictive distribution
• choose one or more test quantities that are meaningful with respect to your research purpose

Using the test quantities: posterior predictive p-values
• compute T(y, θ) for the real data y
• compute T(y^rep, θ) for each simulated replicate dataset
• compute the proportion of the replicated datasets for which T(y^rep, θ) ≥ T(y, θ)
• this is an approximation to the Bayes p-value
  ∫∫ I( T(y^rep, θ) ≥ T(y, θ) ) p(θ | y) p(y^rep | θ) dθ dy^rep
• that is, the Bayes p-value is Pr( T(y^rep, θ) ≥ T(y, θ) ) with the probability taken over the joint posterior distribution of θ and y^rep

Evaluating outliers in Newcomb's speed of light data
• from the GCSR textbook
• 66 measurements of the speed of light; two low outliers
• what we want to evaluate: is a normal density OK for the likelihood?
• define T(y, θ) = min(y_i)
  – to check whether data with such extreme outliers could reasonably have come from a normal model
• fit the model to the 66 observations:
  y_i ∼ N(µ, σ²), i = 1, . . ., 66
  p(µ, σ²) ∝ 1/σ²
• generated 20 replicate datasets
• found that in every replicate dataset, min(y^rep_i) was much larger than min(y_i) in the real data

Interpreting and using posterior predictive p-values
• not Pr(model is true | data)
• it is the posterior probability that T(y^rep, θ) ≥ T(y, θ)
• ideal is a posterior predictive p-value somewhere around .5
  – would mean that the real data y are typical of data that come from the model
• the model is suspect if the tail-area probability of a meaningful test quantity is close to either 0 or 1
  – would mean that the aspect of the data being measured by the test quantity is inconsistent with the model
  – an extreme ppp-value indicates that the model needs to be changed or expanded
    ∗ in the Newcomb example, use a t or contaminated normal likelihood
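As a concrete companion to the Newcomb check summarized above, here is a minimal sketch (mine, not the lecture's code) of the whole procedure for T(y, θ) = min(y_i). It relies on the standard closed-form posterior for the normal model under p(µ, σ²) ∝ 1/σ²: σ² | y is scaled inverse-χ² with n − 1 degrees of freedom and scale s², and µ | σ², y ∼ N(ȳ, σ²/n). The 66 Newcomb measurements are not reproduced here; they would be passed in as y.

import numpy as np

rng = np.random.default_rng(0)

def ppp_min(y, n_rep=20):
    """Posterior predictive p-value for T(y) = min(y_i) under
    y_i ~ N(mu, sigma^2) with prior p(mu, sigma^2) proportional to 1/sigma^2.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    ybar, s2 = y.mean(), y.var(ddof=1)

    t_obs = y.min()                  # T(y, theta) = min(y_i) for the real data
    t_rep = np.empty(n_rep)
    for r in range(n_rep):
        # Step 1: draw (mu, sigma^2) from the posterior
        sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)   # scaled inverse-chi^2
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))
        # Step 2: draw a replicated dataset of the same size as y
        y_rep = rng.normal(mu, np.sqrt(sigma2), size=n)
        t_rep[r] = y_rep.min()

    # proportion of replicates with T(y^rep, theta) >= T(y, theta)
    return (t_rep >= t_obs).mean()

# usage (hypothetical): ppp_min(newcomb_measurements, n_rep=20)

Applied to the real data, every replicate minimum exceeds the observed minimum (as reported above), so the estimated ppp-value is essentially 1 — exactly the kind of extreme tail-area probability that signals the normal likelihood should be replaced by a t or contaminated normal.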

