22S:138
Model Comparison
Lecture 18
Nov. 12, 2008
Kate Cowles
374 SH, [email protected]

Model comparison

• often there are several plausible candidate models
  – different candidate predictor variables in regression
  – different link functions in generalized linear models
  – different assumptions regarding form of likelihood
  – different priors
• statisticians often will compare the fit of several models in order to choose the “best” one
  – then assess whether that one is adequate
• alternative: Bayesian model-mixing
  – does prediction using a weighted combination of all candidate models

Model comparison for nested vs. non-nested models

• nested models: two regression-type models in which the predictors in the smaller model are a subset of the predictors in a larger model
  – larger model will fit better but will be more difficult to fit and to interpret
  – key questions in model comparison
    ∗ is improvement in fit substantial enough to justify increased difficulty in fitting and interpreting
    ∗ are priors on additional parameters reasonable
• non-nested models
  – different link functions in GLMs
  – non-nested sets of predictors

Tools for Bayesian model comparison

• Bayes factors and approximations to them
• Deviance Information Criterion

Frequentist use of deviance as measure of model fit in linear and generalized linear models

Example: Dataset is counts of how many beetles were killed, $r_i$, $i = 1, \ldots, 8$, in 8 groups of beetles exposed to different doses of an insecticide. Each group $i$ had $n_i$ beetles in it.

• consider a “saturated model” for a particular dataset
  – has a parameter for every observation in the dataset so its fit is “perfect”
  – not useful, since it is no simpler than the entire original data set
  – but it provides a benchmark to which to compare the fit of other models
  – saturated model for beetles data would have 8 parameters: $p_i$, $i = 1, \ldots, 8$, the population proportion killed at each of the 8 dose levels
  – the frequentist point estimate of each $p_i$ would be $r_i / n_i$
• now consider a more useful model that lets us quantify the dose-response
    $\mathrm{logit}(p_i) = \alpha + \beta (x_i - \bar{x})$
  – has only 2 parameters
  – will not fit the data as perfectly as the saturated model
• notation: let $\log L(\hat{\theta}; y)$ denote the maximum of the log likelihood for a particular model
• deviance in GLM is defined as
    $-2 \left[ \log L(\hat{\theta}_{\text{model of interest}}; y) - \log L(\hat{\theta}_{\text{saturated}}; y) \right]$
  – this is the likelihood-ratio statistic for testing the null hypothesis that the model holds against the general alternative
  – under certain conditions, deviance has an asymptotic chi-square distribution with degrees of freedom equal to the difference between the number of parameters in the saturated model and the number of parameters in the model being evaluated

Frequentist deviance for models for beetles data

    fit.beetles(beetles)
         [,1] [,2]
    [1,]    6   53
    [2,]   13   47
    [3,]   18   44
    [4,]   28   28
    [5,]   52   11
    [6,]   52    7
    [7,]   61    1
    [8,]   60    0

    Call:
    glm(formula = respmat ~ beetles$V1, family = binomial(link = logit))

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.5213  -0.6270   0.8705   1.2575   1.6487

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  -59.869      5.100  -11.74   <2e-16 ***
    beetles$V1    33.784      2.866   11.79   <2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 280.866 on 7 degrees of freedom
    Residual deviance: 11.474 on 6 degrees of freedom
    AIC: 41.803

    Call:
    glm(formula = respmat ~ beetles$V1, family = binomial(link = probit))

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.4994  -0.6939   0.7942   1.1473   1.3076

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  -34.501      2.616  -13.19   <2e-16 ***
    beetles$V1    19.478      1.469   13.26   <2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 280.866 on 7 degrees of freedom
    Residual deviance: 10.368 on 6 degrees of freedom
    AIC: 40.698

    Call:
    glm(formula = respmat ~ beetles$V1, family = binomial(link = cloglog))

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -0.7906  -0.6252   0.0838   0.4158   1.4120

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  -39.035      3.182  -12.27   <2e-16 ***
    beetles$V1    21.733      1.766   12.31   <2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 280.8664 on 7 degrees of freedom
    Residual deviance: 4.0124 on 6 degrees of freedom
    AIC: 34.342

Complementary log-log link:
    $\mathrm{cloglog}(p) = \log(-\log(1 - p))$

Deviance Information Criterion

• Spiegelhalter D J, Best N G, Carlin B P and van der Linde A (2002) Bayesian measures of model complexity and fit (with discussion). J. Roy. Statist. Soc. B. 64, 583-640.
• to compare fit and predictive ability of Bayesian models
• penalty for model complexity
• also provides estimate of number of free parameters in the model
  – highly correlated parameters and parameters that are strongly influenced by their priors count for less than 1 each
  – called the effective number of parameters
• built into WinBUGS
• can be used to compare non-nested models
• but response variable must have same form in all models
  – e.g. you couldn’t use it to compare two regression models, one with y’s untransformed and one with y’s log transformed
• uses a version of the deviance from which the log likelihood of the saturated model is not subtracted off
• let $D(y, \theta) = -2 \log p(y \mid \theta)$
• we want two quantities, which can be approximated using MCMC sampler output
  – $\hat{D}_{\text{avg}}(y)$: $D$ averaged over the posterior distribution of $\theta$
  – $D_{\hat{\theta}}(y)$: $D$ evaluated at the posterior mean of $\theta$
• then the effective number of parameters is estimated as
    $p_D = \hat{D}_{\text{avg}}(y) - D_{\hat{\theta}}(y)$
• and the DIC is
    $\mathrm{DIC} = \hat{D}_{\text{avg}}(y) + p_D = 2 \hat{D}_{\text{avg}}(y) - D_{\hat{\theta}}(y)$
• DIC is an approximation to the expected predictive deviance and has been suggested as an indicator of model fit when the goal is to pick a model with the best out-of-sample predictive ability
• smaller values of DIC suggest better
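The DIC recipe (average the deviance over posterior draws, evaluate it at the posterior mean, take the difference as the effective number of parameters) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WinBUGS implementation: the data are simulated, the model is a normal mean with known standard deviation and a flat prior, and the posterior draws are sampled directly from the known posterior as a stand-in for MCMC output. The names `deviance` and `dic_from_draws` are hypothetical.

```python
import math
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def deviance(y, mu, sigma):
    """D(y, theta) = -2 log p(y | theta) for i.i.d. Normal(mu, sigma^2) data."""
    n = len(y)
    log_lik = (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
               - sum((yi - mu) ** 2 for yi in y) / (2.0 * sigma ** 2))
    return -2.0 * log_lik


def dic_from_draws(y, mu_draws, sigma):
    """Return (D_avg, D at posterior mean, pD, DIC) from posterior draws of mu."""
    d_avg = mean([deviance(y, mu, sigma) for mu in mu_draws])  # posterior mean of D
    d_hat = deviance(y, mean(mu_draws), sigma)                 # D at posterior mean
    p_d = d_avg - d_hat                                        # effective no. of parameters
    return d_avg, d_hat, p_d, d_avg + p_d


# Assumption: simulated toy data, not data from the lecture.
rng = random.Random(0)
sigma = 2.0
y = [rng.gauss(5.0, sigma) for _ in range(30)]

# Assumption: with a flat prior on mu and known sigma, the posterior of mu is
# Normal(ybar, sigma^2 / n), so we draw from it directly instead of running MCMC.
ybar = mean(y)
mu_draws = [rng.gauss(ybar, sigma / math.sqrt(len(y))) for _ in range(5000)]

d_avg, d_hat, p_d, dic = dic_from_draws(y, mu_draws, sigma)
print("D_avg =", d_avg, " D_hat =", d_hat, " pD =", p_d, " DIC =", dic)
```

Because this toy model has one free parameter and an uninformative prior, pD should come out close to 1. With an informative prior it would be expected to fall below 1, in line with the remark that parameters strongly influenced by their priors count for less than 1 each.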
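Returning to the frequentist deviance, the definition (minus twice the difference between the maximized log likelihood of the model of interest and that of the saturated model) can be checked against the counts printed in the R output. The sketch below is a hedged illustration in Python. It uses only the killed and surviving counts shown in the slides, and evaluates the intercept-only model, since the dose values themselves are not shown. The helper names `xlogy`, `binom_loglik` and `deviance` are hypothetical.

```python
import math

# Counts as printed in the slides: column 1 = killed, column 2 = survived.
killed = [6, 13, 18, 28, 52, 52, 61, 60]
survived = [53, 47, 44, 28, 11, 7, 1, 0]
n = [k + s for k, s in zip(killed, survived)]


def xlogy(x, y):
    """x * log(y), with the convention that 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binom_loglik(r, n, p):
    """Binomial log likelihood, omitting the binomial coefficients
    (they are the same for every model and cancel in the deviance)."""
    return sum(xlogy(ri, pi) + xlogy(ni - ri, 1.0 - pi)
               for ri, ni, pi in zip(r, n, p))


def deviance(r, n, p_model):
    """-2 * [log L(model) - log L(saturated)], saturated estimates p_i = r_i / n_i."""
    p_sat = [ri / ni for ri, ni in zip(r, n)]
    return -2.0 * (binom_loglik(r, n, p_model) - binom_loglik(r, n, p_sat))


# Intercept-only ("null") model: one common kill probability for all groups.
p_null = [sum(killed) / sum(n)] * len(n)
null_dev = deviance(killed, n, p_null)

# The saturated model compared with itself has deviance 0 by construction.
sat_dev = deviance(killed, n, [r / ni for r, ni in zip(killed, n)])

print("null deviance:", round(null_dev, 3), "on", len(n) - 1, "degrees of freedom")
print("saturated deviance:", sat_dev)
```

The null deviance computed this way should agree with the "Null deviance: 280.866 on 7 degrees of freedom" line in the R summaries. The residual deviances for the logit, probit and cloglog fits would follow from the same function once fitted probabilities are supplied.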