Lecture 15: Logistic Regression: Inference and link functions
BMTRY 701 Biostatistical Methods II

Outline
•More on our example
•Other covariates: simple logistic models
•What is a good multiple regression model?
•Multiple regression model proposal
•Categorical pairs
•Categorical vs. continuous
•Lots of "correlation" between covariates
•Full model results
•What next?
•Likelihood Ratio Test
•Recall the likelihood function
•Estimating the log-likelihood
•LRT in R
•Testing DPROS
•More in R
•Notes on LRT
•For next time, read the following article


More on our example

> pros5.reg <- glm(cap.inv ~ log(psa) + gleason, family=binomial)
> summary(pros5.reg)

Call:
glm(formula = cap.inv ~ log(psa) + gleason, family = binomial)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -8.1061     0.9916  -8.174 2.97e-16 ***
log(psa)      0.4812     0.1448   3.323 0.000892 ***
gleason       1.0229     0.1595   6.412 1.43e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 512.29 on 379 degrees of freedom
Residual deviance: 403.90 on 377 degrees of freedom
AIC: 409.9


Other covariates: Simple logistic models

Covariate                  Beta      exp(Beta)   Z
Age                        -0.0082   0.99        -0.51
Race                       -0.054    0.95        -0.15
Vol                        -0.014    0.99        -2.26
Dig Exam (vs. no nodule)
  Unilobar left             0.88     2.41         2.81
  Unilobar right            1.56     4.76         4.78
  Bilobar                   2.10     8.17         5.44
Detection in RE             1.71     5.53         4.48
LogPSA                      0.87     2.39         6.62
Gleason                     1.24     3.46         8.12


What is a good multiple regression model?

The principles of model building are analogous to linear regression, and we use the same approach:
•Look for significant covariates in simple models
•Consider multicollinearity
•Look for confounding (i.e., a change in the betas when a covariate is removed)


Multiple regression model proposal

Candidate covariates: Gleason, logPSA, Volume, Digital Exam result, detection in RE.
But what about collinearity? With 5 covariates there are 5-choose-2 = 10 pairs to check. For the three continuous covariates:

         gleason log.psa.   vol
gleason     1.00     0.46 -0.06
log.psa.    0.46     1.00  0.05
vol        -0.06     0.05  1.00

[Scatterplot matrix of gleason, log(psa), and vol]


Categorical pairs

> dpros.dcaps <- epitab(dpros, dcaps)
> dpros.dcaps$tab
         Outcome
Predictor   1        p0  2         p1 oddsratio     lower     upper
        1  95 0.2802360  4 0.09756098  1.000000        NA        NA
        2 123 0.3628319  9 0.21951220  1.737805 0.5193327  5.815089
        3  84 0.2477876 12 0.29268293  3.392857 1.0540422 10.921270
        4  37 0.1091445 16 0.39024390 10.270270 3.2208157 32.748987
         Outcome
Predictor      p.value
        1           NA
        2 4.050642e-01
        3 3.777900e-02
        4 1.271225e-05

> fisher.test(table(dpros, dcaps))

        Fisher's Exact Test for Count Data

data:  table(dpros, dcaps)
p-value = 2.520e-05
alternative hypothesis: two.sided


Categorical vs. continuous

t-tests and ANOVA: means by category

> summary(lm(log(psa) ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2506     0.1877   6.662 9.55e-11 ***
dcaps         0.8647     0.1632   5.300 1.97e-07 ***
---
Residual standard error: 0.9868 on 378 degrees of freedom
Multiple R-squared: 0.06917, Adjusted R-squared: 0.06671
F-statistic: 28.09 on 1 and 378 DF, p-value: 1.974e-07

> summary(lm(log(psa) ~ factor(dpros)))
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     2.1418087  0.0992064  21.589  < 2e-16 ***
factor(dpros)2 -0.1060634  0.1312377  -0.808    0.419
factor(dpros)3  0.0001465  0.1413909   0.001    0.999
factor(dpros)4  0.7431101  0.1680055   4.423 1.28e-05 ***
---
Residual standard error: 0.9871 on 376 degrees of freedom
Multiple R-squared: 0.07348, Adjusted R-squared: 0.06609
F-statistic: 9.94 on 3 and 376 DF, p-value: 2.547e-06


Categorical vs. continuous

> summary(lm(vol ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   22.905      3.477   6.587 1.51e-10 ***
dcaps         -6.362      3.022  -2.106   0.0359 *
---
Residual standard error: 18.27 on 377 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.01162, Adjusted R-squared: 0.009003
F-statistic: 4.434 on 1 and 377 DF, p-value: 0.03589

> summary(lm(vol ~ factor(dpros)))
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      17.417      1.858   9.374   <2e-16 ***
factor(dpros)2   -1.638      2.453  -0.668    0.505
factor(dpros)3   -1.976      2.641  -0.748    0.455
factor(dpros)4   -3.513      3.136  -1.120    0.263
---
Residual standard error: 18.39 on 375 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.003598, Adjusted R-squared: -0.004373
F-statistic: 0.4514 on 3 and 375 DF, p-value: 0.7164


Categorical vs. continuous

> summary(lm(gleason ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.2560     0.1991  26.401  < 2e-16 ***
dcaps         1.0183     0.1730   5.885 8.78e-09 ***
---
Residual standard error: 1.047 on 378 degrees of freedom
Multiple R-squared: 0.08394, Adjusted R-squared: 0.08151
F-statistic: 34.63 on 1 and 378 DF, p-value: 8.776e-09

> summary(lm(gleason ~ factor(dpros)))
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.9798     0.1060  56.402  < 2e-16 ***
factor(dpros)2   0.4217     0.1403   3.007  0.00282 **
factor(dpros)3   0.4890     0.1511   3.236  0.00132 **
factor(dpros)4   0.9636     0.1795   5.367 1.40e-07 ***
---
Residual standard error: 1.055 on 376 degrees of freedom
Multiple R-squared: 0.07411, Adjusted R-squared: 0.06672
F-statistic: 10.03 on 3 and 376 DF, p-value: 2.251e-06


Lots of "correlation" between covariates

With this much association among the covariates, we should expect some insignificance and confounding in the multiple regression model. Still, try the "full model" and see what happens.


Full model results

> mreg <- glm(cap.inv ~ gleason + log(psa) + vol + dcaps + factor(dpros), family=binomial)
>
> summary(mreg)
Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)     -8.617036   1.102909  -7.813 5.58e-15 ***
gleason          0.908424   0.166317   5.462 4.71e-08 ***
log(psa)         0.514200   0.156739   3.281  0.00104 **
vol             -0.014171   0.007712  -1.838  0.06612 .
dcaps            0.464952   0.456868   1.018  0.30882
factor(dpros)2   0.753759   0.355762   2.119  0.03411 *
factor(dpros)3   1.517838   0.372366   4.076 4.58e-05
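The summary() output reports each coefficient on the log-odds scale: exponentiating gives the odds ratio (the exp(Beta) column in the simple-model table), and dividing the estimate by its standard error gives the Wald z statistic in the z value column. A minimal Python sketch of that arithmetic, using the log(psa) row from the summary(pros5.reg) output above (the helper name wald_summary is my own, not part of any package):

```python
import math

def wald_summary(beta, se, z_crit=1.96):
    """Return (odds ratio, Wald z, CI lower, CI upper) for one coefficient."""
    or_ = math.exp(beta)               # exp(beta) = odds ratio per 1-unit increase
    z = beta / se                      # Wald z = estimate / standard error
    lo = math.exp(beta - z_crit * se)  # CI built on the log-odds scale,
    hi = math.exp(beta + z_crit * se)  # then exponentiated
    return or_, z, lo, hi

# log(psa) row of summary(pros5.reg): Estimate 0.4812, Std. Error 0.1448
or_, z, lo, hi = wald_summary(0.4812, 0.1448)
print(round(or_, 3), round(z, 3))  # 1.618 3.323 -- matches the z value column
```

Note the confidence interval is computed on the log-odds scale and exponentiated at the end; exponentiating the endpoints keeps the interval valid because exp() is monotone.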
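The epitab output above can be reproduced by hand: each dpros level is compared to level 1 (the reference row, odds ratio 1.00) via the ratio of its odds of the second outcome column to the reference level's odds. A small Python sketch, with the counts read off the printed table (the dict layout and helper name are mine, not epitab's internals):

```python
# Counts per dpros level, read off the epitab table above:
# (outcome column "1", outcome column "2")
counts = {1: (95, 4), 2: (123, 9), 3: (84, 12), 4: (37, 16)}

def odds_ratio(level, ref=1):
    """Odds of the second outcome at `level`, relative to the reference level."""
    n0, n1 = counts[level]
    r0, r1 = counts[ref]
    return (n1 / n0) / (r1 / r0)

print(round(odds_ratio(2), 6))  # 1.737805  -- matches the epitab oddsratio column
print(round(odds_ratio(4), 6))  # 10.27027  -- matches the epitab oddsratio column
```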
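The likelihood ratio test listed in the outline can already be illustrated with the deviances printed by summary(pros5.reg): the LRT statistic is the drop in deviance, compared to a chi-square with df equal to the number of added parameters. A hedged Python sketch of that comparison for the two-covariate model versus the intercept-only model (the closed-form tail probability exp(-x/2) is exact only for 2 degrees of freedom; for other df you would use a chi-square function such as scipy.stats.chi2.sf):

```python
import math

# Deviances from summary(pros5.reg) above:
null_dev, null_df = 512.29, 379    # intercept-only model
resid_dev, resid_df = 403.90, 377  # model with log(psa) and gleason

# LRT statistic = null deviance - residual deviance
#               = -2 * [logL(reduced) - logL(full)]
lrt_stat = null_dev - resid_dev    # 108.39
df = null_df - resid_df            # 2 (two added covariates)

# For df = 2 the chi-square survival function is exactly exp(-x/2).
p_value = math.exp(-lrt_stat / 2)
print(lrt_stat, df, p_value)       # tiny p-value: reject the intercept-only model
```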