Outline

1 Significance testing
    An example with two quantitative predictors
    ANOVA f-tests
    Wald t-tests
    Consequences of correlated predictors
2 Model selection
    Sequential significance testing
    Nested models
    Additional Sum-of-Squares principle
    Sequential testing
    the adjusted $R^2$
    Likelihood
    the Akaike criterion


Pesticide example

A study was conducted to assess the toxic effect of a pesticide on a given species of insect.

    dose:     dose rate of the pesticide
    weight:   body weight of an insect
    toxicity: rate of toxic action

    > tox = read.table("toxic.txt", header=T)
    > tox
        dose weight toxicity
    1  0.696  0.321    0.324
    2  0.729  0.354    0.367
    3  0.509  0.134    0.321
    4  0.559  0.184    0.375
    5  0.679  0.304    0.345
    6  0.583  0.208    0.341
    7  0.742  0.367    0.327
    8  0.781  0.406    0.256
    9  0.865  0.490    0.214
    10 0.723  0.223    0.501
    11 0.940  0.440    0.318
    12 0.903  0.403    0.317
    13 0.910  0.410    0.349
    14 0.684  0.184    0.402
    15 0.904  0.404    0.374
    16 0.887  0.387    0.340
    17 0.593  0.093    0.598
    18 0.640  0.140    0.444
    19 0.512  0.012    0.543


Candidate models

Consider 4 possible linear models for this data:

    $y_i = \beta_0 + e_i$
    $y_i = \beta_0 + \beta_1 \, dose_i + e_i$
    $y_i = \beta_0 + \beta_2 \, weight_i + e_i$
    $y_i = \beta_0 + \beta_1 \, dose_i + \beta_2 \, weight_i + e_i$

Fit these models in R:

    fit.0  = lm(toxicity ~ 1,             data=tox)
    fit.d  = lm(toxicity ~ dose,          data=tox)
    fit.w  = lm(toxicity ~ weight,        data=tox)
    fit.dw = lm(toxicity ~ dose + weight, data=tox)
    fit.wd = lm(toxicity ~ weight + dose, data=tox)


Comparing models using anova

    > anova(fit.0, fit.d)
    Analysis of Variance Table
    Model 1: toxicity ~ 1
    Model 2: toxicity ~ dose
      Res.Df    RSS Df Sum of Sq    F Pr(>F)
    1     18 0.1576
    2     17 0.1204  1    0.0372 5.26  0.035

    > anova(fit.w, fit.wd)
    Analysis of Variance Table
    Model 1: toxicity ~ weight
    Model 2: toxicity ~ weight + dose
      Res.Df      RSS Df Sum of Sq      F   Pr(>F)
    1     17 0.065499
    2     16 0.034738  1  0.030761 14.168 0.001697

Testing $\beta_1 = 0$ (the dose effect) gives a different result depending on whether weight is included in the model or not.


Comparing models using anova

We did two different tests:

    $H_0: \beta_1 = 0 \mid \beta_0$ tests whether $\beta_1 = 0$ (or not) given that only the intercept $\beta_0$ is in the model.
    $H_0: \beta_1 = 0 \mid \beta_0, \beta_2$ tests whether $\beta_1 = 0$ assuming that an intercept $\beta_0$ and a weight effect $\beta_2$ are in the model.

They make different assumptions and may reach different results.

The anova function, when given two (or more) different models, does an f-test by default.

    Source                               df     SS                                        MS
    $\beta_2 \mid \beta_0$               1      $SS(\beta_2 \mid \beta_0)$                $SS(\beta_2 \mid \beta_0)/1$
    $\beta_1 \mid \beta_0, \beta_2$      1      $SS(\beta_1 \mid \beta_0, \beta_2)$       $SS(\beta_1 \mid \beta_0, \beta_2)/1$
    Error                                n-3    $\sum_{i=1}^n (y_i - \hat{y}_i)^2$        $SSError/(n-3)$
    Total                                n-1    $\sum_{i=1}^n (y_i - \bar{y})^2$

Fact: if $H_0$ is correct, $F = MS(\beta_1 \mid \beta_0, \beta_2) / MSError \sim F_{1,\,n-3}$.


Comparing models using anova

Be very careful with anova on a single model:

    > anova(fit.w, fit.wd)
    > anova(fit.w, fit.dw)   # same output

    > anova(fit.dw)
    Response: toxicity
              Df   Sum Sq  Mean Sq F value    Pr(>F)
    dose       1 0.037239 0.037239  17.152 0.0007669
    weight     1 0.085629 0.085629  39.440 1.097e-05
    Residuals 16 0.034738 0.002171

    > anova(fit.wd)
    Response: toxicity
              Df   Sum Sq  Mean Sq F value    Pr(>F)
    weight     1 0.092107 0.092107  42.424 7.147e-06
    dose       1 0.030761 0.030761  14.168  0.001697
    Residuals 16 0.034738 0.002171

Each predictor is added one by one (Type I SS). The order matters!
Which one is appropriate to test a body weight effect? To test a dose effect?


Comparing models using drop1

    > drop1(fit.dw, test="F")
    Single term deletions
    Model: toxicity ~ dose + weight
           Df Sum of Sq      RSS      AIC F value    Pr(>F)
    <none>              0.034738 -113.783
    dose    1  0.030761 0.065499 -103.733  14.168  0.001697
    weight  1  0.085629 0.120367  -92.171  39.440 1.097e-05

    > drop1(fit.wd, test="F")
    Single term deletions
    Model: toxicity ~ weight + dose
           Df Sum of Sq      RSS      AIC F value    Pr(>F)
    <none>              0.034738 -113.783
    weight  1  0.085629 0.120367  -92.171  39.440 1.097e-05
    dose    1  0.030761 0.065499 -103.733  14.168  0.001697

These are F-tests of each predictor after accounting for all others (Type III SS). The order does not matter.


Comparing models using anova

Use anova to compare multiple models.
Models are nested when one model is a particular case of the other model.
anova can perform f-tests to compare 2 or more nested models.

    > anova(fit.0, fit.d, fit.dw)
    Model 1: toxicity ~ 1
    Model 2: toxicity ~ dose
    Model 3: toxicity ~ dose + weight
      Res.Df      RSS Df Sum of Sq      F    Pr(>F)
    1     18 0.157606
    2     17 0.120367  1  0.037239 17.152 0.0007669
    3     16 0.034738  1  0.085629 39.440 1.097e-05

    > anova(fit.0, fit.w, fit.wd)
    Model 1: toxicity ~ 1
    Model 2: toxicity ~ weight
    Model 3: toxicity ~ weight + dose
      Res.Df      RSS Df Sum of Sq      F    Pr(>F)
    1     18 0.157606
    2     17 0.065499  1  0.092107 42.424 7.147e-06
    3     16 0.034738  1  0.030761 14.168  0.001697


Parameter inference using summary

The summary function performs Wald t-tests.

    > summary(fit.d)
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   0.6049     0.1036   5.836 1.98e-05
    dose         -0.3206     0.1398  -2.293   0.0348
    Residual standard error: 0.08415 on 17 degrees of freedom
    Multiple R-squared: 0.2363, Adjusted R-squared: 0.1914
    F-statistic: 5.259 on 1 and 17 DF, p-value: 0.03485

    > summary(fit.wd)
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.22281    0.08364   2.664  0.01698
    weight      -1.13321    0.18044  -6.280 1.10e-05
    dose         0.65139    0.17305   3.764  0.00170
    Residual standard error: 0.0466 on 16 degrees of freedom
    Multiple R-squared: 0.7796, Adjusted R-squared: 0.752
    F-statistic: 28.3 on 2 and 16 DF, p-value: 5.57e-06


Parameter inference using summary

The order does not matter for t-tests:

    > summary(fit.wd)
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.22281    0.08364   2.664  0.01698
    weight      -1.13321    0.18044  -6.280 1.10e-05
    dose         0.65139    0.17305   3.764  0.00170

    > summary(fit.dw)
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.22281    0.08364   2.664  0.01698
    dose         0.65139    0.17305   3.764  0.00170
    weight      -1.13321    0.18044  -6.280 1.10e-05
    Residual standard error: 0.0466 on 16 degrees of …
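
The F-test for dose given weight can be reproduced by hand from the residual sums of squares reported by anova(fit.w, fit.wd). The sketch below is illustrative only: it hard-codes the RSS values, degrees of freedom, and the dose t value printed in the slides rather than refitting the models, and the variable names (rss_reduced, rss_full, and so on) are made up for this example.

```r
# Values copied from the anova(fit.w, fit.wd) output in the slides.
rss_reduced <- 0.065499   # RSS of toxicity ~ weight
rss_full    <- 0.034738   # RSS of toxicity ~ weight + dose
df_reduced  <- 17         # residual df of the reduced model
df_full     <- 16         # residual df of the full model

# Additional sum of squares explained by dose, given weight.
ss_extra <- rss_reduced - rss_full
df_extra <- df_reduced - df_full

# F statistic: extra mean square over the full model's error mean square.
f_stat <- (ss_extra / df_extra) / (rss_full / df_full)

# Upper-tail p-value from the F distribution with (df_extra, df_full) df.
p_value <- pf(f_stat, df_extra, df_full, lower.tail = FALSE)

# Wald t value for dose, copied from summary(fit.wd); with a single
# numerator degree of freedom, its square should match the F statistic.
t_dose <- 3.764

print(c(F = f_stat, t_squared = t_dose^2, p = p_value))
```

The printed F and p-value should agree, up to rounding, with the 14.168 and 0.001697 shown by anova and drop1, which is why the Type III F-test from drop1 and the Wald t-test from summary lead to the same conclusion for a single coefficient.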