Lecture 8: ANOVA tables, F-tests
BMTRY 701 Biostatistical Methods II

Outline: ANOVA; Total Deviations; Regression Deviations; Error Deviations; Definitions; Example: logLOS ~ BEDS; Degrees of Freedom; Mean Squares; Standard ANOVA Table; ANOVA for logLOS ~ BEDS; Inference?; Implications; F-test; Implementing the F-test; More interesting: MLR; general F testing approach; Example of ‘nested’ models; Testing: Models must be nested!; R; Testing more than two covariates; Testing multiple coefficients simultaneously

ANOVA
Analysis of Variance
Similar in derivation to the ANOVA that generalizes the two-sample t-test.
Partitioning of variance into several parts:
• that due to the ‘model’: SSR
• that due to ‘error’: SSE
The sum of the two parts is the total sum of squares: SST

Total Deviations:
[Figure: scatterplot of data$logLOS (2.0 to 3.0) against data$BEDS (0 to 800), showing the total deviations $Y_i - \bar{Y}$]

Regression Deviations:
[Figure: same scatterplot, showing the regression deviations $\hat{Y}_i - \bar{Y}$]

Error Deviations:
[Figure: same scatterplot, showing the error deviations $Y_i - \hat{Y}_i$]

Definitions
$$SST = \sum_i (Y_i - \bar{Y})^2$$
$$SSR = \sum_i (\hat{Y}_i - \bar{Y})^2$$
$$SSE = \sum_i (Y_i - \hat{Y}_i)^2$$
$$SST = SSR + SSE$$
$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$

Example: logLOS ~ BEDS
> ybar <- mean(data$logLOS)
> yhati <- reg$fitted.values
> sst <- sum((data$logLOS - ybar)^2)
> ssr <- sum((yhati - ybar)^2)
> sse <- sum((data$logLOS - yhati)^2)
>
> sst
[1] 3.547454
> ssr
[1] 0.6401715
> sse
[1] 2.907282
> sse+ssr
[1] 3.547454

Degrees of Freedom
Degrees of freedom for SST: n - 1
• one df is lost because it is used to estimate the mean of Y
Degrees of freedom for SSR: 1
• only one df because all estimates are based on the same fitted regression line
Degrees of freedom for SSE: n - 2
• two are lost due to estimating the regression line (slope and intercept)

Mean Squares
“Scaled” version of the Sum of Squares
Mean Square = SS/df
MSR = SSR/1
MSE = SSE/(n-2)
Notes:
• mean squares are not additive!
That is, MSR + MSE ≠ SST/(n-1)
• MSE is the same as we saw previously

Standard ANOVA Table
             SS    df    MS
Regression   SSR   1     MSR
Error        SSE   n-2   MSE
Total        SST   n-1

ANOVA for logLOS ~ BEDS
> anova(reg)
Analysis of Variance Table

Response: logLOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
Residuals 111 2.90728 0.02619

Inference?
What is of interest and how do we interpret it?
We’d like to know if BEDS is related to logLOS.
How do we do that using the ANOVA table?
We need to know the expected values of the MSR and MSE:
$$E(MSE) = \sigma^2$$
$$E(MSR) = \sigma^2 + \beta_1^2 \sum_i (X_i - \bar{X})^2$$

Implications
• The mean of the sampling distribution of MSE is σ² regardless of whether or not β1 = 0
• If β1 = 0, E(MSE) = E(MSR)
• If β1 ≠ 0, E(MSE) < E(MSR)
To test the significance of β1, we can test if MSR and MSE are of the same magnitude.

F-test
Derived naturally from the arguments just made.
Hypotheses:
• H0: β1 = 0
• H1: β1 ≠ 0
Test statistic: F* = MSR/MSE
Based on the earlier argument, we expect F* > 1 if H1 is true.
This implies a one-sided test.

F-test (continued)
The distribution of F* under the null has two sets of degrees of freedom:
• numerator degrees of freedom
• denominator degrees of freedom
These correspond to the df shown in the ANOVA table:
• numerator df = 1
• denominator df = n-2
The test is based on
$$F^* = \frac{MSR}{MSE} \sim F(1, n-2)$$

Implementing the F-test
The decision rule:
• If F* > F(1-α; 1, n-2), then reject Ho
• If F* ≤ F(1-α; 1, n-2), then fail to reject Ho

ANOVA for logLOS ~ BEDS
> anova(reg)
Analysis of Variance Table

Response: logLOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
Residuals 111 2.90728 0.02619
> qf(0.95, 1, 111)
[1] 3.926607
> 1-pf(24.44,1,111)
[1] 2.739016e-06

More interesting: MLR
You can test that several coefficients are zero at the same time.
Otherwise, the F-test gives the same result as a t-test.
That is: for testing the significance of ONE covariate in a linear regression model, an F-test and a t-test give the same result:
• H0: β1 = 0
• H1: β1 ≠ 0

general F testing approach
The previous test seems simple.
It is in this case, but can be generalized to be more
useful.
Imagine a more general test:
• Ho: small model
• Ha: large model
Constraint: the small model must be ‘nested’ in the large model.
That is, the small model must be a ‘subset’ of the large model.

Example of ‘nested’ models
Model 1:
$$LOS_i = \beta_0 + \beta_1 INFRISK_i + \beta_2 MS_i + \beta_3 NURSE_i + \beta_4 NURSE_i^2 + e_i$$
Model 2:
$$LOS_i = \beta_0 + \beta_1 INFRISK_i + \beta_3 NURSE_i + \beta_4 NURSE_i^2 + e_i$$
Model 3:
$$LOS_i = \beta_0 + \beta_1 INFRISK_i + \beta_2 MS_i + e_i$$
• Models 2 and 3 are nested in Model 1
• Model 2 is not nested in Model 3
• Model 3 is not nested in Model 2

Testing: Models must be nested!
To test Model 1 vs. Model 2:
• we are testing that β2 = 0
• Ho: β2 = 0 vs. Ha: β2 ≠ 0
• If β2 = 0, then we conclude that Model 2 is superior to Model 1
• That is, we prefer Model 2 if we fail to reject the null hypothesis
Model 1:
$$LOS_i = \beta_0 + \beta_1 INFRISK_i + \beta_2 MS_i + \beta_3 NURSE_i + \beta_4 NURSE_i^2 + e_i$$
Model 2:
$$LOS_i = \beta_0 + \beta_1 INFRISK_i + \beta_3 NURSE_i + \beta_4 NURSE_i^2 + e_i$$

R
reg1 <- lm(LOS ~ INFRISK + ms + NURSE + nurse2, data=data)
reg2 <- lm(LOS ~ INFRISK + NURSE + nurse2, data=data)
reg3 <- lm(LOS ~ INFRISK + ms, data=data)
> anova(reg1)
Analysis of Variance Table

Response: LOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 45.4043 8.115e-10 ***
ms          1  12.897  12.897  5.0288   0.02697 *
NURSE       1   1.097   1.097  0.4277   0.51449
nurse2      1   1.789   1.789  0.6976   0.40543
Residuals 108 276.981   2.565
---

R (continued)
> anova(reg2)
Analysis of Variance Table

Response: LOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 44.8865 9.507e-10 ***
NURSE       1   8.212   8.212  3.1653     0.078 .
nurse2      1   1.782   1.782  0.6870     0.409
Residuals 109 282.771   2.594
---
> anova(reg1, reg2)
Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
Model 2: LOS ~ INFRISK + NURSE + nurse2
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1    108 276.981
2    109 282.771 -1    -5.789 2.2574 0.1359

R (continued)
> summary(reg1)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.355e+00  5.266e-01  12.068  < 2e-16 ***
INFRISK      6.289e-01  1.339e-01   4.696 7.86e-06 ***
ms           7.829e-01  5.211e-01   1.502    0.136
NURSE        4.136e-03  4.093e-03   1.010    0.315
nurse2      -5.676e-06  6.796e-06
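
The F statistic reported by anova(reg1, reg2) can be reproduced by hand from the residual sums of squares and residual degrees of freedom printed in that output. The sketch below assumes the usual general linear F-test formula for nested models, F* = [(SSE_reduced - SSE_full)/(df_reduced - df_full)] / (SSE_full/df_full), which the slides as extracted here do not state explicitly. The variable names (sse_full, df_red, and so on) are illustrative, and the inputs are the rounded values from the printed tables rather than the original data.

```r
# Values copied from the anova(reg1, reg2) output (rounded as printed).
sse_full <- 276.981   # RSS of Model 1 (includes ms)
df_full  <- 108       # residual df of Model 1
sse_red  <- 282.771   # RSS of Model 2 (omits ms)
df_red   <- 109       # residual df of Model 2

# General linear F statistic for nested models (assumed formula, see lead-in).
f_star <- ((sse_red - sse_full) / (df_red - df_full)) / (sse_full / df_full)

# Upper-tail p-value from the F(df_red - df_full, df_full) distribution.
p_value <- pf(f_star, df_red - df_full, df_full, lower.tail = FALSE)

# Decision rule at alpha = 0.05: reject Ho if F* exceeds the critical value.
f_crit <- qf(0.95, df_red - df_full, df_full)
reject <- f_star > f_crit

# With a single coefficient tested, F* should be close to the squared
# t statistic for ms taken from summary(reg1).
t_ms <- 1.502
print(c(F = f_star, p = p_value, t_squared = t_ms^2))
print(reject)
```

Because the inputs are rounded, F* agrees with the printed 2.2574 only to about two decimal places. The squared t statistic for ms (1.502² ≈ 2.256) leads to the same conclusion, which illustrates the earlier point that an F-test and a t-test agree when a single covariate is tested.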