Lecture 13
Diagnostics in MLR
• Variance Inflation Factors
• Added variable plots
• Identifying outliers
BMTRY 701
Biostatistical Methods II

Outline
• Variance Inflation Factor (VIF)
• How to calculate VIF?
• Sounds like a lot of work!
• XTX
• VIFs
• Getting the VIFs the old-fashioned way
• More practical way.
• What to do?
• More diagnostics: the added variable plots
• Example: SENIC
• R code
• Added Variable Plot
• What does that show?
• Residual Plots
• Which is better?
• Identifying Outliers
• “Hat” matrix
• Matrix Format for the MLR model
• “Transpose” and “Inverse”
• Estimating, based on fitted model
• Other uses of H
• Property of hij’s
• Other use of H
• Using the Hat Matrix to identify outliers
• Hat values versus index
• Identifying points with high hii
• Does a high hat mean it has a large residual?
• Let’s look at our MLR9
• Using the hat matrix in MLR
• Defining Studentized Residuals
• Deleted Residuals
• Deleted Residuals, di
• Studentized Deleted Residuals
• Another nice result
• Testing for outliers

Variance Inflation Factor (VIF)
Diagnostic for multicollinearity.
Describes how much of an X is explained by the other X’s in the model.
If the VIF is high, it suggests that the covariate should not be added. Why?
• it is redundant
• it adds variance to the model
• it creates ‘instability’ in the estimation

How to calculate VIF?
Simple idea:

$$VIF_j = \frac{1}{1 - R_j^2}$$

That is, the VIF for the jth covariate is computed from the coefficient of determination ($R_j^2$) obtained from regressing $x_j$ on the remaining x’s in the model:

$$x_j = \beta_0 + \beta_1 x_1 + \cdots + \beta_{j-1} x_{j-1} + \beta_{j+1} x_{j+1} + \cdots + \beta_J x_J + e$$

Sounds like a lot of work!
You don’t actually have to estimate the regressions for each x_j.
Some matrix notation:
• X = matrix of covariates, including a column for the intercept
• X^T = transpose of X. That is, flip X on its diagonal
• X^-1 = the inverse of X. That is, what you multiply X by to get the identity matrix
• I = the identity matrix. A matrix with 0’s on the off-diagonal and 1’s on the diagonal
Useful matrix: X^T X (see chapter 3 for lots on it!)
Another useful matrix: (X^T X)^-1

XTX
Recall what it means to standardize a variable:
• subtract off the mean
• divide by the standard deviation
Imagine that you standardize all of the variables in your model (x’s).
Call the new covariate matrix W.
Now, if you calculate W^T W (and divide by n-1), it is the correlation matrix.
Lastly, take the inverse of that correlation matrix (i.e., (W^T W / (n-1))^-1).

VIFs
The diagonals of the inverse correlation matrix, (W^T W / (n-1))^-1, are the VIFs.
This is a natural by-product of the regression.
The (W^T W)^-1 matrix is estimated when the regression is estimated.
Rules of thumb:
• VIF larger than 10 implies a serious multicollinearity problem
• VIFs of 5 or greater suggest that coefficient estimates may be misleading due to multicollinearity

Getting the VIFs the old-fashioned way

    # standardize variables (subtract the mean, divide by the standard deviation)
    ages <- (AGE-mean(AGE))/sqrt(var(AGE))
    censuss <- (CENSUS - mean(CENSUS))/sqrt(var(CENSUS))
    xrays <- (XRAY - mean(XRAY))/sqrt(var(XRAY))
    infrisks <- (INFRISK-mean(INFRISK))/sqrt(var(INFRISK))
    sqrtcults <- (sqrtCULT-mean(sqrtCULT))/sqrt(var(sqrtCULT))
    nurses <- (NURSE - mean(NURSE))/sqrt(var(NURSE))

    # create matrix of covariates
    xmat <- data.frame(ages, censuss, xrays, infrisks, sqrtcults, nurses)
    xmat <- as.matrix(xmat)
    n <- nrow(xmat)

    # estimate x-transpose x and divide by n-1 (this is the correlation matrix)
    cormat <- t(xmat)%*%xmat/(n-1)

    # solve finds the inverse of a matrix; its diagonal holds the VIFs
    vifmat <- solve(cormat)
    round(diag(vifmat), 2)

More practical way.

    library(HH)
    mlr <- lm(logLOS ~ AGE + CENSUS + XRAY + INFRISK + sqrtCULT + NURSE)

    round(diag(vifmat), 2)
         ages   censuss     xrays  infrisks sqrtcults    nurses
         1.10      5.88      1.39      2.01      1.92      5.94

    vif(mlr)
         AGE   CENSUS     XRAY  INFRISK sqrtCULT    NURSE
    1.096204 5.875625 1.390417 2.007692 1.916983 5.935711

What to do?
It is unlikely that only one variable will have a high VIF.
You then need to determine which to include and which to remove.
Judgement should be based on science + statistics!

More diagnostics: the added variable plots
• These can help check the adequacy of the model
• Is there curvature between Y and X after adjusting for the other X’s?
• “Refined” residual plots
• They show the marginal importance of an individual predictor
• They help figure out a good form for the predictor

Example: SENIC
Recall the difficulty determining the form for INFRISK in our regression model.
Last time, we settled on including one term, INFRISK^2.
But we could use an added variable plot approach. How?
We want to know, adjusting for all else in the model, what is the right form for INFRISK?

R code

    # regress the outcome and the candidate predictor on the other covariates
    av1 <- lm(logLOS ~ AGE + XRAY + CENSUS + factor(REGION) )
    av2 <- lm(INFRISK ~ AGE + XRAY + CENSUS + factor(REGION) )
    resy <- av1$residuals
    resx <- av2$residuals
    # plot one set of residuals against the other and add the fitted line
    plot(resx, resy, pch=16)
    abline(lm(resy~resx), lwd=2)

Added Variable Plot
[Figure: added variable plot of resy versus resx, with the fitted line]

What does that show?
The relationship between logLOS and INFRISK if you added INFRISK to the regression.
But is that what we want to see?
How about looking at residuals versus INFRISK (before including INFRISK in the model)?

R code

    mlr8 <- lm(logLOS ~ AGE + XRAY + CENSUS + factor(REGION))
    smoother <- lowess(INFRISK, mlr8$residuals)
    plot(INFRISK, mlr8$residuals)
    lines(smoother)

[Figure: mlr8$residuals versus INFRISK, with the lowess smoother]

R code

    > infrisk.star <- ifelse(INFRISK>4,INFRISK-4,0)
    > mlr9 <- lm(logLOS ~ INFRISK + infrisk.star + AGE + XRAY +
    +            CENSUS + factor(REGION))
    > summary(mlr9)

    Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
    (Intercept)      1.798e+00  1.667e-01  10.790  < 2e-16 ***
    INFRISK          1.836e-03  1.984e-02   0.093 0.926478
    infrisk.star     6.795e-02  2.810e-02   2.418 0.017360 *
    AGE              5.554e-03  2.535e-03   2.191 0.030708 *
    XRAY             1.361e-03  6.562e-04   2.073 0.040604 *
    CENSUS           3.718e-04  7.913e-05   4.698 8.07e-06 ***
    factor(REGION)2 -7.182e-02  3.051e-02  -2.354 0.020452 *
    factor(REGION)3 -1.030e-01  3.036e-02  -3.391 0.000984 ***
    factor(REGION)4 -2.068e-01  3.784e-02  -5.465 3.19e-07 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.1137 on 104 degrees of freedom
    Multiple R-Squared: 0.6209,     Adjusted R-squared: 0.5917
    F-statistic: 21.29 on 8 and 104 DF,  p-value: < 2.2e-16

Residual Plots
[Figure: residual plots versus INFRISK, including mlr9$residuals versus INFRISK]
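Returning to the VIF calculation: the two routes described in the slides (regressing each x_j on the other x’s, and taking the diagonal of the inverse correlation matrix) should give the same numbers. The sketch below checks this on simulated data; the variables x1, x2, x3 and the seed are hypothetical and are not part of the SENIC data used in the lecture.

```r
# Simulated covariates (hypothetical; not the SENIC data from the lecture)
set.seed(701)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.5)   # deliberately collinear with x1
x3 <- rnorm(n)                        # unrelated to x1 and x2
X  <- cbind(x1 = x1, x2 = x2, x3 = x3)

# Route 1: regress each x_j on the remaining x's, then VIF_j = 1 / (1 - R_j^2)
vif_by_regression <- sapply(seq_len(ncol(X)), function(j) {
  r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
  1 / (1 - r2)
})
names(vif_by_regression) <- colnames(X)

# Route 2: diagonal of the inverse of the correlation matrix
vif_by_inverse <- diag(solve(cor(X)))

# The two rows should agree up to rounding error
print(round(rbind(vif_by_regression, vif_by_inverse), 4))
```

Here cor(X) stands in for standardizing by hand and computing W^T W / (n-1); both give the correlation matrix.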
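On the added variable plot: one useful property, stated here as a standard regression result rather than something taken from the slides, is that the slope of the line fitted through the added variable plot equals the coefficient the predictor would get in the full model. The sketch below illustrates this on simulated data; y, x, z1 and z2 are hypothetical stand-ins for the outcome, the candidate predictor, and the other covariates.

```r
# Simulated data (hypothetical; not the SENIC data from the lecture)
set.seed(13)
n  <- 150
z1 <- rnorm(n)
z2 <- rnorm(n)
x  <- 0.5 * z1 + rnorm(n)                       # candidate predictor, related to z1
y  <- 1 + 2 * x - z1 + 0.5 * z2 + rnorm(n)      # outcome

# Same two-step recipe as av1 / av2 in the lecture code
av_y <- lm(y ~ z1 + z2)     # outcome adjusted for the other covariates
av_x <- lm(x ~ z1 + z2)     # candidate predictor adjusted for the other covariates
resy <- resid(av_y)
resx <- resid(av_x)

# Slope of the added variable plot versus the coefficient in the full model
slope_av   <- coef(lm(resy ~ resx))["resx"]
slope_full <- coef(lm(y ~ x + z1 + z2))["x"]
print(c(added_variable = unname(slope_av), full_model = unname(slope_full)))
```

This is why curvature in the plot of resy against resx suggests that a single linear term for the predictor may not be the right form.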


MUSC BMTRY 701 - lect13
