Lecture 13: Diagnostics in MLR
• Added variable plots
• Identifying outliers
• Variance Inflation Factor
BMTRY 701, Biostatistical Methods II

Outline: recall the added variable plots; example: SENIC; added variable plot; what does that show?; residual plots; which is better?; identifying outliers; the "hat" matrix; matrix format for the MLR model; "transpose" and "inverse"; estimating, based on the fitted model; other uses of H; property of the h_ij's; using the hat matrix to identify outliers; hat values versus index; identifying points with high h_ii; does a high hat mean it has a large residual?; let's look at our MLR9; using the hat matrix in MLR; defining studentized residuals; deleted residuals; studentized deleted residuals; another nice result; testing for outliers.

Recall the added variable plots
• These can help check the adequacy of the model: is there curvature between Y and X after adjusting for the other X's?
• They are "refined" residual plots.
• They show the marginal importance of an individual predictor.
• They help figure out a good form for the predictor.

Example: SENIC
• Recall the difficulty determining the form for INFRISK in our regression model.
• Last time, we settled on including one term, INFRISK^2.
• But we could take an added variable plot approach instead.
• How? We want to know: adjusting for all else in the model, what is the right form for INFRISK?

R code
av1 <- lm(logLOS ~ AGE + XRAY + CENSUS + factor(REGION))
av2 <- lm(INFRISK ~ AGE + XRAY + CENSUS + factor(REGION))
resy <- av1$residuals
resx <- av2$residuals
plot(resx, resy, pch=16)
abline(lm(resy ~ resx), lwd=2)

Added Variable Plot
[Scatterplot of resy versus resx with the fitted line from abline() overlaid.]

What does that show?
• The relationship between logLOS and INFRISK if you added INFRISK to the regression.
• But is that what we want to see?
• How about looking at residuals versus INFRISK (before including INFRISK in the model)?

R code
mlr8 <- lm(logLOS ~ AGE + XRAY + CENSUS + factor(REGION))
smoother <- lowess(INFRISK, mlr8$residuals)
plot(INFRISK, mlr8$residuals)
lines(smoother)

[Scatterplot of mlr8$residuals versus INFRISK with the lowess smoother overlaid.]

R code
> infrisk.star <- ifelse(INFRISK > 4, INFRISK - 4, 0)
> mlr9 <- lm(logLOS ~ INFRISK + infrisk.star + AGE + XRAY +
+            CENSUS + factor(REGION))
> summary(mlr9)

Coefficients:
                  Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)       1.798e+00  1.667e-01   10.790   < 2e-16  ***
INFRISK           1.836e-03  1.984e-02    0.093   0.926478
infrisk.star      6.795e-02  2.810e-02    2.418   0.017360 *
AGE               5.554e-03  2.535e-03    2.191   0.030708 *
XRAY              1.361e-03  6.562e-04    2.073   0.040604 *
CENSUS            3.718e-04  7.913e-05    4.698   8.07e-06 ***
factor(REGION)2  -7.182e-02  3.051e-02   -2.354   0.020452 *
factor(REGION)3  -1.030e-01  3.036e-02   -3.391   0.000984 ***
factor(REGION)4  -2.068e-01  3.784e-02   -5.465   3.19e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1137 on 104 degrees of freedom
Multiple R-Squared: 0.6209, Adjusted R-squared: 0.5917
F-statistic: 21.29 on 8 and 104 DF, p-value: < 2.2e-16
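As an aside, here is a minimal R sketch of what the infrisk.star term does (assuming the SENIC data and the mlr9 fit above are loaded; the names b, slope_below, and slope_above are just for illustration): it makes the INFRISK effect piecewise linear with a knot at INFRISK = 4, so the slope below the knot is the INFRISK coefficient and the slope above the knot is the sum of the INFRISK and infrisk.star coefficients.

# Implied slopes of logLOS in INFRISK from the linear-spline fit mlr9
b <- coef(mlr9)
slope_below <- unname(b["INFRISK"])                      # slope when INFRISK <= 4
slope_above <- unname(b["INFRISK"] + b["infrisk.star"])  # slope when INFRISK > 4
c(below = slope_below, above = slope_above)
# Using the estimates above: about 0.002 below the knot and 0.002 + 0.068 = 0.070 above it.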
Residual Plots
[Two panels of residuals versus INFRISK: mlr9$residuals ("SPLINE FOR INFRISK") and mlr7$residuals ("INFRISK^2").]

Which is better?
• We cannot compare the two models via ANOVA because they are not nested!
• But we can compare statistics qualitatively.
• R-squared: MLR7: 0.60; MLR9: 0.62
• Partial R-squared: MLR7: 0.17; MLR9: 0.19

Identifying Outliers
• Harder to do in the MLR setting than in the SLR setting.
• Recall two concepts that make outliers important:
  - Leverage is a function of the explanatory variable(s) alone and measures the potential for a data point to affect the model parameter estimates.
  - Influence is a measure of how much a data point actually does affect the estimated model.
• Leverage and influence may both be defined in terms of matrices.

"Hat" matrix
• We must do some matrix work to understand this; Section 6.2 covers MLR in matrix terms.
• Notation for an MLR with p predictors and data on n patients. The data:
  Y = (Y_1, Y_2, ..., Y_n)', the n x 1 vector of responses
  X = the n x (p+1) design matrix whose ith row is (1, X_i1, ..., X_ip)

Matrix Format for the MLR model
• More notation:
  ε = (ε_1, ε_2, ..., ε_n)', the n x 1 vector of errors
  β = (β_0, β_1, ..., β_p)', the (p+1) x 1 vector of regression coefficients
• THE MODEL:  Y = Xβ + ε
• What are the dimensions of each? (Y: n x 1; X: n x (p+1); β: (p+1) x 1; ε: n x 1.)

"Transpose" and "Inverse"
• X-transpose: X' or X^T
• X-inverse: X^(-1)
• Hat matrix:  H = X(X'X)^(-1)X'
• Why is H important? It transforms the Y's into the fitted values:  Yhat = HY

Estimating, based on the fitted model
• Variance-covariance matrix of the residuals:  s^2{e} = MSE (I - H)
• Variance of the ith residual:  s^2{e_i} = MSE (1 - h_ii)
• Covariance of the ith and jth residuals (i ≠ j):  s{e_i, e_j} = -h_ij MSE

Other uses of H
• e = (I - H)Y, where I is the identity matrix
• Variance-covariance matrix of the residuals:  σ^2{e} = σ^2 (I - H)
• Variance of the ith residual:  σ^2{e_i} = σ^2 (1 - h_ii)
• Covariance of the ith and jth residuals (i ≠ j):  σ{e_i, e_j} = -h_ij σ^2

Property of the h_ij's
• Σ_{j=1}^{n} h_ij = 1 for each i, and Σ_{i=1}^{n} h_ij = 1 for each j.
• This means that each row of H sums to 1 and that each column of H sums to 1 (a consequence of the model containing an intercept).

Other use of H
• H identifies points of leverage.
[Scatterplot of y versus x with four labeled points (1-4) whose positions illustrate differing leverage.]

Using the Hat Matrix to identify outliers
• Look at h_ii to see whether a data point is an outlier.
• Large values of h_ii imply small values of var(e_i); as h_ii gets close to 1, var(e_i) approaches 0.
• Note that
  yhat_i = Σ_{j=1}^{n} h_ij y_j = h_ii y_i + Σ_{j≠i} h_ij y_j
  so as h_ii approaches 1, yhat_i approaches y_i.
• This gives h_ii the name "leverage".
• A HIGH HAT VALUE IMPLIES POTENTIAL FOR AN OUTLIER!

R code
hat <- hatvalues(reg)
plot(1:102, hat)
highhat <- ifelse(hat > 0.10, 1, 0)
plot(x, y)
points(x[highhat==1], y[highhat==1], col=2, pch=16, cex=1.5)

Hat values versus index
[Plot of the hat values against index (1:102); values run from roughly 0.02 to 0.14.]

Identifying points with high h_ii
[Scatterplot of y versus x with the high-hat points (hat > 0.10) highlighted.]

Does a high hat mean it has a large residual?
• No. h_ii measures leverage, not influence.
• Recall what h_ii is made of: it depends ONLY on the X's; it does not depend on the actual Y value.
• Look back at the plot: which of these points is probably the most "influential"?
• Standard cutoffs for a "large" h_ii:
  - 2p/n
  - 0.5 is very high; 0.2-0.5 is high

Let's look at our MLR9
• Any outliers?
[Plot of the hat values from mlr9 (hat9) against index (1:length(hat9)); values run from roughly 0.05 to 0.20.]

Using the hat matrix in MLR
• Studentized ...
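To connect the formulas above to the R output, here is a minimal sketch (assuming the mlr9 fit from the SENIC example is in the workspace; X, H, p, and n are illustrative names) that forms H = X(X'X)^(-1)X' directly from the design matrix, checks it against hatvalues(), and flags high-leverage points with the 2p/n rule of thumb. Taking p to be the number of columns of X, intercept included, is one common convention.

# Hat matrix by hand versus hatvalues(), using the mlr9 fit from above
X <- model.matrix(mlr9)                  # design matrix: intercept column plus the predictors
H <- X %*% solve(t(X) %*% X) %*% t(X)    # H = X (X'X)^(-1) X'
hat9 <- hatvalues(mlr9)                  # leverages h_ii from the fitted model

max(abs(diag(H) - hat9))                 # diag(H) matches hatvalues() (up to rounding error)
range(rowSums(H))                        # each row of H sums to 1 (the model has an intercept)

p <- ncol(X)                             # number of estimated coefficients
n <- nrow(X)
which(hat9 > 2 * p / n)                  # candidate high-leverage observations

Forming the full n x n matrix H is only for illustration; hatvalues() returns the same leverages without building H explicitly.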

