Lecture 13: Diagnostics in MLR
• Added variable plots
• Identifying outliers
• Variance Inflation Factor
BMTRY 701 Biostatistical Methods II

Recall the added variable plots
• These can help check for adequacy of the model
• Is there curvature between Y and X after adjusting for the other X's?
• "Refined" residual plots
• They show the marginal importance of an individual predictor
• Help figure out a good form for the predictor

Example: SENIC
• Recall the difficulty determining the form for INFRISK in our regression model.
• Last time, we settled on including one term, INFRISK^2
• But, we could use an added variable plot approach.
• How? We want to know: adjusting for all else in the model, what is the right form for INFRISK?

R code
av1 <- lm(logLOS ~ AGE + XRAY + CENSUS + factor(REGION))
av2 <- lm(INFRISK ~ AGE + XRAY + CENSUS + factor(REGION))
resy <- av1$residuals
resx <- av2$residuals
plot(resx, resy, pch=16)
abline(lm(resy ~ resx), lwd=2)

Added Variable Plot
[Figure: added variable plot of resy (logLOS residuals) versus resx (INFRISK residuals), with fitted line]

What does that show?
• The relationship between logLOS and INFRISK if you added INFRISK to the regression
• But, is that what we want to see?
• How about looking at residuals versus INFRISK (before including INFRISK in the model)?

R code
mlr8 <- lm(logLOS ~ AGE + XRAY + CENSUS + factor(REGION))
smoother <- lowess(INFRISK, mlr8$residuals)
plot(INFRISK, mlr8$residuals)
lines(smoother)

[Figure: mlr8 residuals versus INFRISK, with lowess smoother]

R code
> infrisk.star <- ifelse(INFRISK > 4, INFRISK - 4, 0)
> mlr9 <- lm(logLOS ~ INFRISK + infrisk.star + AGE + XRAY +
+   CENSUS + factor(REGION))
> summary(mlr9)

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      1.798e+00  1.667e-01  10.790  < 2e-16 ***
INFRISK          1.836e-03  1.984e-02   0.093 0.926478
infrisk.star     6.795e-02  2.810e-02   2.418 0.017360 *
AGE              5.554e-03  2.535e-03   2.191 0.030708 *
XRAY             1.361e-03  6.562e-04   2.073 0.040604 *
CENSUS           3.718e-04  7.913e-05   4.698 8.07e-06 ***
factor(REGION)2 -7.182e-02  3.051e-02  -2.354 0.020452 *
factor(REGION)3 -1.030e-01  3.036e-02  -3.391 0.000984 ***
factor(REGION)4 -2.068e-01  3.784e-02  -5.465 3.19e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1137 on 104 degrees of freedom
Multiple R-Squared: 0.6209, Adjusted R-squared: 0.5917
F-statistic: 21.29 on 8 and 104 DF, p-value: < 2.2e-16

Residual Plots
[Figure: residuals versus INFRISK for two models: left panel, mlr9 residuals (spline for INFRISK); right panel, mlr7 residuals (INFRISK^2)]

Which is better?
• Cannot compare via ANOVA because they are not nested!
• But, we can compare statistics qualitatively
• R-squared: MLR7: 0.60; MLR9: 0.62
• Partial R-squared: MLR7: 0.17; MLR9: 0.19

Identifying Outliers
• Harder to do in the MLR setting than in the SLR setting.
• Recall two concepts that make outliers important:
  • Leverage is a function of the explanatory variable(s) alone and measures the potential for a data point to affect the model parameter estimates.
  • Influence is a measure of how much a data point actually does affect the estimated model.
• Leverage and influence both may be defined in terms of matrices

"Hat" matrix
• We must do some matrix stuff to understand this
• Section 6.2 is MLR in matrix terms
• Notation for a MLR with p predictors and data on n patients.
• The data:

    Y = [Y_1, Y_2, ..., Y_n]'                      (n x 1)

    X = [ 1  X_11  X_21  ...  X_p1
          1  X_12  X_22  ...  X_p2
          ...
          1  X_1n  X_2n  ...  X_pn ]               (n x (p+1))

Matrix Format for the MLR model
• THE MODEL:  Y = X*beta + e
  where beta = [beta_0, beta_1, ..., beta_p]'  and  e = [e_1, e_2, ..., e_n]'
• What are the dimensions of each?

"Transpose" and "Inverse"
• X-transpose: X' or X^T
• X-inverse: X^{-1}
• Hat matrix:  H = X(X'X)^{-1}X'
• Why is H important? It transforms Y's to Yhat's:  Yhat = HY

Estimating, based on fitted model
• Variance-covariance matrix of residuals:  s^2(e) = MSE (I - H)
• Variance of ith residual:  s^2(e_i) = MSE (1 - h_ii)
• Covariance of ith and jth residuals:  s(e_i, e_j) = -h_ij MSE  (i ≠ j)

Other uses of H
• e = (I - H)Y,  where I = identity matrix
• Variance-covariance matrix of residuals:  sigma^2(e) = sigma^2 (I - H)
• Variance of ith residual:  sigma^2(e_i) = sigma^2 (1 - h_ii)
• Covariance of ith and jth residuals:  sigma(e_i, e_j) = -h_ij sigma^2  (i ≠ j)

Property of hij's
• sum_{j=1}^{n} h_ij = 1  and  sum_{i=1}^{n} h_ij = 1
• This means that each row of H sums to 1
• And, that each column of H sums to 1

Other use of H
• Identifies points of leverage
[Figure: scatterplot of y versus x with four labeled points (1, 2, 3, 4)]

Using the Hat Matrix to identify outliers
• Look at h_ii to see if a data point is an outlier
• Large values of h_ii imply small values of var(e_i)
• As h_ii gets close to 1, var(e_i) approaches 0.
• Note that
    yhat_i = sum_{j=1}^{n} h_ij y_j = h_ii y_i + sum_{j ≠ i} h_ij y_j
• As h_ii approaches 1, yhat_i approaches y_i
• This gives h_ii the name "leverage"
• HIGH HAT VALUE IMPLIES POTENTIAL FOR OUTLIER!

R code
hat <- hatvalues(reg)
plot(1:102, hat)
highhat <- ifelse(hat > 0.10, 1, 0)
plot(x, y)
points(x[highhat==1], y[highhat==1], col=2, pch=16, cex=1.5)

Hat values versus index
[Figure: hat values plotted against index 1:102]

Identifying points with high hii
[Figure: scatterplot of y versus x with the high-hat points highlighted]

Does a high hat mean it has a large residual?
• No.
• h_ii measures leverage, not influence
• Recall what h_ii is made of:
  • it depends ONLY on the X's
  • it does not depend on the actual Y value
• Look back at the plot: which of these is probably most "influential"?
• Standard cutoffs
  for "large" h_ii:
  • 2p/n
  • 0.5 very high; 0.2-0.5 high

Let's look at our MLR9
• Any outliers?
[Figure: hat values for mlr9 plotted against index 1:length(hat9)]

Using the hat matrix in MLR
Studentized
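The hat-matrix facts above (H = X(X'X)^{-1}X', rows summing to 1, Yhat = HY, and the 2p/n leverage cutoff) can be checked numerically. This is a small sketch in Python/numpy rather than the lecture's R; the data are made up and the variable names are illustrative only. Here p counts all regression coefficients, including the intercept.

```python
import numpy as np

# Made-up design matrix: intercept plus 2 predictors for n = 20 cases
rng = np.random.default_rng(0)
n, p = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

# Hat matrix: H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                           # leverages h_ii

# Properties from the slides
assert np.allclose(H.sum(axis=1), 1.0)   # each row of H sums to 1
assert np.all((h > 0) & (h < 1))         # leverages lie between 0 and 1
assert np.isclose(h.sum(), p + 1)        # trace(H) = number of coefficients

# Yhat = H Y: H maps observed Y to fitted values
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
yhat = H @ y

# Rule-of-thumb cutoff for "large" leverage: 2 * (number of coefficients) / n
cutoff = 2 * (p + 1) / n
high_leverage = np.where(h > cutoff)[0]
print("points with leverage above 2p/n:", high_leverage)
```

Note that none of this uses y until the last step: leverage depends only on the X's, which is exactly why a high hat value signals potential, not actual, influence.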
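As a side note on the mlr9 fit shown earlier: the infrisk.star variable is a linear-spline (hinge) term with a knot at INFRISK = 4, built in R via ifelse(INFRISK > 4, INFRISK - 4, 0). A minimal Python/numpy sketch of the same construction, with made-up INFRISK values:

```python
import numpy as np

# Mirrors the R construction: infrisk.star <- ifelse(INFRISK > 4, INFRISK - 4, 0)
def hinge(x, knot=4.0):
    """Linear-spline basis: 0 at or below the knot, (x - knot) above it."""
    return np.where(x > knot, x - knot, 0.0)

infrisk = np.array([2.0, 3.5, 4.0, 5.0, 7.5])  # made-up INFRISK values
star = hinge(infrisk)                          # values: 0, 0, 0, 1, 3.5

# In the model logLOS ~ INFRISK + infrisk.star + ..., the slope in INFRISK is
# b_INFRISK below the knot and (b_INFRISK + b_star) above it, so the fitted
# line can bend at INFRISK = 4 while remaining continuous there.
```

This is why the mlr9 output shows a near-zero coefficient for INFRISK but a significant one for infrisk.star: the relationship is roughly flat below the knot and positive above it.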