Outliers and influential data points

No outliers?
[Figure: scatterplot of y versus x; x axis 0 to 14, y axis 0 to 70]

An outlier? Influential?
[Figure: scatterplot of y versus x; x axis 0 to 14, y axis 0 to 70]

An outlier? Influential?
[Figure: scatterplot of y versus x with two fitted lines, y = 1.73 + 5.12 x and y = 2.96 + 5.04 x]

An outlier? Influential?
[Figure: scatterplot of y versus x; x axis 0 to 14, y axis 0 to 70]

An outlier? Influential?
[Figure: scatterplot of y versus x with two fitted lines, y = 1.73 + 5.12 x and y = 2.47 + 4.93 x]

An outlier? Influential?
[Figure: scatterplot of y versus x; x axis 0 to 14, y axis 0 to 70]

An outlier? Influential?
[Figure: scatterplot of y versus x with two fitted lines, y = 1.73 + 5.12 x and y = 8.51 + 3.32 x]

Impact on regression analyses
• Not every outlier strongly influences the estimated regression function.
• Always determine whether the estimated regression function is unduly influenced by one or a few cases.
• Simple plots for simple linear regression.
• Summary measures for multiple linear regression.

The hat matrix H

Least squares estimates:
$b = (X'X)^{-1}X'y$

The regression model:
$Y = X\beta + \varepsilon, \qquad E(Y) = X\beta$

Fitted values:
$\hat{y} = Xb = X(X'X)^{-1}X'y$
$\hat{y} = Hy$

$y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} 8 \\ 15 \\ 10 \\ 7 \end{bmatrix}$

$X = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \end{bmatrix} = \begin{bmatrix} 1 & 4.2 & 4 \\ 1 & 6.5 & 6.5 \\ 1 & 3 & 3.5 \\ 1 & 3 & 2.8 \end{bmatrix}$

$H = X(X'X)^{-1}X' = \begin{bmatrix} 0.411 & 0.202 & -0.058 & 0.444 \\ 0.202 & 0.931 & 0.020 & -0.152 \\ -0.058 & 0.020 & 0.994 & 0.044 \\ 0.444 & -0.152 & 0.044 & 0.664 \end{bmatrix}$

$\hat{y} = Hy = \begin{bmatrix} 0.411 & 0.202 & -0.058 & 0.444 \\ 0.202 & 0.931 & 0.020 & -0.152 \\ -0.058 & 0.020 & 0.994 & 0.044 \\ 0.444 & -0.152 & 0.044 & 0.664 \end{bmatrix} \begin{bmatrix} 8 \\ 15 \\ 10 \\ 7 \end{bmatrix} = \begin{bmatrix} 8.85 \\ 14.71 \\ 10.08 \\ 6.36 \end{bmatrix}$

$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} & h_{14} \\ h_{21} & h_{22} & h_{23} & h_{24} \\ h_{31} & h_{32} & h_{33} & h_{34} \\ h_{41} & h_{42} & h_{43} & h_{44} \end{bmatrix}$

$\hat{y} = Hy = \begin{bmatrix} h_{11} & h_{12} & h_{13} & h_{14} \\ h_{21} & h_{22} & h_{23} & h_{24} \\ h_{31} & h_{32} & h_{33} & h_{34} \\ h_{41} & h_{42} & h_{43} & h_{44} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}$

$\hat{y}_1 = h_{11}y_1 + h_{12}y_2 + h_{13}y_3 + h_{14}y_4$
$\hat{y}_2 = h_{21}y_1 + h_{22}y_2 + h_{23}y_3 + h_{24}y_4$
$\hat{y}_3 = h_{31}y_1 + h_{32}y_2 + h_{33}y_3 + h_{34}y_4$
$\hat{y}_4 = h_{41}y_1 + h_{42}y_2 + h_{43}y_3 + h_{44}y_4$

Identifying outlying Y values
• Residuals
• Standardized residuals
  – also called internally studentized residuals
• Deleted residuals
• Deleted t residuals
  – also called studentized deleted residuals
  – also called externally studentized residuals

Residuals

Ordinary residuals defined for each observation, i = 1, …, n:
$e_i = y_i - \hat{y}_i$

Using matrix notation:
$e = y - \hat{y} = y - X(X'X)^{-1}X'y$
$e = y - Hy = (I - H)y$

Variance of the residuals

Residual vector:
$e = y - Hy = (I - H)y$

Variance matrix:
$\mathrm{Var}(e) = \sigma^2 (I - H)$

Variance of the ith residual:
$\mathrm{Var}(e_i) = \sigma^2 (1 - h_{ii})$

Estimated standard deviation of the ith residual:
$s(e_i) = \sqrt{MSE\,(1 - h_{ii})}$

Standardized residuals

Standardized residuals defined for each observation, i = 1, …, n:
$e_i^{*} = \dfrac{e_i}{s(e_i)} = \dfrac{e_i}{\sqrt{MSE\,(1 - h_{ii})}}$

Standardized residuals quantify how large the residuals are in standard deviation units.
Standardized residuals larger than 2 or smaller than -2 suggest that the y values are unusual.

An outlying y value?
[Figure: scatterplot of y versus x; x axis 0 to 14, y axis 0 to 70]

x         y         FITS1     HI1       s(e)      RESI1     SRES1
0.10000   -0.0716    3.4614   0.176297  4.27561   -3.5330   -0.82635
0.45401    4.1673    5.2446   0.157454  4.32424   -1.0774   -0.24916
1.09765    6.5703    8.4869   0.127014  4.40166   -1.9166   -0.43544
1.27936   13.8150    9.4022   0.119313  4.42103    4.4128    0.99818
2.20611   11.4501   14.0706   0.086145  4.50352   -2.6205   -0.58191
...
8.70156   46.5475   46.7904   0.140453  4.36765   -0.2429   -0.05561
9.16463   45.7762   49.1230   0.163492  4.30872   -3.3468   -0.77679
4.00000   40.0000   23.1070   0.050974  4.58936   16.8930    3.68110

S = 4.711

Unusual Observations
Obs     x      y    Fit  SE Fit  Residual  St Resid
 21  4.00  40.00  23.11    1.06     16.89     3.68R
R denotes an observation with a large standardized residual

Deleted residuals

If observed $y_i$ is extreme, it may “pull” the fitted equation towards itself, thereby yielding a small ordinary residual.

Delete the ith case, estimate the regression function using the remaining n-1 cases, and use the x values to predict the response for the ith case.

Deleted residual:
$d_i = y_i - \hat{y}_{i(i)}$

Deleted t residuals

A deleted t residual is just a standardized deleted residual:
$t_i = \dfrac{d_i}{s(d_i)} = \dfrac{e_i}{\sqrt{MSE_{(i)}\,(1 - h_{ii})}}$

The deleted t residuals follow a t distribution with ((n-1)-p) degrees of freedom.

[Figure: scatterplot of y versus x; x axis 0 to 10, y axis 0 to 15; two fitted lines, y = 0.6 + 1.55 x and y = 3.82 - 0.13 x]

 x     y    RESI1     TRES1
 1   2.1    -1.59    -1.7431
 2   3.8     0.24     0.1217
 3   5.2     1.77     1.6361
10   2.1    -0.42   -19.7990

[Figure: scatterplot of y versus x; x axis 0 to 14, y axis 0 to 70; two fitted lines, y = 1.73 + 5.12 x and y = 2.96 + 5.04 x]

Row   x         y         RESI1     SRES1      TRES1
  1   0.10000   -0.0716   -3.5330   -0.82635   -0.81916
  2   0.45401    4.1673   -1.0774   -0.24916   -0.24291
  3   1.09765    6.5703   -1.9166   -0.43544   -0.42596
...
 19   8.70156   46.5475   -0.2429   -0.05561   -0.05413
 20   9.16463   45.7762   -3.3468   -0.77679   -0.76837
 21   4.00000   40.0000   16.8930    3.68110    6.69012

Identifying outlying X values
• Use the diagonal elements, $h_{ii}$, of the hat matrix H to identify outlying X values.
• The $h_{ii}$ are called leverages.

Properties of the leverages ($h_{ii}$)
• The $h_{ii}$ is a measure of the distance between the X values for the ith case and the means of the X values for all n cases.
• The $h_{ii}$ is a number between 0 and 1, inclusive.
• The sum of the $h_{ii}$ equals p, the number of parameters.

[Figure: dotplot for x; axis 0 to 9; sample mean = 4.751; annotated leverages h(1,1) = 0.176, h(20,20) = 0.163, h(11,11) = 0.048]

HI1
0.176297  0.157454  0.127014  0.119313  0.086145  0.077744  0.065028
0.061276  0.048147  0.049628  0.049313  0.051829  0.055760  0.069311
0.072580  0.109616  0.127489  0.141136  0.140453  0.163492  0.050974

Sum of HI1 =
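The residual and leverage definitions on these slides can be checked numerically. What follows is a minimal sketch rather than the deck's own code: it assumes NumPy, uses the four-point data set from the deleted t residuals slide (x = 1, 2, 3, 10 and y = 2.1, 3.8, 5.2, 2.1), and the variable names (`h`, `sres`, `d`, `tres`) are chosen here for illustration. The deleted quantities are obtained by actually refitting without each case, which is slow for large n but mirrors the definition directly.

```python
import numpy as np

# Four-point data set from the deleted t residuals slide.
x = np.array([1.0, 2.0, 3.0, 10.0])
y = np.array([2.1, 3.8, 5.2, 2.1])

# Design matrix with an intercept column; p counts the parameters.
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

# Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverages h_ii.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

# Ordinary residuals e = (I - H) y and MSE = SSE / (n - p).
e = y - H @ y
mse = e @ e / (n - p)

# Standardized (internally studentized) residuals: e_i / sqrt(MSE (1 - h_ii)).
sres = e / np.sqrt(mse * (1.0 - h))

# Deleted residuals d_i and deleted t residuals t_i, by refitting without case i.
d = np.empty(n)
tres = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    resid_i = y[keep] - X[keep] @ b_i
    mse_i = resid_i @ resid_i / ((n - 1) - p)        # MSE with case i removed
    d[i] = y[i] - X[i] @ b_i                         # d_i = y_i - yhat_i(i)
    tres[i] = e[i] / np.sqrt(mse_i * (1.0 - h[i]))   # t_i from the slide's formula

print("leverages:", np.round(h, 3), "sum =", round(h.sum(), 3))
print("residuals:", np.round(e, 2))
print("standardized residuals:", np.round(sres, 4))
print("deleted residuals:", np.round(d, 2))
print("deleted t residuals:", np.round(tres, 4))
```

In this data set the case at x = 10 has a small ordinary residual but a very large deleted t residual, which is the pattern the RESI1 and TRES1 columns on that slide illustrate: a high-leverage point pulls the fitted line towards itself.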