PSU STAT 501 - Outliers and influential data points - D160773

School: Penn State University

Course: STAT 501 - Regression Methods

Pages: 43

Contents:
• No outliers?
• An outlier? Influential?
• Impact on regression analyses
• The hat matrix H
• Identifying outlying Y values
• Residuals
• Variance of the residuals
• Standardized residuals
• An outlying y value?
• Deleted residuals
• Deleted t residuals
• Identifying outlying X values
• Properties of the leverages ($h_{ii}$)
• Using leverages to identify outlying X values
• Identifying influential cases
• Influence
• DFITS
• Cook’s distance

No outliers?
[Figure: scatterplot of y versus x]

An outlier? Influential?
[Figure: scatterplot of y versus x with one unusual point]
Fitted lines: y = 1.73 + 5.12 x (without the unusual point) and y = 2.96 + 5.04 x (with it).

An outlier? Influential?
[Figure: scatterplot of y versus x with a different unusual point]
Fitted lines: y = 1.73 + 5.12 x (without the unusual point) and y = 2.47 + 4.93 x (with it).

An outlier? Influential?
[Figure: scatterplot of y versus x with a third unusual point]
Fitted lines: y = 1.73 + 5.12 x (without the unusual point) and y = 8.51 + 3.32 x (with it).

Impact on regression analyses
• Not every outlier strongly influences the estimated regression function.
• Always determine whether the estimated regression function is unduly influenced by one or a few cases.
• Use simple plots for simple linear regression.
• Use summary measures for multiple linear regression.

The hat matrix H
The regression model:
$$Y = X\beta + \varepsilon, \qquad E(Y) = X\beta$$
Least squares estimates:
$$b = (X'X)^{-1}X'y$$
Fitted values:
$$\hat{y} = Xb = X(X'X)^{-1}X'y = Hy, \qquad \text{where } H = X(X'X)^{-1}X'$$
Example with n = 4 cases and two predictors (p = 3 parameters):
$$y = \begin{bmatrix} y_1\\ y_2\\ y_3\\ y_4 \end{bmatrix} = \begin{bmatrix} 8\\ 15\\ 10\\ 7 \end{bmatrix}, \qquad X = \begin{bmatrix} 1 & x_{11} & x_{21}\\ 1 & x_{12} & x_{22}\\ 1 & x_{13} & x_{23}\\ 1 & x_{14} & x_{24} \end{bmatrix} = \begin{bmatrix} 1 & 4.2 & 4\\ 1 & 6.5 & 6.5\\ 1 & 3 & 3.5\\ 1 & 3 & 2.8 \end{bmatrix}$$
$$H = X(X'X)^{-1}X' = \begin{bmatrix} 0.411 & 0.202 & -0.058 & 0.444\\ 0.202 & 0.931 & 0.020 & -0.152\\ -0.058 & 0.020 & 0.994 & 0.044\\ 0.444 & -0.152 & 0.044 & 0.664 \end{bmatrix}$$
$$\hat{y} = Hy = \begin{bmatrix} 8.85\\ 14.71\\ 10.08\\ 6.36 \end{bmatrix}$$
In general, with four cases:
$$\hat{y} = Hy = \begin{bmatrix} h_{11} & h_{12} & h_{13} & h_{14}\\ h_{21} & h_{22} & h_{23} & h_{24}\\ h_{31} & h_{32} & h_{33} & h_{34}\\ h_{41} & h_{42} & h_{43} & h_{44} \end{bmatrix} \begin{bmatrix} y_1\\ y_2\\ y_3\\ y_4 \end{bmatrix} = \begin{bmatrix} h_{11}y_1 + h_{12}y_2 + h_{13}y_3 + h_{14}y_4\\ h_{21}y_1 + h_{22}y_2 + h_{23}y_3 + h_{24}y_4\\ h_{31}y_1 + h_{32}y_2 + h_{33}y_3 + h_{34}y_4\\ h_{41}y_1 + h_{42}y_2 + h_{43}y_3 + h_{44}y_4 \end{bmatrix}$$
Each fitted value $\hat{y}_i$ is therefore a weighted sum of all the observed responses, with the weights taken from row i of H.

Identifying outlying Y values
• Residuals
• Standardized residuals
  – also called internally studentized residuals
• Deleted residuals
• Deleted t residuals
  – also called studentized deleted residuals
  – also called externally studentized residuals

Residuals
Ordinary residuals, defined for each observation i = 1, …, n:
$$e_i = y_i - \hat{y}_i$$
Using matrix notation:
$$e = y - \hat{y} = y - X(X'X)^{-1}X'y = y - Hy = (I - H)y$$

Variance of the residuals
Residual vector:
$$e = y - Hy = (I - H)y$$
Variance matrix (because $I - H$ is symmetric and idempotent, $(I - H)\,\sigma^2 I\,(I - H)' = \sigma^2(I - H)$):
$$\mathrm{Var}(e) = \sigma^2(I - H)$$
Variance of the ith residual:
$$\mathrm{Var}(e_i) = \sigma^2(1 - h_{ii})$$
Estimated standard deviation of the ith residual:
$$s(e_i) = \sqrt{MSE\,(1 - h_{ii})}$$

Standardized residuals
Standardized residuals, defined for each observation i = 1, …, n:
$$e_i^* = \frac{e_i}{s(e_i)} = \frac{e_i}{\sqrt{MSE\,(1 - h_{ii})}}$$
Standardized residuals quantify how large the residuals are in standard deviation units.
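A minimal sketch of the hat-matrix computations above, assuming Python with NumPy (the slides themselves show Minitab output, and the helper names `hat_matrix` and `residual_summary` are hypothetical). It builds $H = X(X'X)^{-1}X'$ for the four-case example by solving the normal equations instead of inverting $X'X$ explicitly, then derives $\hat{y} = Hy$, $e = (I - H)y$, $s(e_i) = \sqrt{MSE\,(1 - h_{ii})}$ and the standardized residuals $e_i^*$.

```python
import numpy as np


def hat_matrix(X: np.ndarray) -> np.ndarray:
    """Hypothetical helper: H = X (X'X)^{-1} X' for a full-column-rank X."""
    # Solving (X'X) A = X' gives A = (X'X)^{-1} X' without forming the inverse.
    return X @ np.linalg.solve(X.T @ X, X.T)


def residual_summary(X: np.ndarray, y: np.ndarray) -> dict:
    """Hypothetical helper: fitted values, residuals, leverages, s(e_i), e_i*."""
    n, p = X.shape
    H = hat_matrix(X)
    y_hat = H @ y                    # fitted values, y_hat = H y
    e = y - y_hat                    # residuals, e = (I - H) y
    h = np.diag(H)                   # leverages h_ii
    mse = (e @ e) / (n - p)          # MSE = SSE / (n - p)
    s_e = np.sqrt(mse * (1.0 - h))   # s(e_i) = sqrt(MSE (1 - h_ii))
    return {"H": H, "y_hat": y_hat, "e": e, "h": h,
            "s_e": s_e, "e_star": e / s_e}


# Four-case example: intercept column plus two predictors, so p = 3.
X = np.array([[1.0, 4.2, 4.0],
              [1.0, 6.5, 6.5],
              [1.0, 3.0, 3.5],
              [1.0, 3.0, 2.8]])
y = np.array([8.0, 15.0, 10.0, 7.0])

out = residual_summary(X, y)
print(np.round(out["H"], 3))       # compare with the numeric H on the slide
print(np.round(out["y_hat"], 2))   # compare with y_hat = H y on the slide
print(np.trace(out["H"]))          # sum of the h_ii, equal to p = 3 up to rounding
```

In this tiny example there is only $n - p = 1$ residual degree of freedom, so every standardized residual works out to $\pm 1$; $e_i^*$ becomes informative only when $n - p$ is larger.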
Standardized residuals larger than 2 or smaller than -2 suggest that the corresponding y values are unusual.

An outlying y value?
[Figure: scatterplot of y versus x]

        x        y    FITS1       HI1     s(e)     RESI1     SRES1
  0.10000  -0.0716   3.4614  0.176297  4.27561   -3.5330  -0.82635
  0.45401   4.1673   5.2446  0.157454  4.32424   -1.0774  -0.24916
  1.09765   6.5703   8.4869  0.127014  4.40166   -1.9166  -0.43544
  1.27936  13.8150   9.4022  0.119313  4.42103    4.4128   0.99818
  2.20611  11.4501  14.0706  0.086145  4.50352   -2.6205  -0.58191
  ...
  8.70156  46.5475  46.7904  0.140453  4.36765   -0.2429  -0.05561
  9.16463  45.7762  49.1230  0.163492  4.30872   -3.3468  -0.77679
  4.00000  40.0000  23.1070  0.050974  4.58936   16.8930   3.68110

S = 4.711

Unusual Observations
Obs     x      y    Fit  SE Fit  Residual  St Resid
 21  4.00  40.00  23.11    1.06     16.89     3.68R

R denotes an observation with a large standardized residual.

Deleted residuals
If an observed $y_i$ is extreme, it may "pull" the fitted equation toward itself, yielding a small ordinary residual. Instead, delete the ith case, estimate the regression function using the remaining n - 1 cases, and use the ith case's x values to predict its response.
Deleted residual:
$$d_i = y_i - \hat{y}_{(i)}$$

Deleted t residuals
A deleted t residual is just a standardized deleted residual:
$$t_i = \frac{d_i}{s(d_i)} = \frac{e_i}{\sqrt{MSE_{(i)}\,(1 - h_{ii})}}$$
where $MSE_{(i)}$ is the mean squared error of the regression fitted with the ith case deleted. Under the model assumptions, the deleted t residuals follow a t distribution with (n - 1) - p degrees of freedom.

[Figure: scatterplot of y versus x for four cases]
Fitted lines: y = 3.82 - 0.13 x (all four cases) and y = 0.6 + 1.55 x (case 4, at x = 10, deleted).

   x    y  RESI1     TRES1
   1  2.1  -1.59   -1.7431
   2  3.8   0.24    0.1217
   3  5.2   1.77    1.6361
  10  2.1  -0.42  -19.7990

Case 4 has a small ordinary residual (-0.42) but a very large deleted t residual (-19.80).

[Figure: scatterplot of y versus x for 21 cases]
Fitted lines: y = 1.73 + 5.12 x (case 21 deleted) and y = 2.96 + 5.04 x (all 21 cases).

 Row        x        y     RESI1     SRES1     TRES1
   1  0.10000  -0.0716   -3.5330  -0.82635  -0.81916
   2  0.45401   4.1673   -1.0774  -0.24916  -0.24291
   3  1.09765   6.5703   -1.9166  -0.43544  -0.42596
 ...
  19  8.70156  46.5475   -0.2429  -0.05561  -0.05413
  20  9.16463  45.7762   -3.3468  -0.77679  -0.76837
  21  4.00000  40.0000   16.8930   3.68110   6.69012

Identifying outlying X values
• Use the diagonal elements, $h_{ii}$, of the hat matrix H to identify outlying X values.
• The $h_{ii}$ are called leverages.

Properties of the leverages ($h_{ii}$)
• $h_{ii}$ measures the distance between the X values for the ith case and the means of the X values for all n cases.
• $h_{ii}$ is a number between 0 and 1, inclusive.
• The sum of the $h_{ii}$ equals p, the number of parameters.

[Figure: dotplot for x; sample mean = 4.751; labeled leverages h(1,1) = 0.176, h(20,20) = 0.163, h(11,11) = 0.048]

HI1 (cases 1 to 21):
0.176297  0.157454  0.127014  0.119313  0.086145  0.077744  0.065028
0.061276  0.048147  0.049628  0.049313  0.051829  0.055760  0.069311
0.072580  0.109616  0.127489  0.141136  0.140453  0.163492  0.050974

Sum of HI1 = 2.000 (= p)
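The deleted residuals, deleted t residuals and leverage properties can be checked numerically with a second sketch, again assuming Python with NumPy and hypothetical helper names. Rather than refitting n times, it uses two standard identities that the slides do not state, $d_i = e_i/(1 - h_{ii})$ and $SSE_{(i)} = SSE - e_i^2/(1 - h_{ii})$, so that $MSE_{(i)} = SSE_{(i)}/(n - p - 1)$ and $t_i = e_i/\sqrt{MSE_{(i)}(1 - h_{ii})}$. A brute-force refit without each case cross-checks the shortcut on the four-case example, and `deleted_t_from_standardized` uses the equivalent form $t_i = e_i^*\sqrt{(n - p - 1)/(n - p - e_i^{*2})}$ to reproduce TRES1 from SRES1 in the 21-case table.

```python
import numpy as np


def case_diagnostics(X: np.ndarray, y: np.ndarray) -> dict:
    """Hypothetical helper: leverages, residuals, deleted and deleted t residuals."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
    h = np.diag(H)                            # leverages h_ii
    e = y - H @ y                             # ordinary residuals
    sse = e @ e
    d = e / (1.0 - h)                         # deleted residuals (shortcut identity)
    sse_del = sse - e**2 / (1.0 - h)          # SSE_(i) with case i removed
    mse_del = sse_del / (n - p - 1)           # MSE_(i), (n - 1) - p df
    t = e / np.sqrt(mse_del * (1.0 - h))      # deleted t residuals
    return {"h": h, "e": e, "d": d, "t": t}


def deleted_residual_by_refit(X: np.ndarray, y: np.ndarray, i: int) -> float:
    """Brute force: drop case i, refit by least squares, predict case i."""
    keep = np.arange(len(y)) != i
    b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return float(y[i] - X[i] @ b)


def deleted_t_from_standardized(r, n, p):
    """Equivalent form: t_i = r_i * sqrt((n - p - 1) / (n - p - r_i**2))."""
    r = np.asarray(r, dtype=float)
    return r * np.sqrt((n - p - 1) / (n - p - r**2))


# Four-case example (simple linear regression, so p = 2).
x = np.array([1.0, 2.0, 3.0, 10.0])
y = np.array([2.1, 3.8, 5.2, 2.1])
X = np.column_stack([np.ones_like(x), x])

res = case_diagnostics(X, y)
print(np.round(res["e"], 2))     # compare with RESI1
print(np.round(res["t"], 4))     # compare with TRES1
print([round(deleted_residual_by_refit(X, y, i), 4) for i in range(4)])

# 21-case example: convert SRES1 to TRES1 for cases 1 and 21 (n = 21, p = 2).
print(np.round(deleted_t_from_standardized([-0.82635, 3.68110], 21, 2), 5))

# Leverages (HI1) from the 21-case example; their sum should equal p = 2.
hi1 = np.array([0.176297, 0.157454, 0.127014, 0.119313, 0.086145, 0.077744,
                0.065028, 0.061276, 0.048147, 0.049628, 0.049313, 0.051829,
                0.055760, 0.069311, 0.072580, 0.109616, 0.127489, 0.141136,
                0.140453, 0.163492, 0.050974])
print(round(hi1.sum(), 4))
```

For case 4, the fit without that case predicts 0.6 + 1.55(10) = 16.1, so $d_4 = 2.1 - 16.1 = -14$, which matches $e_4/(1 - h_{44}) = -0.42/0.03$ with $h_{44} = 1/4 + (10 - 4)^2/50 = 0.97$.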

View Full Document