PSU STAT 501 - Outliers and influential data points


Outline

Outliers and influential data points
The distinction
No outliers? No influential data points?
Any outliers? Any influential data points?
Impact on regression analyses
The leverages $h_{ii}$
Properties of the leverages $h_{ii}$
Any high leverages $h_{ii}$?
Identifying data points whose x values are extreme ... and therefore potentially influential
Using leverages to identify extreme x values
Important distinction!
Identifying outliers (unusual y values)
Identifying outliers
Residuals
Standardized residuals
An outlier?
Why should we care? (Regression of y on x with outlier)
Why should we care? (Regression of y on x without outlier)
Identifying influential data points
Basic idea of these four measures
Deleted residuals
Deleted t residuals
DFITS
Using DFITS
Cook's distance
Effect on estimates of removing each data point one at a time?
Using Cook's distance
A strategy for dealing with problematic data points
A comment about deleting data points
A strategy for dealing with problematic data points (cont'd)


Outliers and influential data points

The distinction

• An outlier is a data point whose response y does not follow the general trend of the rest of the data.
• A data point is influential if it unduly influences any part of a regression analysis, such as predicted responses, estimated slope coefficients, hypothesis test results, etc.

No outliers? No influential data points?

[Figure: scatterplot of y versus x]

Any outliers? Any influential data points?

[Figure: scatterplot of y versus x, with fitted lines y = 1.73 + 5.12 x and y = 2.96 + 5.04 x]

Without the blue data point:

The regression equation is y = 1.73 + 5.12 x

Predictor    Coef     SE Coef   T       P
Constant     1.732    1.121     1.55    0.140
x            5.1169   0.2003    25.55   0.000

S = 2.592   R-Sq = 97.3%   R-Sq(adj) = 97.2%

With the blue data point:

The regression equation is y = 2.96 + 5.04 x

Predictor    Coef     SE Coef   T       P
Constant     2.958    2.009     1.47    0.157
x            5.0373   0.3633    13.86   0.000

S = 4.711   R-Sq = 91.0%   R-Sq(adj) = 90.5%

Any outliers? Any influential data points?

[Figure: scatterplot of y versus x, with fitted lines y = 1.73 + 5.12 x and y = 2.47 + 4.93 x]

Without the blue data point:

The regression equation is y = 1.73 + 5.12 x

Predictor    Coef     SE Coef   T       P
Constant     1.732    1.121     1.55    0.140
x            5.1169   0.2003    25.55   0.000

S = 2.592   R-Sq = 97.3%   R-Sq(adj) = 97.2%

With the blue data point:

The regression equation is y = 2.47 + 4.93 x

Predictor    Coef     SE Coef   T       P
Constant     2.468    1.076     2.29    0.033
x            4.9272   0.1719    28.66   0.000

S = 2.709   R-Sq = 97.7%   R-Sq(adj) = 97.6%

Any outliers? Any influential data points?

[Figure: scatterplot of y versus x, with fitted lines y = 1.73 + 5.12 x and y = 8.51 + 3.32 x]

Without the blue data point:

The regression equation is y = 1.73 + 5.12 x

Predictor    Coef     SE Coef   T       P
Constant     1.732    1.121     1.55    0.140
x            5.1169   0.2003    25.55   0.000

S = 2.592   R-Sq = 97.3%   R-Sq(adj) = 97.2%

With the blue data point:

The regression equation is y = 8.50 + 3.32 x

Predictor    Coef     SE Coef   T       P
Constant     8.505    4.222     2.01    0.058
x            3.3198   0.6862    4.84    0.000

S = 10.45   R-Sq = 55.2%   R-Sq(adj) = 52.8%

Impact on regression analyses

• Not every outlier strongly influences the regression analysis.
• Always determine if the regression analysis is unduly influenced by one or a few data points.
• Simple plots for simple linear regression.
• Summary measures for multiple linear regression.

The leverages $h_{ii}$

The predicted response can be written as a linear combination of the n observed values $y_1, y_2, \ldots, y_n$:

$\hat{y}_i = h_{i1} y_1 + h_{i2} y_2 + \cdots + h_{ii} y_i + \cdots + h_{in} y_n \quad \text{for } i = 1, \ldots, n$

where the weights $h_{i1}, h_{i2}, \ldots, h_{ii}, \ldots, h_{in}$ depend only on the predictor values.

For example:

$\hat{y}_1 = h_{11} y_1 + h_{12} y_2 + \cdots + h_{1n} y_n$
$\hat{y}_2 = h_{21} y_1 + h_{22} y_2 + \cdots + h_{2n} y_n$
$\vdots$
$\hat{y}_n = h_{n1} y_1 + h_{n2} y_2 + \cdots + h_{nn} y_n$

Because the predicted response can be written this way, the leverage, $h_{ii}$, quantifies the influence that the observed response $y_i$ has on its predicted value $\hat{y}_i$.

Properties of the leverages $h_{ii}$

• The leverage $h_{ii}$ is:
  – a measure of the distance between the x value for the ith data point and the mean of the x values for all n data points.
  – a number between 0 and 1, inclusive.
• The sum of the $h_{ii}$ equals p, the number of parameters.

Any high leverages $h_{ii}$?

[Figure: scatterplot of y versus x; dotplot for x, sample mean = 4.751]

h(1,1) = 0.176   h(11,11) = 0.048   h(21,21) = 0.163

HI1
0.176297  0.157454  0.127014  0.119313  0.086145
0.077744  0.065028  0.061276  0.050974  0.049628
0.048147  0.049313  0.051829  0.055760  0.069311
0.072580  0.109616  0.127489  0.140453  0.141136
0.163492

Sum of HI1 = 2.0000

Any high leverages $h_{ii}$?

[Figure: scatterplot of y versus x; dotplot for x, sample mean = 5.227]

h(1,1) = 0.153   h(11,11) = 0.048   h(21,21) = 0.358

HI1
0.153481  0.139367  0.116292  0.110382  0.084374
0.077557  0.066879  0.063589  0.050033  0.052121
0.047632  0.048156  0.049557  0.055893  0.057574
0.078121  0.088549  0.096634  0.096227  0.110048
0.357535

Sum of HI1 = 2.0000

Identifying data points whose x values are extreme ... and therefore potentially influential

Using leverages to identify extreme x values

Minitab flags any observation whose leverage value, $h_{ii}$, is more than 3 times the mean leverage value,

$\bar{h} = \frac{\sum_{i=1}^{n} h_{ii}}{n} = \frac{p}{n}$

or is greater than 0.99 (whichever is smaller).

[Figure: scatterplot of y versus x]

$3\left(\frac{p}{n}\right) = 3\left(\frac{2}{21}\right) = 0.286$

Unusual Observations

Obs   x      y       Fit      SE Fit   Residual   St Resid
21    14.0   68.00   71.449   1.620    -3.449     -1.59 X

X denotes an observation whose X value gives it large influence.

x       y       HI1
14.00   68.00   0.357535
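
The leverage rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation: it assumes NumPy, builds the hat matrix H = X (X'X)^{-1} X' for a simple linear regression, and reads the leverages off its diagonal. The function names `leverages` and `flag_high_leverage` and the x values are hypothetical and chosen only for illustration; they are not the data set from the slides.

```python
import numpy as np

def leverages(x):
    """Return the leverages h_ii for a simple linear regression on x."""
    x = np.asarray(x, dtype=float)
    # Design matrix with an intercept column, so p = 2 parameters.
    X = np.column_stack([np.ones_like(x), x])
    # Hat matrix H = X (X'X)^{-1} X'; row i holds the weights h_i1, ..., h_in.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return np.diag(H)

def flag_high_leverage(h, p):
    """Flag leverages above min(3 * p / n, 0.99), the rule quoted above."""
    n = len(h)
    cutoff = min(3 * p / n, 0.99)
    return np.flatnonzero(h > cutoff), cutoff

# Hypothetical x values (not from the slides): nine clustered points, one extreme.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
h = leverages(x)
flagged, cutoff = flag_high_leverage(h, p=2)

print("leverages:", h.round(3))
print("sum of leverages:", h.sum())   # equals p = 2 up to rounding
print("cutoff:", cutoff)
print("flagged (0-based indices):", flagged)
```

With these illustrative values only the last point, whose x value lies far from the mean, exceeds the cutoff. The leverages depend on x alone, so a flagged point is only potentially influential; whether it actually changes the fit also depends on its y value.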

