ISU STAT 401 - Lecture 28


Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Stat 401 B – Lecture 281Outliers How do we determine if a potential outlier identified on the box plot is statistically significant?2Unusual Points in Regression Outlier – a point with an unusually large residual. High leverage point – a point with an extreme value for one, or more, of the explanatory variables3Influential Points Does a point influence where the regression line goes? An outlier can. A high leverage point can. Is that point statistically significant in terms of influence?Stat 401 B – Lecture 284Simple Linear Regression Example - mammals Response variable: gestation (length of pregnancy) days Explanatory: brain weight551015Count0 50 100 200 300 400 500GestationDistributions6Gestation (days) Skewed to the right. Several potential outliers. Mean = 117.4 days Median = 65.5 days Values from 12 days to 440 days.Stat 401 B – Lecture 2875101520253035Count0 500 1000 1500BrainWgtDistributions8Brain Weight Highly skewed to the right with several mounds. Six potential outliers. Mean = 107.25 g Median = 16.3 g Values from 0.14 g to 1320 g9Simple Linear Regression Trying to explain variation in the response (gestation) by relating the response to the explanatory variable (brain weight).Stat 401 B – Lecture 2810Regression Residuals Those observations that do not follow the general trend will have residuals that are far from zero, either positive or negative.yyˆ residual −=11Regression Outlier A residual far from zero, either negative or positive, will be called an outlier for regression. An outlier for regression corresponds to a value of the response that does not match the overall trend.12Simple Linear Regression Predicted Gestation = 85.25 + 0.30*Brain Weight R2= 0.372, so only 37.2% of the variation in gestation is explained by the linear relationship with brain weight.Stat 401 B – Lecture 2813Simple Linear Regression The model is useful. F = 28.49, P-value < 0.0001 This also indicates that there is a statistically significant 
linear relationship between brain weight and gestation.14-300-200-1000100200300Residual0 500 1000 1500BrainWgt15Unusual Points The mammal with a brain weight around 1300 g has the residual furthest from zero on the negative side. There are other mammals with residuals of the same magnitude on the positive side.Stat 401 B – Lecture 2816Outlier Box Plot Start with five number summary Minimum = –214.1 25% Quartile = –57.9 50% Median = –31.1 75% Quartile = 36.7 Maximum = 256.117InterQuartile Range (IQR) IQR = 75% Quart – 25% Quart  IQR = 36.7 – (– 57.9) = 94.6 Upper = 75% Quart + 1.5*IQR Upper = 36.7 + 141.9 = 178.6 Lower =25% Quart – 1.5*IQR Upper = – 57.9 – 141.9 = – 199.818Outlier Box Plot Any point above the Upper or below the Lower will be flagged as a potential outlier. Lines extend to the most extreme points inside the Lower and Upper bounds.Stat 401 B – Lecture 281951015Count-300 -200 -100 0 100 200 300Residual GestationDistributions20Regression Outliers207.8232.2440 days490 gOkapi–214.1481.1267 days1320 g“Man”256.1135.9392 days169 gBrazilian TapirResidPredGestationBrain Weight21Comments The residual for “Man” is not the most extreme. The residual for the Brazilian Tapir is the furthest from zero. Are any of these residuals statistically significant?Stat 401 B – Lecture 2822Standardized Residual A standardized residual should follow a standard normal distribution.RMSEresidual=z23Computing a P-value JMP –Col –Formula (1 – Normal Distribution(|z|))*2 Where |z| is the absolute value of z.24Standardized Residual0.01462.44207.8Okapi0.0119–2.52–214.1“Man”0.00263.01256.1Brazilian TapirP-valuezResidualStat 401 B – Lecture 2825Caution We are essentially doing 50 tests of hypothesis. If each test has a chance of error of 5%, then I would expect to see some P-values less than 0.05 just by chance. 
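The screening steps above (box plot fences, then standardized residuals and two-sided P-values) can be sketched in Python using only the summary numbers on the slides, since the full set of 50 residuals is not included here. Assumptions: the quartiles are taken as given rather than recomputed (JMP's quantile rule may differ from other software), RMSE is not printed on the slides and is backed out from the Brazilian Tapir row (256.1 / 3.01), and the function names are hypothetical.

```python
from statistics import NormalDist

# Quartiles of the 50 residuals, taken from the five-number summary slide.
Q1, Q3 = -57.9, 36.7

# Residuals for the three mammals in the "Regression Outliers" table.
residuals = {
    "Brazilian Tapir": 256.1,
    "Man": -214.1,
    "Okapi": 207.8,
}

# Assumption: RMSE is not shown on the slides; it is backed out here
# from the Brazilian Tapir row (residual 256.1, z = 3.01).
RMSE = 256.1 / 3.01


def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) box plot bounds: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def two_sided_p(z):
    """Two-sided normal P-value, mirroring JMP's (1 - Normal Distribution(|z|))*2."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


lower, upper = outlier_fences(Q1, Q3)
print(f"Lower = {lower:.1f}, Upper = {upper:.1f}")

for name, resid in residuals.items():
    flagged = resid < lower or resid > upper   # outside the box plot bounds?
    z = resid / RMSE                           # standardized residual
    print(f"{name:16s} residual={resid:7.1f} flagged={flagged} "
          f"z={z:5.2f} P={two_sided_p(z):.4f}")
```

Run as-is, it flags all three mammals on the box plot and reproduces the slide's z values and P-values to the precision shown.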
26Bonferroni Correction Adjust what is a small P-value. If a P-value is less than 0.001, then the standardized residual is statistically significant.001.05005.0residuals of #05.0==27Conclusion Although some of the residuals were flagged on the outlier box plot, none were deemed statistically significant once we corrected for doing 50 simultaneous

