Stat 401 B – Lecture 30

1. Simple Linear Regression
Example - mammals
Response variable: gestation (length of pregnancy), in days
Explanatory variable: brain weight

2. “Man”
Extreme negative residual, but that residual is not statistically significant.
The extreme brain weight of “Man” creates high leverage that is statistically significant.

3. “Man”
Is the point for “Man” influencing where the simple linear regression line is going?
Is this influence statistically significant?

4. [Figure: plot of Gestation versus BrainWgt]

5. Simple Linear Regression
Predicted Gestation = 85.25 + 0.30*Brain Weight
R² = 0.372, so only 37.2% of the variation in gestation is explained by the linear relationship with brain weight.

6. Exclude “Man”
What happens to the simple linear regression line if we exclude “Man” from the data?
Do the estimated intercept and estimated slope change?

7. [Figure: plot of Gestation versus BrainWgt]

8. Simple Linear Regression
Predicted Gestation = 62.05 + 0.634*Brain Weight
R² = 0.600, so 60% of the variation in gestation is explained by the linear relationship with brain weight.

9. Changes
The estimated slope has more than doubled once “Man” is removed.
The estimated intercept has decreased by over 20 days.

10. Influence
It appears that the point associated with “Man” influences where the simple linear regression line goes.
Is this influence statistically significant?

11. Influence Measures
Quantifying influence involves how much the point differs in the response direction as well as in the explanatory direction.
Combine information on the residual and the leverage.

12. Cook’s D
$$d = \left(\frac{z^2}{p+1}\right)\left(\frac{h}{(1-h)^2}\right)$$
where z is the standardized residual, h is the leverage, and p is the number of explanatory variables in the model.

13. Cook’s D
If d > 1, then the point is considered to have high influence.

14. Cook’s D for “Man”
$$d = \left(\frac{z^2}{p+1}\right)\left(\frac{h}{(1-h)^2}\right) = \left(\frac{(-2.516)^2}{1+1}\right)\left(\frac{0.6612}{(1-0.6612)^2}\right) = 18.23$$

15. Cook’s D for “Man”
Because the d value for “Man” is greater than 1, it is considered to exert high influence on where the regression line goes.

16. Cook’s D
There are no other mammals with a value of d greater than 1.
The Okapi has d = 0.30.
The Brazilian Tapir has d = 0.10.

17. Studentized Residuals
The studentized residual is the standardized residual adjusted for the leverage.
$$r_s = \frac{z}{\sqrt{1-h}}$$

18. Studentized Residuals

Mammal             z        h        rs
Brazilian Tapir    3.010    0.0217   3.043
“Man”             –2.516    0.6612  –4.323
Okapi              2.443    0.0839   2.552

19. Studentized Residuals
If the conditions for the errors are met, then studentized residuals have an approximate t-distribution with degrees of freedom equal to n – p – 1.

20. Computing a P-value
JMP – Col – Formula
(1 – t Distribution(|rs|, n-p-1))*2
For our example (the Brazilian Tapir), rs = 3.043 and n-p-1 = 48.
P-value = 0.0038

21. Studentized Residuals

Mammal             z        h        rs       P-value
Brazilian Tapir    3.010    0.0217   3.043    0.0038
“Man”             –2.516    0.6612  –4.323   <0.0001
Okapi              2.443    0.0839   2.552    0.0139

22. Conclusion – “Man”
The P-value is much less than 0.001 (the Bonferroni corrected cutoff), therefore “Man” has statistically significant influence on where the regression line is going.

23. Other Mammals
The Brazilian Tapir has the most extreme standardized residual but not much leverage, and so is not influential according to either Cook’s D or the Studentized Residual value.

24. Other Mammals
The Okapi has high leverage, greater than 0.08, but its standardized residual is not that extreme, and so it is not influential according to either Cook’s D or the Studentized Residual
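
As a rough check of the arithmetic on the slides, the Python sketch below recomputes Cook's D, the studentized residual, and the two-sided P-value for the three mammals, using only the z and h values from the table. The function names and the `MAMMALS` dictionary are made up for this illustration. The P-value step is an assumption too: instead of JMP's t Distribution function, it integrates the t density numerically with Simpson's rule, which should agree closely but is not the lecture's method.

```python
import math

# z (standardized residual) and h (leverage), copied from the slides' table.
MAMMALS = {
    "Brazilian Tapir": (3.010, 0.0217),
    "Man": (-2.516, 0.6612),
    "Okapi": (2.443, 0.0839),
}
P = 1    # number of explanatory variables (brain weight only)
DF = 48  # n - p - 1, as stated on the slides


def cooks_d(z, h, p=P):
    """Cook's D from the standardized residual z and the leverage h."""
    return (z ** 2 / (p + 1)) * (h / (1 - h) ** 2)


def studentized_residual(z, h):
    """Standardized residual adjusted for the leverage."""
    return z / math.sqrt(1 - h)


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(rs, df=DF, steps=2000):
    """Two-sided P-value, 2 * (1 - T(|rs|)), via Simpson's rule on [0, |rs|].

    Illustrative stand-in for JMP's t Distribution; steps must be even.
    """
    upper = abs(rs)
    width = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * width, df)
    central_area = total * width / 3  # P(0 < T < |rs|)
    return 1 - 2 * central_area


for name, (z, h) in MAMMALS.items():
    d = cooks_d(z, h)
    rs = studentized_residual(z, h)
    p_value = two_sided_p_value(rs)
    print(f"{name:16s} d = {d:6.2f}  rs = {rs:7.3f}  P-value = {p_value:.4f}")
```

Under these assumptions the printed values should match the slides up to rounding: only “Man” has d greater than 1, and only “Man” has a P-value below the 0.001 cutoff.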