STAT 333 Discussion 12
Apr 24, 2013

Review: Leverage points and Cook’s distance

1. Leverage measures the distance of a given point from all other points in terms of X values; it does not depend on the Y values. High leverage (> 0.5 or > 2p/n) often leads to large influence.
2. Cook’s distance takes both leverage and residual into account. A large Cook’s distance (> 1 or > 4/(n − p)) indicates an influential point.
3. Useful R functions: xyplot() (in the ‘lattice’ library)

Review: Multicollinearity

1. (Geometric point of view) Small angle between predictor variables.
2. Multicollinearity is not a big problem for prediction.
3. Multicollinearity can be a problem for explanation.
4. Algebraic signs of the estimated coefficients might be contrary to scientific expectations.
5. High correlation between X1 and X2 will lead to high correlation between b̂1 and b̂2.
6. The variance inflation factor (VIF) quantifies the severity of multicollinearity in linear regression analysis.
7. Useful R functions: confidenceEllipse() and vif() (in the ‘car’ library)

Example 1: State Public Expenditures

EX: Per capita state and local public expenditures ($)
ECAB: Economic ability index, in which income, retail sales, and the value of output (manufactures, mineral, and agricultural) per capita are equally weighted.
MET: Percentage of population living in standard metropolitan areas
GROW: Percent change in population, 1950-1960
WEST: Western state (1) or not (0)

> expend = read.table("expend.txt", header=T)
> summary(expend[,1:4])
       EX             ECAB             MET             GROW
 Min.   :183.0   Min.   : 57.40   Min.   : 0.00   Min.   :-7.400
 1st Qu.:253.5   1st Qu.: 85.40   1st Qu.:24.10   1st Qu.: 6.975
 Median :285.5   Median : 95.30   Median :46.15   Median :14.050
 Mean   :286.6   Mean   : 96.75   Mean   :46.17   Mean   :18.729
 3rd Qu.:324.0   3rd Qu.:105.10   3rd Qu.:69.97   3rd Qu.:22.675
 Max.   :454.0   Max.   :205.00   Max.   :86.50   Max.   :77.800

1.
Perform a residual analysis of model ‘EX ~ ECAB+WEST’.

[Figure: four diagnostic plots for the fitted model. Panels: Residuals vs Fitted; Normal Q-Q (standardized residuals vs theoretical quantiles); Scale-Location (standardized residuals vs fitted values); Residuals vs Leverage (standardized residuals vs leverage, with Cook's distance contours at 0.5 and 1). Observation 47 is among the labeled points in the Q-Q, Scale-Location, and Residuals vs Leverage panels.]

> expend[47,]
    EX ECAB  MET GROW YOUNG OLD WEST STATE
47 421  205 74.2 77.8  25.6 6.4    1    NV

2. Try the following commands to make scatter plots of EX versus ECAB that can differentiate western and eastern states.

> library('lattice')
> xyplot(EX~ECAB, groups=WEST, type=c("p","r"), data=expend, auto.key=TRUE)

[Figure: scatter plot of EX versus ECAB, points grouped by WEST (0 or 1), with a fitted regression line for each group.]

> xyplot(EX~ECAB|factor(WEST), data=expend)

[Figure: scatter plots of EX versus ECAB in two panels, one for WEST = 0 and one for WEST = 1.]

Example 2

> library(car)
> x1=c(1,2,3,4,5,6,7,8,9,10)
> x2=c(-3.6,-4.6,2.8,1.1, 4.9,-3.3, 3.8,-3.7,-4.4,-0.9)
> x3=c( 0.9, 1.6,2.3,4.1, 4.6, 4.6, 7.1, 7.6, 8.6, 8.3)
> x4=c( 2.9, 0.6,3.9,2.8, 6.2, 5.3, 5.7, 8.4, 5.9,10.0)
> y =c( 5.3, 2.3,5.4,6.9,12.4,11.7,18.0,14.8,17.2,19.6)
> test.data=data.frame(y,x1,x2,x3,x4)
> cor(test.data)
           y          x1          x2         x3        x4
y  1.0000000  0.94250938  0.12625014 0.94570679 0.8781770
x1 0.9425094  1.00000000 -0.03465518 0.98161740 0.8706267
x2 0.1262501 -0.03465518  1.00000000 0.00563135 0.1115821
x3 0.9457068  0.98161740  0.00563135 1.00000000 0.8247915
x4 0.8781770  0.87062668  0.11158214 0.82479150 1.0000000

1. Regress y on x1 and x2. (cor(x1,x2)=-0.03)

> out1=lm(y~x1+x2)
> confint(out1)
                 2.5 %    97.5 %
(Intercept) -2.1971697 4.3740092
x1           1.3772673 2.4339902
x2          -0.1728236 0.7028941
> confidenceEllipse(out1)

[Figure: joint confidence ellipse for the x1 coefficient (horizontal axis) and the x2 coefficient (vertical axis).]

2.
Regress y on x1 and x3.

> out2=lm(y~x1+x3)
> confint(out2)
                2.5 %   97.5 %
(Intercept) -2.512512 4.554371
x1          -2.195410 3.761142
x3          -1.978524 4.406414
> confidenceEllipse(out2)

[Figure: joint confidence ellipse for the x1 coefficient (horizontal axis) and the x3 coefficient (vertical axis).]

3. Regress y on all predictors.

> out3=lm(y~x1+x2+x3+x4)
> summary(out3)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.6606     1.5297   0.432    0.684
x1            0.0952     1.6112   0.059    0.955
x2            0.1572     0.2141   0.734    0.496
x3            1.4628     1.4643   0.999    0.364
x4            0.5860     0.5924   0.989    0.368

Residual standard error: 2.12 on 5 degrees of freedom
Multiple R-squared: 0.9326, Adjusted R-squared: 0.8787
F-statistic: 17.3 on 4 and 5 DF, p-value: 0.003929

> confidenceEllipse(out3, which=c(2,3))
> confidenceEllipse(out3, which=c(2,4))

[Figure: two joint confidence ellipses from out3, one for the x1 and x2 coefficients and one for the x1 and x3 coefficients.]

> vif(out3)
       x1        x2        x3        x4
47.657003  1.225286 34.261665  5.381620
> out4=lm(y~x2+x3+x4)
> vif(out4)
      x2       x3       x4
1.037129 3.203488
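To make the VIF numbers less of a black box, here is a minimal sketch that computes them without the ‘car’ library, using the textbook definition VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The data are copied from Example 2; the helper name manual_vif is hypothetical and is not part of any package. For a model with only numeric predictors, the results should be close to what vif() reports.

```r
# Predictors copied from Example 2 of this handout.
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x2 <- c(-3.6, -4.6, 2.8, 1.1, 4.9, -3.3, 3.8, -3.7, -4.4, -0.9)
x3 <- c(0.9, 1.6, 2.3, 4.1, 4.6, 4.6, 7.1, 7.6, 8.6, 8.3)
x4 <- c(2.9, 0.6, 3.9, 2.8, 6.2, 5.3, 5.7, 8.4, 5.9, 10.0)
X <- data.frame(x1, x2, x3, x4)

# manual_vif is a hypothetical helper name (an assumption for illustration).
# For each predictor, regress it on the other predictors and
# return 1 / (1 - R^2) from that auxiliary regression.
manual_vif <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    aux <- lm(reformulate(others, response = v), data = X)
    1 / (1 - summary(aux)$r.squared)
  })
}

# VIFs with all four predictors (compare with vif(out3)).
vif_all <- manual_vif(X)
print(round(vif_all, 3))

# VIFs after dropping x1 (compare with vif(out4)).
vif_no_x1 <- manual_vif(X[, c("x2", "x3", "x4")])
print(round(vif_no_x1, 3))
```

Note that y never enters the calculation: the VIF is a property of the predictors alone, which is why dropping one of two nearly collinear predictors (x1 or x3) changes the remaining VIFs so much.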
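The leverage and Cook's distance cutoffs from the review can also be checked numerically. The sketch below uses the Example 2 data (the expend.txt file is not reproduced here) and the base R functions hatvalues() and cooks.distance(). It assumes that p in the cutoffs 2p/n and 4/(n − p) counts all regression coefficients including the intercept; that counting convention is an assumption, not something the handout states.

```r
# Data copied from Example 2 of this handout.
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x2 <- c(-3.6, -4.6, 2.8, 1.1, 4.9, -3.3, 3.8, -3.7, -4.4, -0.9)
y  <- c(5.3, 2.3, 5.4, 6.9, 12.4, 11.7, 18.0, 14.8, 17.2, 19.6)
fit <- lm(y ~ x1 + x2)

n <- length(y)
p <- length(coef(fit))  # assumption: p includes the intercept

# Leverage: diagonal of the hat matrix H = X (X'X)^{-1} X'.
# Only the design matrix X is used, so y plays no role here.
h <- hatvalues(fit)
X <- model.matrix(fit)
h_by_hand <- diag(X %*% solve(crossprod(X)) %*% t(X))

# Cook's distance combines leverage and residual size.
d <- cooks.distance(fit)

# Apply the rule-of-thumb cutoffs from the review.
high_leverage <- which(h > 2 * p / n)
influential   <- which(d > 4 / (n - p))

print(round(cbind(leverage = h, cooks_distance = d), 3))
print(high_leverage)
print(influential)
```

The leverages always sum to p, so 2p/n is simply twice the average leverage.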