UW-Madison STAT 333 - 333disc12

STAT 333 Discussion 12
Apr 24, 2013

Review: Leverage points and Cook’s distance

1. Leverage measures the distance of a given point from all other points in terms of X values; it does not depend on the Y values. High leverage (> 0.5 or 2p/n) often leads to large influence.
2. Cook’s distance takes both leverage and residual into account. A large Cook’s distance (> 1 or 4/(n − p)) indicates an influential point.
3. Useful R functions: xyplot() (in the ‘lattice’ library)

Review: Multicollinearity

1. (Geometric point of view) Small angle between predictor variables.
2. Multicollinearity is not a big problem for prediction.
3. Multicollinearity can be a problem for explanation.
4. Algebraic signs might be contrary to scientific expectations.
5. High correlation between X1 and X2 leads to high correlation (in absolute value) between the estimates b̂1 and b̂2.
6. The variance inflation factor (VIF) quantifies the severity of multicollinearity in linear regression analysis.
7. Useful R functions: confidenceEllipse() and vif() (in the ‘car’ library)

Example 1: State Public Expenditures

EX: Per capita state and local public expenditures ($)
ECAB: Economic ability index, in which income, retail sales, and the value of output (manufactures, mineral, and agricultural) per capita are equally weighted.
MET: Percentage of population living in standard metropolitan areas
GROW: Percent change in population, 1950-1960
WEST: Western state (1) or not (0)

> expend = read.table("expend.txt", header=T)
> summary(expend[,1:4])
       EX             ECAB             MET             GROW
 Min.   :183.0   Min.   : 57.40   Min.   : 0.00   Min.   :-7.400
 1st Qu.:253.5   1st Qu.: 85.40   1st Qu.:24.10   1st Qu.: 6.975
 Median :285.5   Median : 95.30   Median :46.15   Median :14.050
 Mean   :286.6   Mean   : 96.75   Mean   :46.17   Mean   :18.729
 3rd Qu.:324.0   3rd Qu.:105.10   3rd Qu.:69.97   3rd Qu.:22.675
 Max.   :454.0   Max.   :205.00   Max.   :86.50   Max.   :77.800

1. Perform a residual analysis of the model ‘EX ~ ECAB+WEST’.

[Figure: four diagnostic panels for the fitted model. Residuals vs Fitted (Fitted values against Residuals); Normal Q-Q (Theoretical Quantiles against Standardized residuals); Scale-Location (Fitted values against Standardized residuals); Residuals vs Leverage (Leverage against Standardized residuals, with Cook's distance contours at 0.5 and 1).]

> expend[47,]
    EX ECAB  MET GROW YOUNG OLD WEST STATE
47 421  205 74.2 77.8  25.6 6.4    1    NV

2. Try the following commands to make scatter plots of EX versus ECAB that can differentiate western and eastern states.

> library('lattice')
> xyplot(EX~ECAB, groups=WEST, type=c("p","r"), data=expend, auto.key=TRUE)

[Figure: scatter plot of EX against ECAB in a single panel, with points grouped by WEST (0 and 1).]

> xyplot(EX~ECAB|factor(WEST), data=expend)

[Figure: scatter plots of EX against ECAB in two panels, one for WEST = 0 and one for WEST = 1.]

Example 2

> library(car)
> x1=c(1,2,3,4,5,6,7,8,9,10)
> x2=c(-3.6,-4.6,2.8,1.1, 4.9,-3.3, 3.8,-3.7,-4.4,-0.9)
> x3=c( 0.9, 1.6,2.3,4.1, 4.6, 4.6, 7.1, 7.6, 8.6, 8.3)
> x4=c( 2.9, 0.6,3.9,2.8, 6.2, 5.3, 5.7, 8.4, 5.9,10.0)
> y =c( 5.3, 2.3,5.4,6.9,12.4,11.7,18.0,14.8,17.2,19.6)
> test.data=data.frame(y,x1,x2,x3,x4)
> cor(test.data)
           y          x1          x2         x3        x4
y  1.0000000  0.94250938  0.12625014 0.94570679 0.8781770
x1 0.9425094  1.00000000 -0.03465518 0.98161740 0.8706267
x2 0.1262501 -0.03465518  1.00000000 0.00563135 0.1115821
x3 0.9457068  0.98161740  0.00563135 1.00000000 0.8247915
x4 0.8781770  0.87062668  0.11158214 0.82479150 1.0000000

1. Regress y on x1 and x2. (cor(x1,x2)=-0.03)

> out1=lm(y~x1+x2)
> confint(out1)
                 2.5 %    97.5 %
(Intercept) -2.1971697 4.3740092
x1           1.3772673 2.4339902
x2          -0.1728236 0.7028941
> confidenceEllipse(out1)

[Figure: confidence ellipse, x1 coefficient against x2 coefficient.]

2. Regress y on x1 and x3.

> out2=lm(y~x1+x3)
> confint(out2)
                2.5 %   97.5 %
(Intercept) -2.512512 4.554371
x1          -2.195410 3.761142
x3          -1.978524 4.406414
> confidenceEllipse(out2)

[Figure: confidence ellipse, x1 coefficient against x3 coefficient.]

3. Regress y on all predictors.

> out3=lm(y~x1+x2+x3+x4)
> summary(out3)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.6606     1.5297   0.432    0.684
x1            0.0952     1.6112   0.059    0.955
x2            0.1572     0.2141   0.734    0.496
x3            1.4628     1.4643   0.999    0.364
x4            0.5860     0.5924   0.989    0.368

Residual standard error: 2.12 on 5 degrees of freedom
Multiple R-squared: 0.9326, Adjusted R-squared: 0.8787
F-statistic: 17.3 on 4 and 5 DF, p-value: 0.003929

> confidenceEllipse(out3, which=c(2,3))
> confidenceEllipse(out3, which=c(2,4))

[Figure: two confidence ellipses from the full model, x1 coefficient against x2 coefficient and x1 coefficient against x3 coefficient.]

> vif(out3)
       x1        x2        x3        x4
47.657003  1.225286 34.261665  5.381620
> out4=lm(y~x2+x3+x4)
> vif(out4)
      x2       x3       x4
1.037129 3.203488
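
To make the VIF values above less of a black box, the following is a minimal sketch that recomputes them in base R, assuming the usual definition VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor x_j on the remaining predictors. The helper name manual_vif is hypothetical and not part of the ‘car’ library; the data are the Example 2 vectors.

```r
# Example 2 predictors, copied from the handout
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x2 <- c(-3.6, -4.6, 2.8, 1.1, 4.9, -3.3, 3.8, -3.7, -4.4, -0.9)
x3 <- c(0.9, 1.6, 2.3, 4.1, 4.6, 4.6, 7.1, 7.6, 8.6, 8.3)
x4 <- c(2.9, 0.6, 3.9, 2.8, 6.2, 5.3, 5.7, 8.4, 5.9, 10.0)
X <- data.frame(x1, x2, x3, x4)

# manual_vif is a hypothetical helper (not from 'car'):
# regress each predictor on all the others and return 1 / (1 - R^2).
manual_vif <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    fit <- lm(reformulate(others, response = v), data = X)
    1 / (1 - summary(fit)$r.squared)
  })
}

vifs <- manual_vif(X)
print(round(vifs, 3))
```

If the definition assumed here matches what vif() computes for a model with no factor terms, the printed values should agree with the vif(out3) output. Dropping a column from X before calling manual_vif() mimics refitting without that predictor, as in out4.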
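
The leverage and Cook's distance rules from the review can also be checked numerically. This is a sketch only: it reuses the Example 2 fit y ~ x1 + x2 because expend.txt is not reproduced here, and it assumes p counts all regression coefficients including the intercept, which the handout does not state. It uses the base R functions hatvalues() and cooks.distance(), and the standard formula D_i = e_i² h_i / (p s² (1 − h_i)²).

```r
# Example 2 data, copied from the handout
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x2 <- c(-3.6, -4.6, 2.8, 1.1, 4.9, -3.3, 3.8, -3.7, -4.4, -0.9)
y  <- c(5.3, 2.3, 5.4, 6.9, 12.4, 11.7, 18.0, 14.8, 17.2, 19.6)

fit <- lm(y ~ x1 + x2)
n <- length(y)
p <- length(coef(fit))  # assumption: p includes the intercept

h <- hatvalues(fit)       # leverage of each observation
d <- cooks.distance(fit)  # Cook's distance of each observation

# Leverage is the diagonal of the hat matrix, built from X only
X <- model.matrix(fit)
h_manual <- diag(X %*% solve(crossprod(X)) %*% t(X))

# Refitting with a different response leaves the leverages unchanged
h_other_y <- hatvalues(lm(rev(y) ~ x1 + x2))

# Cook's distance combines the residual e and the leverage h
e <- resid(fit)
s2 <- sum(e^2) / (n - p)
d_manual <- e^2 * h / (p * s2 * (1 - h)^2)

# Observations exceeding the rule-of-thumb cutoffs from the review
high_leverage <- which(h > 2 * p / n)
high_cook <- which(d > 4 / (n - p))
print(round(cbind(leverage = h, cook = d), 3))
print(high_leverage)
print(high_cook)
```

The same calls applied to the Example 1 fit would show whether the observation printed with expend[47,] exceeds either cutoff.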

