UW-Madison STAT 333 - 333disc12 - D21274


School: University of Wisconsin, Madison

Course: STAT 333 - Applied Regression Analysis

Pages 6

This preview shows page 1-2 out of 6 pages.


STAT 333 Discussion 12
Apr 24, 2013

Review: Leverage points and Cook's distance

1. Leverage measures the distance of a given point from all other points in terms of its X values; it does not depend on the Y values. High leverage (> 0.5 or 2p/n) often leads to large influence.
2. Cook's distance takes both leverage and residual into account. A large Cook's distance (> 1 or 4/(n - p)) indicates an influential point.
3. Useful R functions: xyplot() (in the 'lattice' library)

Review: Multicollinearity

1. (Geometric point of view) Small angle between predictor variables.
2. Multicollinearity is not a big problem for prediction.
3. Multicollinearity can be a problem for explanation.
4. Algebraic signs of the estimated coefficients might be contrary to scientific expectations.
5. High correlation between X1 and X2 will lead to high correlation between b̂1 and b̂2.
6. The variance inflation factor (VIF) quantifies the severity of multicollinearity in linear regression analysis.
7. Useful R functions: confidenceEllipse() and vif() (in the 'car' library)

Example 1: State Public Expenditures

EX: Per capita state and local public expenditures ($)
ECAB: Economic ability index, in which income, retail sales, and the value of output (manufactures, mineral, and agricultural) per capita are equally weighted.
MET: Percentage of population living in standard metropolitan areas
GROW: Percent change in population, 1950-1960
WEST: Western state (1) or not (0)

> expend = read.table("expend.txt", header=T)
> summary(expend[,1:4])
       EX             ECAB             MET             GROW
 Min.   :183.0   Min.   : 57.40   Min.   : 0.00   Min.   :-7.400
 1st Qu.:253.5   1st Qu.: 85.40   1st Qu.:24.10   1st Qu.: 6.975
 Median :285.5   Median : 95.30   Median :46.15   Median :14.050
 Mean   :286.6   Mean   : 96.75   Mean   :46.17   Mean   :18.729
 3rd Qu.:324.0   3rd Qu.:105.10   3rd Qu.:69.97   3rd Qu.:22.675
 Max.   :454.0   Max.   :205.00   Max.   :86.50   Max.   :77.800

1. Perform a residual analysis of model 'EX ~ ECAB+WEST'.

[Figure: four diagnostic plots for the model: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours at 0.5 and 1). Observation 47 is among the labeled points in the last three panels.]

> expend[47,]
    EX ECAB  MET GROW YOUNG OLD WEST STATE
47 421  205 74.2 77.8  25.6 6.4    1    NV

2. Try the following commands to make scatter plots of EX versus ECAB that can differentiate western and eastern states.

> library('lattice')
> xyplot(EX~ECAB, groups=WEST, type=c("p","r"), data=expend, auto.key=TRUE)

[Figure: scatter plot of EX versus ECAB with points and fitted regression lines grouped by WEST (0 or 1).]

> xyplot(EX~ECAB|factor(WEST), data=expend)

[Figure: scatter plots of EX versus ECAB in separate panels for WEST = 0 and WEST = 1.]

Example 2

> library(car)
> x1=c(1,2,3,4,5,6,7,8,9,10)
> x2=c(-3.6,-4.6,2.8,1.1, 4.9,-3.3, 3.8,-3.7,-4.4,-0.9)
> x3=c( 0.9, 1.6,2.3,4.1, 4.6, 4.6, 7.1, 7.6, 8.6, 8.3)
> x4=c( 2.9, 0.6,3.9,2.8, 6.2, 5.3, 5.7, 8.4, 5.9,10.0)
> y =c( 5.3, 2.3,5.4,6.9,12.4,11.7,18.0,14.8,17.2,19.6)
> test.data=data.frame(y,x1,x2,x3,x4)
> cor(test.data)
           y          x1          x2         x3        x4
y  1.0000000  0.94250938  0.12625014 0.94570679 0.8781770
x1 0.9425094  1.00000000 -0.03465518 0.98161740 0.8706267
x2 0.1262501 -0.03465518  1.00000000 0.00563135 0.1115821
x3 0.9457068  0.98161740  0.00563135 1.00000000 0.8247915
x4 0.8781770  0.87062668  0.11158214 0.82479150 1.0000000

1. Regress y on x1 and x2. (cor(x1,x2)=-0.03)

> out1=lm(y~x1+x2)
> confint(out1)
                 2.5 %    97.5 %
(Intercept) -2.1971697 4.3740092
x1           1.3772673 2.4339902
x2          -0.1728236 0.7028941
> confidenceEllipse(out1)

[Figure: confidence ellipse for the x1 coefficient (horizontal axis) and the x2 coefficient (vertical axis).]

2. Regress y on x1 and x3.

> out2=lm(y~x1+x3)
> confint(out2)
                2.5 %   97.5 %
(Intercept) -2.512512 4.554371
x1          -2.195410 3.761142
x3          -1.978524 4.406414
> confidenceEllipse(out2)

[Figure: confidence ellipse for the x1 coefficient (horizontal axis) and the x3 coefficient (vertical axis).]

3. Regress y on all predictors.

> out3=lm(y~x1+x2+x3+x4)
> summary(out3)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.6606     1.5297   0.432    0.684
x1            0.0952     1.6112   0.059    0.955
x2            0.1572     0.2141   0.734    0.496
x3            1.4628     1.4643   0.999    0.364
x4            0.5860     0.5924   0.989    0.368

Residual standard error: 2.12 on 5 degrees of freedom
Multiple R-squared: 0.9326, Adjusted R-squared: 0.8787
F-statistic: 17.3 on 4 and 5 DF, p-value: 0.003929

> confidenceEllipse(out3, which=c(2,3))
> confidenceEllipse(out3, which=c(2,4))

[Figure: two confidence ellipses from the full model: x1 coefficient versus x2 coefficient, and x1 coefficient versus x3 coefficient.]

> vif(out3)
       x1        x2        x3        x4
47.657003  1.225286 34.261665  5.381620
> out4=lm(y~x2+x3+x4)
> vif(out4)
      x2       x3       x4
1.037129 3.203488
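
The rules of thumb in the leverage and Cook's distance review can be checked with base R alone. The block below is a minimal sketch, not the handout's own code: it reuses the Example 2 vectors because expend.txt is not reproduced here, the names fit, h, d, and h_perm are illustrative, and it assumes that p in the cutoffs counts all coefficients including the intercept.

```r
# Data copied from Example 2 (expend.txt is not available here).
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x2 <- c(-3.6, -4.6, 2.8, 1.1, 4.9, -3.3, 3.8, -3.7, -4.4, -0.9)
y  <- c(5.3, 2.3, 5.4, 6.9, 12.4, 11.7, 18.0, 14.8, 17.2, 19.6)

fit <- lm(y ~ x1 + x2)
n <- length(y)            # number of observations
p <- length(coef(fit))    # assumption: p includes the intercept

h <- hatvalues(fit)       # leverage: diagonal of the hat matrix
d <- cooks.distance(fit)  # combines leverage and residual size

# Refitting with a reordered response leaves the leverages unchanged,
# which illustrates that leverage depends on the X values only.
h_perm <- hatvalues(lm(rev(y) ~ x1 + x2))

# Apply the handout's cutoffs 2p/n and 4/(n - p).
high_leverage <- which(h > 2 * p / n)
influential   <- which(d > 4 / (n - p))

print(round(cbind(leverage = h, cooks = d), 3))
```

The leverages always sum to p, so 2p/n flags points with more than twice the average leverage.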
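
To make the VIF definition concrete, the following sketch computes it without the 'car' library, using VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The names X, vif_by_hand, and vif_from_cor are illustrative. If the data are as printed in Example 2, the values should agree with the vif(out3) output up to rounding.

```r
# Predictors copied from Example 2.
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x2 <- c(-3.6, -4.6, 2.8, 1.1, 4.9, -3.3, 3.8, -3.7, -4.4, -0.9)
x3 <- c(0.9, 1.6, 2.3, 4.1, 4.6, 4.6, 7.1, 7.6, 8.6, 8.3)
x4 <- c(2.9, 0.6, 3.9, 2.8, 6.2, 5.3, 5.7, 8.4, 5.9, 10.0)
X <- data.frame(x1, x2, x3, x4)

# Regress each predictor on the remaining ones and convert R^2 to a VIF.
vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: the diagonal of the inverse correlation matrix.
vif_from_cor <- diag(solve(cor(X)))

print(round(vif_by_hand, 3))
```

The response y never enters the calculation: VIF depends only on how the predictors relate to one another.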
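
Review point 5 (correlated predictors give correlated coefficient estimates) can be checked numerically from the estimated covariance matrix of the coefficients. This is a hedged sketch using base R's vcov() and cov2cor(); the helper corr_b and the names r12 and r13 are hypothetical and not part of the handout.

```r
# Data copied from Example 2.
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x2 <- c(-3.6, -4.6, 2.8, 1.1, 4.9, -3.3, 3.8, -3.7, -4.4, -0.9)
x3 <- c(0.9, 1.6, 2.3, 4.1, 4.6, 4.6, 7.1, 7.6, 8.6, 8.3)
y  <- c(5.3, 2.3, 5.4, 6.9, 12.4, 11.7, 18.0, 14.8, 17.2, 19.6)

# Hypothetical helper: correlation matrix of the coefficient estimates.
corr_b <- function(fit) cov2cor(vcov(fit))

r12 <- corr_b(lm(y ~ x1 + x2))["x1", "x2"]  # nearly uncorrelated predictors
r13 <- corr_b(lm(y ~ x1 + x3))["x1", "x3"]  # highly correlated predictors

print(c(x1_x2 = r12, x1_x3 = r13))
```

In a model with an intercept and exactly two predictors, the correlation between the two slope estimates equals minus the sample correlation of the predictors. This is the reason one would expect the confidence ellipse for x1 and x3 to be much more elongated and tilted than the one for x1 and x2.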
