AMS 578 Homework 3 solution Problem 1: a. > setwd("C:/spring 2009 TA/hw3"); > file<-"CH06PR05.txt"; > data<-read.table(file,header=FALSE, col.names=c("Y","X1","X2")); > attach(data); > X<-as.matrix(cbind(1,X1,X2)); > Y<-as.matrix(Y); > n<-length(Y);p<-dim(X)[2]; > Xs<-t(X)%*%X; > H<-X%*%solve(Xs)%*%t(X); > b<-solve(Xs)%*%t(X)%*%Y;b; [,1] 37.650 X1 4.425 X2 4.375 So the estimated regression function is Y=37.650+4.425X1+4.375X2. The b1 represents the effect of moisture content on the product’s rating. Increasing the moisture by one unit increases the rating by 4.425 units. b. > Yhat<-X%*%b; > resid<-Y-Yhat; > X1X2<-X1*X2 > par(mar<-c(4,4,1,1)); > plot(resid~Yhat,pch=20);Residual plot against fitted values provides information on appropriateness of our model, constancy of error terms. In this plot, no significant outliers and the error appear to be constant. > plot(resid~X1,pch=20); > plot(resid~X2,pch=20); Residual plot against predictor variables test the appropriateness of our model with respect to specific variable. In this plot, no significant systematic tendencies are observed. > plot(resid~X1X2,pch=20);In this plot, there is no systematic tendencies, so it is not interaction between X1 and X2. > qqnorm(resid); This plot indicates that the error terms are fairly normally distributed. c. H0: β1 = β2 =0 Ha: β1 ≠ 0 or β2 ≠ 0 The test statistics is : F* = MSR/MSE If F*>F(1-α,p-1,n-p), we conclude Ha, otherwise, we conclude H0. > #constructure the ANOVA table> J<-matrix(1/n,n,n);I=diag(n); > sse<-t(Y)%*%(I-H)%*%Y;sse; [,1] [1,] 94.3 > ssto<-t(Y)%*%(I-J)%*%Y; ssto; [,1] [1,] 1967 > ssr<-t(Y)%*%(H-J)%*%Y; ssr; [,1] [1,] 1872.7 > mse<- sse/(n-p); > msr<-ssr/(p-1); > F<-msr/mse > qf(0.99,2,13) [1] 6.700965 > F [,1] [1,] 129.0832 F* =129.0832 > 6.700965= F(0.99,2,13), So we reject H0 and conclude that at least one of β1 and β2 is not zero. d. > 1-pf(F,2,13) [,1] [1,] 2.658261e-09 So the P-value is 2.658261e-09 e. > R2<-ssr/ssto;R2; [,1] [1,] 0.952059 So the R2 is 0.952059 which means the most of the variance comes from regression, our model fits well. Problem 2: a. > Xh<-matrix(c(1,5,4),3,1); > Yh<-t(b)%*%Xh; > s2Yh<-t(Xh)%*%solve(Xs)%*%Xh*mse; > sYh<-sqrt(s2Yh); > YhCI<-c(Yh-qt(0.995,n-p)*sYh,Yh+qt(0.995,n-p)*sYh);YhCI; [1] 73.88111 80.66889 The CI is [73.88111, 80.66889] with confidence level 99%b. > s2Ypred<-mse+s2Yh; > sYpred<-sqrt(s2Ypred); > YpredCI<-c(Yh-qt(0.995,n-p)*sYpred,Yh+qt(0.995,n-p)*sYpred);YpredCI; [1] 68.48077 86.06923 So the 99 % prediction interval is [68.48077, 86.06923]. c. > Xnew<-X[,1:2] > Xsnew<-t(Xnew)%*%Xnew; > Hnew<-Xnew%*%solve(Xsnew)%*%t(Xnew); > bnew<-solve(Xsnew)%*%t(Xnew)%*%Y;bnew; [,1] 50.775 X1 4.425 > sseX1<-t(Y)%*%(I-Hnew)%*%Y;sseX1; [,1] [1,] 400.55 > sstoX1<-t(Y)%*%(I-J)%*%Y;sstoX1; [,1] [1,] 1967 > ssrX1<-t(Y)%*%(Hnew-J)%*%Y; ssrX1; [,1] [1,] 1566.45 SSR(X2|X1) = sseX1-sse = 400.55-94.3 = 306.25 The ANOVA Table is as follow: Source SS Df MS Regression 1872.7 2 936.35 X1 1566.45 1 1566.45 X2|X1 306.25 1 306.25 Error 94.3 13 7.253846 Total 1967 15 d. H0: β2 =0 Ha: β2 ≠ 0 The test statistics is : F* = MSR(X2|X1)/MSE If F*>F(1-α, 1,n-p), we conclude Ha, otherwise, we conclude H0. In this problem, F* =306.25/7.253846=42.21898 > 9.073806= F(0.99,1,13), So we reject H0 and conclude that β2 is not zero. The P-value is > 1-pf(42.21898,1,13) [1] 2.011048e-05Problem 3: a. > setwd("C:/spring 2009 TA/hw3"); > file<-"CH06PR09.txt"; > data<-read.table(file,header=FALSE, col.names=c("Y","X1","X2","X3")); > attach(data); > X<-as.matrix(cbind(1,X1,X2,X3)); > Y<-as.matrix(Y); > n<-length(Y);p<-dim(X)[2]; > Xs<-t(X)%*%X; > H<-X%*%solve(Xs)%*%t(X); > b<-solve(Xs)%*%t(X)%*%Y;b; [,1] 4.149887e+03 X1 7.870804e-04 X2 -1.316602e+01 X3 6.235545e+02 So the fitted regression model is Y=4149.887+0.000787X1-13.166X2+623.555X3. . The b1, b2, b3 here, separately, represents the effect of each predict variable on the dependent variable by changing 1 unit. b. > Yhat<-X%*%b; > resid<-Y-Yhat; > X1X2<-X1*X2 > > par(mar<-c(4,4,1,1)); NULL > plot(resid~Yhat,pch=20); > plot(resid~X1,pch=20);> plot(resid~X2,pch=20); > plot(resid~X3,pch=20); > plot(resid~X1X2,pch=20); >qqnorm(resid); From these plots, we can see the residual’s variation becomes smaller when the fitted value goes larger, which is similar to the situations of X1 and X3. Residuals only show variation constancy with respect to X2. And from the QQ plot, residuals are fairly normally distributed. c. >plot(resid,type='l');No systematic trend is observed. Error terms appear to be uncorrelated. Problem 4: a. > #constructs the ANOVA table > J<-matrix(1/n,n,n);I=diag(n); > sse<-t(Y)%*%(I-H)%*%Y;sse; [,1] [1,] 985530 > ssto<-t(Y)%*%(I-J)%*%Y;ssto; [,1] [1,] 3162136 > ssr<-t(Y)%*%(H-J)%*%Y;ssr; [,1] [1,] 2176606 > mse<- sse/(n-p); > msr<-ssr/(p-1); > mse<- sse/(n-p);mse; [,1] [1,] 20531.87 > msr<-ssr/(p-1);msr; [,1] [1,] 725535 > Xnew<-X[,1:2] > Xsnew<-t(Xnew)%*%Xnew; > Hnew<-Xnew%*%solve(Xsnew)%*%t(Xnew); > sseX1<-t(Y)%*%(I-Hnew)%*%Y;sseX1; [,1] [1,] 3025770 > ssrX1<-t(Y)%*%(Hnew-J)%*%Y; ssrX1;[,1] [1,] 136366 > X13<-cbind(Xnew,X3) > Xs13<-t(X13)%*%X13; > H13<-X13%*%solve(Xs13)%*%t(X13); > sseX13<-t(Y)%*%(I-H13)%*%Y;sseX13; [,1] [1,] 992204 > sseX13 - sse [,1] [1,] 6674 > sseX1-sseX13 [,1] [1,] 2033566 SSR(X2|X1, X3) = sseX13 - sse = 992204 - 985530= 6674 SSR(X3|X1) = sseX1-sseX13 = 3025770 - 992204= 2033566 Source SS DF MS Regression 2176606 3 725535.4 X1 136366 1 136366 X3|X1 2033566 1 2033566 X2|X1, X3 6674 1 6674 Error 985530 48 20531.87 Total 3162136 51 62002 b. H0: β2 =0 Ha: β2 ≠ 0 The test statistics is : F* = MSR(X2|X1)/MSE If F*>F(1-α, 1,n-p), we conclude Ha, otherwise, we conclude H0. In this problem, F* =6674/20531.87=0.3250 < 4.042652= F(0.95,1,48), So we do not reject H0 that β2 = 0. The P-value is > 1-pf(0.3250,1,48) [1] 0.5712771 c. Yes. > X12<-X[,1:3] > Xs12<-t(X12)%*%X12; > H12<-X12%*%solve(Xs12)%*%t(X12); > sseX12<-t(Y)%*%(I-H12)%*%Y;sseX12;[,1] [1,] 3020044 > Xnew2<-cbind(1,X2) > Xs2<-t(Xnew2)%*%Xnew2; > H2<-Xnew2%*%solve(Xs2)%*%t(Xnew2); > sseX2<-t(Y)%*%(I-H2)%*%Y;sseX2; [,1] [1,] 3150741 > ssrX2<-t(Y)%*%(H2-J)%*%Y; ssrX2; [,1] [1,] 11395 So here, SSR(X2|X1) = SSE (X1) – SSE (X1, X2) = 3025770 – 3020044 = 5726
View Full Document