AMS 578 Homework 2 solution

Problem 1:

a.
> setwd("C:/spring 2009 TA/hw1");
> file<-"CH01PR19.txt";
> data<-read.table(file,header=FALSE, col.names=c("Y","X"));
> attach(data);
> boxplot(X);

The box plot is almost symmetric and contains no outliers.

b.
> muy<-mean(Y);
> mux<-mean(X);
> b1<-sum((X-mux)*(Y-muy))/sum((X-mux)^2);
> b0<-muy-b1*mux;
> resid<-Y-(b0+b1*X);
> dotchart(resid);

Most residuals fall within a vertical band (-1, 1) centered around 0, displaying no systematic tendency to be positive or negative. That suggests a linear regression model is appropriate.

c.
> yhead<-b0+b1*X;
> plot(yhead, resid,type="p");

The residuals show no obvious systematic trend to be positive or negative, which suggests the error variance is constant and the linear regression function fits well.

d.
> temp<-qqnorm(resid);
> exp<-temp$x;
> corr_coef<-sum((resid-mean(resid))*(exp-mean(exp)))/sqrt(sum((resid-mean(resid))^2)*sum((exp-mean(exp))^2))
> corr_coef
[1] 0.9744497

Here we draw the normal Q-Q plot; the correlation coefficient between the ordered residuals and their expected values under normality is 0.9744497. From Table B.6 we know that for n = 100 the critical value is 0.987, and the critical value increases with n. So for n = 120, corr_coef = 0.9744497 < critical value, and we conclude that the error terms depart substantially from a normal distribution.

e.
For the Brown-Forsythe test, the test statistic is

  t*_BF = (dbar1 - dbar2) / (s * sqrt(1/n1 + 1/n2)),

where d_i1 and d_i2 are the absolute deviations of the residuals from their group medians, and s^2 = [sum(d_i1 - dbar1)^2 + sum(d_i2 - dbar2)^2] / (n - 2).

If |t*_BF| <= t(1-α/2; n-2), we conclude the error variance is constant. If |t*_BF| > t(1-α/2; n-2), we conclude the error variance is not constant.

> d1<-abs(resid[data[,2]<26]-median(resid[data[,2]<26]));
> d2<-abs(resid[data[,2]>=26]-median(resid[data[,2]>=26]));
> n1<-length(d1);
> n2<-length(d2);
> sd_BF<-sqrt((sum((d1-mean(d1))^2)+sum((d2-mean(d2))^2))/118);
> tBF<-(mean(d1)-mean(d2))/(sd_BF*sqrt(1/n1+1/n2));
> tBF
[1] -0.8967448

So tBF = -0.8967448, which means |tBF| = 0.8967448 <= 2.618137 = t(0.995; 118). We conclude that the error variance is constant.

f.
> file<-"CH03PR03.txt";
> data<-read.table(file,header=FALSE, col.names=c("Y","X1","X2","X3"));
> attach(data);
> plot(X2,resid);
> plot(X3,resid);

The residuals show a systematic positive trend with X2, but no trend with X3. So we should add X2 to the regression model.

Problem 2:

a.
> b1<-sum(X*Y)/sum(X^2);
> b1;
[1] 0.1216429

So the fitted model (regression through the origin) is: Y = 0.1216429*X.

b.
> e<-Y-b1*X;
> s<-sqrt(sum(e^2)/(119*sum(X^2)));
> CI<-c(b1-qt(0.975,119)*s,b1+qt(0.975,119)*s);
> CI
[1] 0.1164216 0.1268643

So the 95 percent confidence interval is [0.1164216, 0.1268643], which does not include zero. That means there is a significant relationship between the ACT score and the GPA at the end of the freshman year, and we are 95% confident that the true β1 lies between 0.1164216 and 0.1268643.

c.
> syh<-30*s;
> CIyh<-c(b1*30-qt(0.975,119)*syh,b1*30+qt(0.975,119)*syh);
> CIyh
[1] 3.492647 3.805928

So the 95 percent confidence interval for the mean response at X = 30 is [3.492647, 3.805928].

Problem 3:

a.
> plot(Y~X, pch=20);
> abline(0,b1,col=2);
> legend(20,1, legend=c("Y= 0.1216429*X"), col=2,lty=1,text.col=2,bty="n");

This regression function fits the data well.

b.
> sum(e);
[1] 7.9715
> ye<-b1*X;
> plot(e~ye,pch=20);

The sum of the residuals is not equal to zero (as expected for regression through the origin). The residuals show a negative trend against the fitted values, so the error variance may not be constant.

c.
H0: E{Y} = β1*X
Ha: E{Y} ≠ β1*X

The test statistic is:

  F* = [(SSE(R) - SSE(F)) / (c - 1)] ÷ [SSE(F) / (n - c)]

(here the reduced model has n - 1 = 119 df and the full model has n - c df, so the numerator df is c - 1). If F* > F(1-α; c-1, n-c), we conclude Ha; otherwise, we conclude H0.

> sse<-sum((Y-ye)^2);
> minx<-min(X);
> maxx<-max(X);
> ssef<-0;
> c<-0;
> for(i in minx:maxx){
+ yi<-data[data[,2]==i,1];
+ if(length(yi)>0){c=c+1};
+ ssef<-ssef+sum((yi-mean(yi))^2);
+ }
> F<-((sse-ssef)/(120-1-(120-c)))/(ssef/(120-c))
> F
[1] 2.937110

Since F* = 2.937110 > 2.229389 = F(0.995, 20, 99), we reject H0 and conclude Ha.

> pvalue<-1-pf(F,20,99);
> pvalue;
[1] 0.0002159044

The p-value is 0.0002159044, which is consistent with our conclusion.

Problem 4:

a.
> setwd("C:/spring 2009 TA/hw1");
> file<-"CH01PR20.txt";
> data<-read.table(file,header=FALSE, col.names=c("Y","X"));
> attach(data);
> dotchart(X);

We can see no outliers here.

b.
> plot(c(1:45),X,type='l');

There is no obvious time trend in this graph.

c.
> muy<-mean(Y);
> mux<-mean(X);
> b1<-sum((X-mux)*(Y-muy))/sum((X-mux)^2);
> b0<-muy-b1*mux;
> resid<-Y-(b0+b1*X);
> stem(resid);

  The decimal point is 1 digit(s) to the right of the |

  -2 | 30
  -1 |
  -1 | 3110
  -0 | 99997
  -0 | 44333222111
   0 | 001123334
   0 | 5666779
   1 | 112234
   1 | 5

Most of the residuals stay within (-13, 15); there are just two outliers.

d.
> yhead<-b0+b1*X;
> plot(yhead,resid,pch=20);
> plot(X, resid,pch=20);

These graphs give the same information. From these plots we can diagnose whether the regression function is linear and whether the error variance is constant.

e.
> temp<-qqnorm(resid);
> exp<-temp$x;
> corr_coef<-sum((resid-mean(resid))*(exp-mean(exp)))/sqrt(sum((resid-mean(resid))^2)*sum((exp-mean(exp))^2))
> corr_coef
[1] 0.9889098

So the correlation coefficient is 0.9889098. With n = 45 and α = 0.10, Table B.6 gives a critical value between 0.977 and 0.981. Since corr_coef = 0.9889098 > critical value, we conclude that the error terms are normally distributed.

f.
> plot(c(1:45),resid,type='l');

From this plot, we cannot say the residuals are correlated over time.

g.
H0: error variances are constant.
Ha: error variances are not constant.

The test statistic (Breusch-Pagan) is:

  X²_BP = (SSR*/2) ÷ (SSE/n)²,

where SSR* is the regression sum of squares from regressing the squared residuals on X. If X²_BP > X²(1-α; 1), we conclude Ha; otherwise, we conclude H0.

> logresid<-resid^2;
> muy<-mean(logresid);
> mux<-mean(X);
> b1_log<-sum((X-mux)*(logresid-muy))/sum((X-mux)^2);
> b0_log<-muy-b1_log*mux;
> ylog<-b0_log+b1_log*X;
> SSR<-sum((ylog-mean(ylog))^2);
> SSE<-sum(resid^2);
> xbp<-(SSR/2)/(SSE/45)^2;
> xbp
[1] 1.314680

So X²_BP = 1.314680 < 3.84 = X²(0.95; 1), and we conclude H0: the error variance is constant.

h.
> file<-"CH03PR04.txt";
> data<-read.table(file,header=FALSE, col.names=c("Y","X1","X2","X3"));
> attach(data);
> plot(X2,resid);
> plot(X3,resid);

The residuals show a systematic positive trend with X2, but no trend with X3. So we should add X2 to the regression model.

Problem 5:

a.
Cov(b0, b1) = -Xbar * Var(b1).

> mean(X)
[1] 5.111111

Since Xbar = 5.111111 > 0, Cov(b0, b1) < 0, so the estimation errors of b0 and b1 tend to be in opposite directions.

b.
> MSE<-sum(resid^2)/43;
> sb1<-sqrt(MSE/sum((X-mux)^2));
> sb0<-sqrt(MSE*(1/45+mux^2/sum((X-mux)^2)));
> CIb1<-c(b1-qt(0.9875,43)*sb1,b1+qt(0.9875,43)*sb1);
> CIb1;
[1] 13.91322
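The Bonferroni joint intervals used above (each interval at level 1 - 0.05/2, i.e. the t(0.9875; n-2) quantile, so the pair has at least 95% joint coverage) can be sketched in Python. This is an illustrative computation on synthetic data, not the textbook data set; the sample size, seed, and true coefficients here are assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumed for illustration only)
rng = np.random.default_rng(0)
n = 45
x = rng.uniform(0.0, 10.0, n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, n)

# Least-squares estimates, mirroring the R transcript's formulas
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
b0 = ybar - b1 * xbar
resid = y - (b0 + b1 * x)
mse = np.sum(resid ** 2) / (n - 2)

# Standard errors of b1 and b0
s_b1 = np.sqrt(mse / Sxx)
s_b0 = np.sqrt(mse * (1.0 / n + xbar ** 2 / Sxx))

# Bonferroni: two simultaneous intervals at family level 95%,
# so each uses t(1 - 0.05/4; n-2) = t(0.9875; n-2)
tcrit = stats.t.ppf(0.9875, n - 2)
ci_b1 = (b1 - tcrit * s_b1, b1 + tcrit * s_b1)
ci_b0 = (b0 - tcrit * s_b0, b0 + tcrit * s_b0)
print("b1 CI:", ci_b1)
print("b0 CI:", ci_b0)
```

Note that the Bonferroni multiplier t(0.9875; n-2) is larger than the single-interval multiplier t(0.975; n-2); that widening is the price of joint coverage.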