UIUC STAT 420 - hw2_sol

AMS 578 Homework 2 solution

Problem 1:

a.
> setwd("C:/spring 2009 TA/hw1");
> file<-"CH01PR19.txt";
> data<-read.table(file, header=FALSE, col.names=c("Y","X"));
> attach(data);
> boxplot(X);
The box plot is almost symmetric, and no outlier appears.

b.
> muy<-mean(Y);
> mux<-mean(X);
> b1<-sum((X-mux)*(Y-muy))/sum((X-mux)^2);
> b0<-muy-b1*mux;
> resid<-Y-(b0+b1*X);
> dotchart(resid);
Most residuals lie within a vertical band (-1, 1) centered around 0, with no systematic tendency to be positive or negative. This suggests a linear regression model is appropriate.

c.
> yhead<-b0+b1*X;
> plot(yhead, resid, type="p");
The residuals show no obvious systematic trend toward positive or negative values, which indicates the error variance is constant and the linear regression function fits well.

d.
> temp<-qqnorm(resid);
> exp<-temp$x;
> corr_coef<-sum((resid-mean(resid))*(exp-mean(exp)))/sqrt(sum((resid-mean(resid))^2)*sum((exp-mean(exp))^2));
> corr_coef
[1] 0.9744497
Here we plot the normal Q-Q plot; the correlation coefficient between the ordered residuals and their expected values under normality is 0.9744497. From Table B.6, the critical value for n = 100 is 0.987, and the critical value increases with n. So for n = 120, corr_coef = 0.9744497 < critical value, and we conclude that the error terms depart substantially from a normal distribution.

e.
For the Brown-Forsythe test, split the residuals into two groups and let di1 = |ei1 - median(group 1)| and di2 = |ei2 - median(group 2)|. The test statistic is
  t*_BF = (dbar1 - dbar2) / (s * sqrt(1/n1 + 1/n2)),
where s^2 = [sum((di1 - dbar1)^2) + sum((di2 - dbar2)^2)] / (n - 2).
If |t*_BF| ≤ t(1-α/2; n-2), we conclude the error variance is constant; if |t*_BF| > t(1-α/2; n-2), we conclude it is not constant.
> d1<-abs(resid[data[,2]<26]-median(resid[data[,2]<26]));
> d2<-abs(resid[data[,2]>=26]-median(resid[data[,2]>=26]));
> n1<-length(d1);
> n2<-length(d2);
> sd_BF<-sqrt((sum((d1-mean(d1))^2)+sum((d2-mean(d2))^2))/118);
> tBF<-(mean(d1)-mean(d2))/(sd_BF*sqrt(1/n1+1/n2));
> tBF
[1] -0.8967448
So tBF = -0.8967448, and |tBF| = 0.8967448 ≤ 2.618137 = t(.995; 118). We conclude that the error variance is constant.
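The Brown-Forsythe calculation in part (e) can be sketched end to end as follows. This is a minimal sketch on simulated stand-in data (the CH01PR19 data set is not reproduced here), so the split point X = 26 and all seed/sample-size choices are illustrative:

```r
# Brown-Forsythe test for constant error variance, following part (e).
# Simulated stand-in data; error variance is constant by construction,
# so a small |t_bf| is expected here.
set.seed(1)
x <- runif(120, 20, 35)
y <- 2 + 0.5 * x + rnorm(120)
e <- resid(lm(y ~ x))

grp <- x < 26                              # illustrative split into two groups
d1 <- abs(e[grp]  - median(e[grp]))        # absolute deviations from the
d2 <- abs(e[!grp] - median(e[!grp]))       # group medians
n1 <- length(d1); n2 <- length(d2)
s2 <- (sum((d1 - mean(d1))^2) + sum((d2 - mean(d2))^2)) / (n1 + n2 - 2)
t_bf <- (mean(d1) - mean(d2)) / sqrt(s2 * (1/n1 + 1/n2))
c(t_bf = t_bf, crit = qt(0.995, n1 + n2 - 2))
```

Comparing |t_bf| with the printed critical value reproduces the accept/reject decision in the solution.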
f.
> file<-"CH03PR03.txt";
> data<-read.table(file, header=FALSE, col.names=c("Y","X1","X2","X3"));
> attach(data);
> plot(X2, resid);
> plot(X3, resid);
The residuals show a systematic positive trend with X2 but no trend with X3, so X2 should be added to the regression model.

Problem 2:

a.
> b1<-sum(X*Y)/sum(X^2);
> b1;
[1] 0.1216429
So the fitted model is: Y = 0.1216429*X.

b.
> e<-Y-b1*X;
> s<-sqrt(sum(e^2)/(119*sum(X^2)));
> CI<-c(b1-qt(0.975,119)*s, b1+qt(0.975,119)*s);
> CI
[1] 0.1164216 0.1268643
The 95 percent confidence interval is [0.1164216, 0.1268643], which does not include zero. This means there is a significant relationship between the ACT score and the GPA at the end of the freshman year, and we are 95% confident that the true β1 lies between 0.1164216 and 0.1268643.

c.
> syh<-30*s;
> CIyh<-c(b1*30-qt(0.975,119)*syh, b1*30+qt(0.975,119)*syh);
> CIyh
[1] 3.492647 3.805928
So the 95 percent confidence interval for the mean GPA of students with ACT score 30 is [3.492647, 3.805928].

Problem 3:

a.
> plot(Y~X, pch=20);
> abline(0, b1, col=2);
> legend(20, 1, legend=c("Y= 0.1216429*X"), col=2, lty=1, text.col=2, bty="n");
This regression function fits the data well.

b.
> sum(e);
[1] 7.9715
> ye<-b1*X;
> plot(e~ye, pch=20);
The sum of the residuals is not equal to zero (the model has no intercept, so this property is not guaranteed). The residuals show a negative trend against the fitted values, so the error variance may not be constant.

c.
H0: E{Y} = β1*X
Ha: E{Y} ≠ β1*X
The test statistic is:
  F* = [(SSE(R) - SSE(F)) / (c - 1)] ÷ [SSE(F) / (n - c)]
If F* > F(1-α; c-1, n-c), we conclude Ha; otherwise, we conclude H0.
> sse<-sum((Y-ye)^2);
> minx<-min(X);
> maxx<-max(X);
> ssef<-0;
> c<-0;
> for(i in minx:maxx){
+   yi<-data[data[,2]==i,1];
+   if(length(yi)>0){c=c+1};
+   ssef<-ssef+sum((yi-mean(yi))^2);
+ }
> F<-((sse-ssef)/(120-1-(120-c)))/(ssef/(120-c))
> F
[1] 2.937110
Since F* = 2.937110 > 2.229389 = F(0.995; 20, 99), we reject H0 and conclude Ha.
> pvalue<-1-pf(F,20,99);
> pvalue;
[1] 0.0002159044
The p-value is 0.0002159044, which is consistent with our conclusion.
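The lack-of-fit F test of Problem 3(c) can be sketched compactly with split() instead of the explicit loop. This is a sketch on simulated stand-in data with replicated X levels (the true model here really is proportional to X, so a large F* is not expected); the seed and level range are illustrative:

```r
# Lack-of-fit F test for the no-intercept model E{Y} = b1*X, as in 3(c).
# Reduced model: one slope through the origin (df = n - 1).
# Full model: a separate mean at each distinct X level (df = n - c).
set.seed(2)
x <- sample(15:30, 120, replace = TRUE)   # integer levels give replicates
y <- 0.12 * x + rnorm(120, sd = 0.4)

b1    <- sum(x * y) / sum(x^2)            # through-the-origin slope
sse_r <- sum((y - b1 * x)^2)              # SSE of the reduced model
sse_f <- sum(unlist(lapply(split(y, x),   # pure-error SSE: deviations from
             function(g) (g - mean(g))^2)))  # each level's own mean
c_lvls <- length(unique(x))               # number of distinct X levels
df_num <- (120 - 1) - (120 - c_lvls)      # = c_lvls - 1
Fstar  <- ((sse_r - sse_f) / df_num) / (sse_f / (120 - c_lvls))
pval   <- 1 - pf(Fstar, df_num, 120 - c_lvls)
```

split(y, x) groups the responses by X level, which is exactly what the solution's for-loop over minx:maxx accumulates.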
Problem 4:

a.
> setwd("C:/spring 2009 TA/hw1");
> file<-"CH01PR20.txt";
> data<-read.table(file, header=FALSE, col.names=c("Y","X"));
> attach(data);
> dotchart(X);
No outlier appears here.

b.
> plot(c(1:45), X, type='l');
There is no obvious time trend in this graph.

c.
> muy<-mean(Y);
> mux<-mean(X);
> b1<-sum((X-mux)*(Y-muy))/sum((X-mux)^2);
> b0<-muy-b1*mux;
> resid<-Y-(b0+b1*X);
> stem(resid);
The decimal point is 1 digit(s) to the right of the |
  -2 | 30
  -1 |
  -1 | 3110
  -0 | 99997
  -0 | 44333222111
   0 | 001123334
   0 | 5666779
   1 | 112234
   1 | 5
Most of the residuals stay within (-13, 15), with just two outliers.

d.
> yhead<-b0+b1*X;
> plot(yhead, resid, pch=20);
> plot(X, resid, pch=20);
These graphs give the same information. From these plots, we can diagnose whether the regression function is linear and whether the error variance is constant.

e.
> temp<-qqnorm(resid);
> exp<-temp$x;
> corr_coef<-sum((resid-mean(resid))*(exp-mean(exp)))/sqrt(sum((resid-mean(resid))^2)*sum((exp-mean(exp))^2));
> corr_coef
[1] 0.9889098
The correlation coefficient is 0.9889098. With n = 45 and α = 0.1, Table B.6 gives a critical value between 0.977 and 0.981. Since corr_coef = 0.9889098 > critical value, we conclude that the error terms have a normal distribution.

f.
> plot(c(1:45), resid, type='l');
From this plot, we cannot say the residuals are correlated over time.

g.
H0: error variances are constant.
Ha: error variances are not constant.
The test statistic is:
  X²_BP = (SSR*/2) ÷ (SSE/n)²
where SSR* is the regression sum of squares from regressing e² on X. If X²_BP > χ²(1-α; 1), we conclude Ha; otherwise, we conclude H0.
> logresid<-resid^2;
> muy<-mean(logresid);
> mux<-mean(X);
> b1_log<-sum((X-mux)*(logresid-muy))/sum((X-mux)^2);
> b0_log<-muy-b1_log*mux;
> ylog<-b0_log+b1_log*X;
> SSR<-sum((ylog-mean(ylog))^2);
> SSE<-sum(resid^2);
> xbp<-(SSR/2)/(SSE/45)^2;
> xbp
[1] 1.314680
Since X²_BP = 1.314680 < 3.84 = χ²(.95; 1), we conclude H0.
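The Breusch-Pagan calculation of part (g) can be sketched with lm() doing the auxiliary regression of e² on X. This is a sketch on simulated stand-in data (CH01PR20 is not reproduced here), with constant variance built in, so a small statistic is expected:

```r
# Breusch-Pagan statistic, following part (g):
# X2_BP = (SSR*/2) / (SSE/n)^2, with SSR* from regressing e^2 on X.
set.seed(3)
x <- runif(45, 1, 10)
y <- 20 + 4 * x + rnorm(45, sd = 3)       # constant variance by construction
e <- resid(lm(y ~ x))

aux      <- lm(I(e^2) ~ x)                # auxiliary regression of squared residuals
ssr_star <- sum((fitted(aux) - mean(e^2))^2)
sse      <- sum(e^2)
x2_bp    <- (ssr_star / 2) / (sse / length(x))^2
c(x2_bp = x2_bp, crit = qchisq(0.95, 1))  # compare with the chi-square(1) cutoff
```

Since the auxiliary fit includes an intercept, mean(fitted(aux)) equals mean(e^2), so ssr_star matches the hand-computed SSR in the solution.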
h.
> file<-"CH03PR04.txt";
> data<-read.table(file, header=FALSE, col.names=c("Y","X1","X2","X3"));
> attach(data);
> plot(X2, resid);
> plot(X3, resid);
The residuals show a systematic positive trend with X2 but no trend with X3, so X2 should be added to the regression model.

Problem 5:

a.
Cov(b0, b1) = -X̄ * Var(b1).
> mean(X)
[1] 5.111111
Since X̄ = 5.111111 > 0, Cov(b0, b1) < 0, so b0 and b1 tend to err in opposite directions.

b.
> MSE<-sum(resid^2)/43;
> sb1<-sqrt(MSE/sum((X-mux)^2));
> sb0<-sqrt(MSE*(1/45+mux^2/sum((X-mux)^2)));
> CIb1<-c(b1-qt(0.9875,43)*sb1,b1+qt(0.9875,43)*sb1);
> CIb1;
[1] 13.91322
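The identity used in Problem 5(a) can be checked numerically against lm()'s coefficient covariance matrix. This is a sketch on simulated stand-in data (the actual CH01PR20 values are not reproduced here):

```r
# Numeric check of Cov(b0, b1) = -Xbar * Var(b1) from Problem 5(a).
set.seed(4)
x <- runif(45, 1, 10)
y <- 20 + 4 * x + rnorm(45)
fit <- lm(y ~ x)

mse    <- sum(resid(fit)^2) / (45 - 2)        # MSE with n - 2 df
var_b1 <- mse / sum((x - mean(x))^2)          # estimated Var(b1)
isTRUE(all.equal(vcov(fit)["(Intercept)", "x"],
                 -mean(x) * var_b1))          # TRUE
```

The off-diagonal entry of vcov(fit) is exactly -X̄ * MSE / Σ(X - X̄)², confirming that the two estimates err in opposite directions whenever X̄ > 0.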

