ISU STAT 511 - Homework # 4

Stat 511 HW#4 Spring 2003

1. Consider the (non-full-rank) "effects model" for the $2 \times 2$ factorial (with 2 observations per cell) called "Example d)" in the first lecture. For each of the hypotheses below, identify a $C$ and a $\mathbf{d}$ so that it may be written in the form $H_0: C\boldsymbol{\beta} = \mathbf{d}$. Determine which of the hypotheses below is testable. For each testable hypothesis that can be written in the form $H_0: C\boldsymbol{\beta} = \mathbf{0}$, find a matrix $X_0$ so that the hypothesis can be written in the form $H_0: \mathrm{E}\mathbf{Y} = X_0\boldsymbol{\gamma}$ for some $\boldsymbol{\gamma}$ (that is, so that it can be written as $H_0: \mathrm{E}\mathbf{Y} \in C(X_0)$ where $C(X_0) \subset C(X)$).

a) $H_0: \alpha\beta_{12} - \alpha\beta_{11} - (\alpha\beta_{22} - \alpha\beta_{21}) = 7$
b) $H_0: \alpha\beta_{12} - \alpha\beta_{11} - (\alpha\beta_{22} - \alpha\beta_{21}) = 0$
c) $H_0: \alpha\beta_{ij} = 0 \ \forall i, j$
d) $H_0: \mu + \alpha_1 = 7$
e) $H_0: \alpha\beta_{ij} = 0 \ \forall i, j$ and $\alpha_1 - \alpha_2 = 7$
f) $H_0: \alpha\beta_{ij} = 0 \ \forall i, j$ and $\alpha_1 - \alpha_2 = 0$

Note: This is the problem as I assigned it. But, it is poorly worded and hopelessly ambiguous. "Testability" as defined in class can only be determined for a particular expression of an hypothesis in matrix form, which I didn't give you here.

2. (Adapted from Koehler's Spring 2002 HW 3) On the Web page http://www.public.iastate.edu/~vardeman/stat511/511data.html you will find the file biomass.txt. We're going to do some statistical analysis on this R data frame. These data come from a study of the effects of soil characteristics on aerial biomass production of the marsh grass Spartina alterniflora (Rick A. Lindhurst, 1979, Aeration, nitrogen, pH, and salinity as factors affecting Spartina alterniflora growth and dieback, Ph.D. dissertation, North Carolina State University). There are eight entries on each line of the data frame. These are (in the order below)

Location
Type (revegetated area, short grass, tall grass)
$y$ = aerial biomass (g/m$^2$)
$x_1$ = soil salinity (%)
$x_2$ = soil acidity as measured in water (pH)
$x_3$ = soil potassium (ppm)
$x_4$ = soil sodium (ppm)
$x_5$ = soil zinc (ppm)

The first row of the file has the variable names in it.
(You might open this file in Notepad and have a look at it.)

Enter these data into R using the command

> biomass<-read.table("filename",header=T)

I was able to get this loaded by placing biomass.txt into the directory "rw1061" created in the installation of my copy of R, and using "biomass.txt" in place of "filename" above (the quote marks are required, since read.table() expects the file name as a character string). I was also able to get it loaded by using "http://www.public.iastate.edu/~vardeman/stat511/biomass.txt" (complete with quote marks) in place of "filename" above while connected to the network.

Use the command

> biomass

to view the data frame. It should have eight columns and 45 rows. Now create two matrices that will be used to fit a regression model to these data. The third column will be used as the response vector and the last five columns will be used to make most of the model matrix. Type

> Y<-as.matrix(biomass[,3])
> X<-as.matrix(biomass[,4:8])

Note the use of [] to select columns from the data frame. Here, the function as.matrix is used to create a matrix from one or more columns of the data frame. To add a column of ones to the model matrix, type

> X0<-rep(1,length(Y))
> X<-cbind(X0,X)

Make a scatterplot matrix for $y, x_1, x_2, x_3, x_4,$ and $x_5$. To do this, first load the "lattice" package. (Look under the "Packages" heading on the R GUI, select "Load package" and then "lattice".) Then type

> splom(~biomass[,3:8],aspect="fill")

If you had to guess based on this plot, which single predictor do you think is probably the best predictor of biomass? Do you see any evidence of multicollinearity (correlation among the predictors) in this graphic?

To redo the scatterplot matrix after passing smooth curves through each of the scatterplots, you may do this. Define a function

> points.lines<-function(x,y)
+ {
+ points(x,y)
+ lines(loess.smooth(x,y,0.90))
+ }

set some parameters for the graphic

> par(pch=18,cex=1.2,lwd=3)

and then issue the command

> pairs(biomass[,-(1:2)],panel=points.lines)

Also compute a sample correlation matrix for $y, x_1, x_2, x_3, x_4,$ and $x_5$.
You may compute the matrix using the cor() function and round the printed values to four places using the round() function as

> round(cor(biomass[-(1:2)]),4)

Use the qr() function to find the rank of X.

Use R matrix operations on the X matrix and Y vector to find the estimated regression coefficient vector $\mathbf{b}_{\mathrm{OLS}}$, the estimated mean vector $\hat{\mathbf{Y}}$, and the vector of residuals $\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}}$.

Plot the residuals against the fitted means. This can be done using the following code.

> b<-solve(t(X)%*%X)%*%t(X)%*%Y
> yhat<-X%*%b
> e<-Y-yhat
> par(fin=c(6.0,6.0),pch=18,cex=1.5,mar=c(5,5,4,2))
> plot(yhat,e,xlab="Predicted Y",ylab="Residual",main="Residual Plot")

Type

> help(par)

to see the list of parameters that may be set on a graphic. What does the first specification above do, i.e. what does fin=c(6.0,6.0) do?

Plot the residuals against salinity. You may use the following code.

> plot(biomass$salinity,e,xlab="Salinity",ylab="Residual",main="Residual Plot")

And you can add a smooth trend line to the plot by typing

> lines(loess.smooth(biomass$salinity,e,0.90))

What happens when you type

> lines(loess.smooth(biomass$salinity,e,0.50))

(The values 0.90 and 0.50 are values of a "smoothing parameter." You could have discovered this (and more) about the loess.smooth function by typing help(loess.smooth) at the R prompt.)

Now plot the residuals against each of $x_2, x_3, x_4,$ and $x_5$.

Create a normal plot from the values in the residual vector. You can do so by typing

> qqnorm(e,main="Normal Probability Plot")
> qqline(e)

Now compute the sum of squared residuals and the corresponding estimate of $\sigma^2$, namely

$$\hat{\sigma}^2 = \frac{(\mathbf{Y} - \hat{\mathbf{Y}})'(\mathbf{Y} - \hat{\mathbf{Y}})}{n - \operatorname{rank}(X)}$$

Use this and compute an estimate of the covariance matrix for $\mathbf{b}_{\mathrm{OLS}}$, namely

$$\hat{\sigma}^2 (X'X)^{-1}$$

Sometimes you may want to write a matrix out to a file. This can be done as follows.
First prepare the row and column labels and round all entries to 4 places using the code

> case<-1:45
> heading<-c("Case","Salinity","pH","K","Na","Zn","Biomass","Predicted","Residual")
> temp<-cbind(case,X[,-1],Y,yhat,e)
> dimnames(temp)<-list(case,heading)
> temp<-round(temp,4)

(The rounded matrix is assigned back to temp so that the rounded values, not just a rounded printout, are what get written to the file.) Then load the "MASS" package (in order to make the write.matrix function available). The code

> write.matrix(temp,file="c:/temp/regoutput.out")

will then write output to the file c:/temp/regoutput.out (you may choose another name and destination for this file).

Modify the above to create a matrix that has $\mathbf{b}_{\mathrm{OLS}}$ in the first column and a vector of corresponding standard errors (square roots of diagonal entries of the estimated covariance matrix for $\mathbf{b}_{\mathrm{OLS}}$) in the second. Label the rows and columns of your matrix and write it out to a file. Submit a listing
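
The matrix computations described in Problem 2 can be sketched as follows. This is only an illustration under stated assumptions, not a worked answer for the real data: it uses simulated values in place of biomass.txt, and the seed, the coefficient vector beta, the noise standard deviation, and the object names r, sse, sigma2.hat, cov.b, se.b, and coef.table are all hypothetical choices made here. Only the column labels and the sample size of 45 are taken from the assignment.

```r
# Simulated stand-in for the biomass data (NOT the real values).
set.seed(511)
n <- 45
X <- cbind(X0 = 1,
           matrix(rnorm(n * 5), nrow = n, ncol = 5,
                  dimnames = list(NULL, c("Salinity", "pH", "K", "Na", "Zn"))))
beta <- c(1000, -30, 300, -5, -2, -20)      # hypothetical coefficients
Y <- X %*% beta + rnorm(n, sd = 50)         # hypothetical noise level

# Rank of the model matrix from its QR decomposition.
r <- qr(X)$rank

# Least squares coefficients, fitted means, and residuals,
# written the same way as in the assignment.
b <- solve(t(X) %*% X) %*% t(X) %*% Y
yhat <- X %*% b
e <- Y - yhat

# Sum of squared residuals and the estimate of sigma^2,
# dividing by n - rank(X).
sse <- sum(e^2)
sigma2.hat <- sse / (n - r)

# Estimated covariance matrix of b and the standard errors
# (square roots of its diagonal entries).
cov.b <- sigma2.hat * solve(t(X) %*% X)
se.b <- sqrt(diag(cov.b))

# Two-column matrix: estimates and standard errors, with labels.
coef.table <- cbind(Estimate = as.vector(b), StdError = se.b)
rownames(coef.table) <- colnames(X)
print(round(coef.table, 4))
```

With the real data, X and Y would come from the data frame as described earlier, and the labeled matrix could then be written to a file with write.matrix from the MASS package. One quick sanity check is to compare these values with the coefficient table reported by summary() for lm() fitted to the same X and Y.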

