**Unformatted text preview:**

CosinesThe data this time will be the Motorcycle Acceleration Data:A data frame giving a series of measurements of head accelerationin a simulated motorcycle accident, used to test crash helmets.Usage:data(mcycle)Format:‘times’ in milliseconds after impact‘accel’ in gIt is in the MASS package, which is used in the book Modern Applied Statistics with S,by Venables and Ripley. It is a very good book for learning S and R, and modern appliedstatistics.10 20 30 40 50−100 −50 0 50timesaccel1We first look at sines and cosines to fit the data. We will cheat a little, by pretendingthat the X-variable is equally spaced, even though it is not quite so. That is, instead of the“times” as the x,weusex=(1, 2, 3,...,N), where here N = 133. We could then use sinewaves, with frequencies from 1 to 66. Each sine wave uses two functions: cos(freq× θ)andsin(freq×θ), where freq is the frequency, and θ =(2π/N, 2×2π/N, 3×2π/n,...,N×2π/N),the last one being 2π. The models we consider can take an y of the frequencies it wants (itdoesn’t have to be hierarchical, like polynomials), but they should be taken in (sin, cos)pairs. Then with frequencies up to F , the big X looks like1 cos(1N2π) sin(1N2π) cos(1N2 × 2π) sin(1N2 × 2π) ··· cos(1NF × 2π) sin(1NF × 2π)1 cos(2N2π) sin(2N2π) cos(2N2 × 2π) sin(2N2 × 2π) ··· cos(2NF × 2π) sin(2NF × 2π)........................1 cos(NN2π) sin(NN2π) cos(NN2 × 2π) sin(NN2 × 2π) ··· cos(NNF × 2π) sin(NNF × 2π).We will use frequencies up to F = 50, so that there are 32 degrees of freedom left over toestimate σ2e. If you suspect there are very high frequencies in the data, you wouldn’t wantto do this. To create the X:theta <- (1:133)*2*pi/133xx <- rep(1,133)for(i in 1:50) xx<-cbind(xx,cos(i*theta),sin(i*theta))The X has orthogonal columns, so XX is diagonal with the squared lengths of the columnsof X on the diagonals. So theβ’s and sum of squares for each columns are easy to get:dx <- apply(xx^2,2,sum)xy <- t(xx)%*%mcycle[,2]bhat <- xy/dxss <- xy^2/dxrss <- sum(mcycle[,2]^2)-sum(ss)sigma2 <- rss/32sigma2[1] 456.0139The sigma2 is the estimate of σ2efrom this big model. To assess the power of eachfrequency, we combine the sums of squares for the cosine and sine. The first ss is for the 1vector, ss[2], ss[3] are for frequency 1, ss[4], ss[5] are for frequency 2, etc.ssf <- ss[2*(1:50)]+ss[2*(1:50)+1]plot(1:50,ssf,xlab="Frequency",ylab="Power")20 10203040500 50000 100000 150000FrequencyPowerThe plot suggests taking frequencies 1, 2, and 3, at least. We can actually do F testseasily,f <- (ssf/2)/sigma2pvalue <- 1-pf(f,2,32)plot(1:50,pvalue,xlab="Frequency",ylab="pvalue",ylim=c(0,.2))30 10203040500.00 0.05 0.10 0.15 0.20FrequencypvalueThe models to consider put the most significant frequencies in first:ii <- order(pvalue)ii[1:20][1] 1 2 3433450 828 73533482923471844402432So we can try the Cp’s for the models that include frequencies 1, (1, 2), (1, 2, 3), (1, 2, 3, 43),(1, 2, 3, 43, 34), etc.tss <- var(mcycle[,2])*132regss <- cumsum(ssf[ii])rss <- tss-regsscp <- rss/sigma2-133+2*(2*(1:50)+1)40 10203040500 50 100 150Indexcpround(cbind(ii[1:20],cp[1:20]),2)Freq Cp[1,] 1 171.57[2,] 2 27.91[3,] 3 15.50[4,] 43 10.38[5,] 34 8.24[6,] 50 6.39[7,] 8 4.96[8,] 28 3.66[9,] 7 2.62[10,] 35 1.65[11,] 33 0.94[12,] 48 0.67[13,] 29 0.52 ***[14,] 23 0.99[15,] 47 1.88[16,] 18 3.22[17,] 44 4.825[18,] 40 6.51[19,] 24 8.20[20,] 32 10.17It looks like the Cpsays to take 13 frequencies, which are ii[1:13]. Wehavetofigureout which columns of X they represent, in order to get the fit:i <- c(1,2*ii[1:13],2*ii[1:13]+1)yhat <- xx[,i]%*%bhat[i]plot(1:133,mcycle[,2],xlab="Index",ylab="Accel")lines(1:133,yhat)0 20406080100120−100 −50 0 50IndexAccelThat’s quite wiggly. Taking just the top three frequencies:i <- c(1,2*ii[1:3],2*ii[1:3]+1)yhat <- xx[,i]%*%bhat[i]plot(1:133,mcycle[,2],xlab="Index",ylab="Accel")lines(1:133,yhat)60 20406080100120−100 −50 0

View Full Document