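EXST7034 - Regression Techniques

Prediction of a new observation: note that this is a single observation, not the regression line.

First, the variance of a generic linear combination (from Chapter 1: 1.27a & b)

$$T = aW + bX + cZ$$

$$E(T) = aE(W) + bE(X) + cE(Z)$$

$$\mathrm{Var}(T) = a^2\,\mathrm{Var}(W) + b^2\,\mathrm{Var}(X) + c^2\,\mathrm{Var}(Z) + 2(\text{Covariances})$$

$$\mathrm{Var}(T) = a^2\,\mathrm{Var}(W) + b^2\,\mathrm{Var}(X) + c^2\,\mathrm{Var}(Z) + 2ab\,\mathrm{Cov}(W,X) + 2bc\,\mathrm{Cov}(X,Z) + 2ac\,\mathrm{Cov}(W,Z)$$

If we are able to assume that the three terms are stochastically independent, then the covariances are equal to zero.

We have already seen a series of linear combinations.

1) First we saw

$$b_1 = \sum_{i=1}^{n} k_i Y_i, \qquad k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$$

so,

$$\mathrm{Var}(b_1) = k_1^2\,\mathrm{Var}(Y_1) + k_2^2\,\mathrm{Var}(Y_2) + k_3^2\,\mathrm{Var}(Y_3) + \dots$$

since all $Y_i$ at all values of $X_i$ have the same variance (homogeneous), then

$$\mathrm{Var}(b_1) = \mathrm{Var}(Y_i) \sum_{i=1}^{n} k_i^2$$

and recall that

$$\sum_{i=1}^{n} k_i^2 = \frac{1}{\sum (X_i - \bar{X})^2}$$

and that $\mathrm{Var}(Y_i)$ is estimated by the MSE, then

$$\mathrm{Var}(b_1) = \frac{\mathrm{MSE}}{\sum (X_i - \bar{X})^2}$$

1) Show that $b_1$ is a linear combination of the $Y_i$ with weights $k_i = \dfrac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$

a) Start from

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

b) where

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i Y_i - \bar{X} Y_i - X_i \bar{Y} + \bar{X}\bar{Y})$$

$$= \sum (X_i - \bar{X}) Y_i - \sum (X_i - \bar{X}) \bar{Y}$$

$$= \sum (X_i - \bar{X}) Y_i - \bar{Y} \sum (X_i - \bar{X})$$

and since $\sum (X_i - \bar{X}) = 0$, then

c)

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i - \bar{X}) Y_i$$

d) as a result,

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \sum_{i=1}^{n} k_i Y_i$$

e) where

$$k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$$

note that $\sum k_i = 0$ since $\sum (X_i - \bar{X}) = 0$.

f) Now prove that $\sum_{i=1}^{n} k_i^2 = \dfrac{1}{\sum (X_i - \bar{X})^2}$:

$$\sum_{i=1}^{n} k_i^2 = \sum_{i=1}^{n} \left[\frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}\right]^2 = \frac{\sum (X_i - \bar{X})^2}{\left[\sum (X_i - \bar{X})^2\right]^2} = \frac{1}{\sum (X_i - \bar{X})^2}$$

2) Then we say that $\hat{Y}_i = b_0 + b_1 X_i$, also a linear combination.

$$\mathrm{Var}(\hat{Y}_i) = 1^2\,\mathrm{Var}(b_0) + X_i^2\,\mathrm{Var}(b_1) + 2(1)(X_i)\,\mathrm{Cov}(b_0, b_1)$$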
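Note that we do NOT assume that $b_0$ and $b_1$ are independent. The covariance is included, not equal to zero.

Using previous definitions of $\mathrm{Var}(b_0)$ and $\mathrm{Var}(b_1)$, and the Gaussian multipliers from the $(X'X)^{-1}$ matrix for the covariance,

$$\mathrm{Var}(\hat{Y}_i) = \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right](1)^2 + \sigma^2\left[\frac{1}{\sum (X_i - \bar{X})^2}\right]X_i^2 + 2(1)(X_i)\,\sigma^2\left[\frac{-\bar{X}}{\sum (X_i - \bar{X})^2}\right]$$

$$\mathrm{Var}(\hat{Y}_i) = \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} + \frac{X_i^2}{\sum (X_i - \bar{X})^2} - \frac{2 X_i \bar{X}}{\sum (X_i - \bar{X})^2}\right]$$

$$\mathrm{Var}(\hat{Y}_i) = \sigma^2\left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$$

3) Now we want a confidence interval for a single (new) observation.

The equation for that observation is

$$Y_i = b_0 + b_1 X_i + \varepsilon_i \qquad \text{or} \qquad Y_i = \hat{Y}_i + \varepsilon_i$$

We assumed independence once before (each $Y_i$ independent of the others). We are now going to assume independence again. We assume that the residuals are independent of the model (i.e. assume that the $\varepsilon_i$ are independent of $\hat{Y}_i$).

So the variance of single observations will be

$$\mathrm{Var}(Y_i) = \mathrm{Var}(\hat{Y}_i) + \mathrm{Var}(\varepsilon_i) + \underbrace{2\,\mathrm{Cov}(\hat{Y}_i, \varepsilon_i)}_{=\,0}$$

We know from previous work that

$$\mathrm{Var}(\hat{Y}_i) = \sigma^2\left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right], \qquad \mathrm{Var}(\varepsilon_i) = \sigma^2$$

therefore

$$\mathrm{Var}(Y_i) = \sigma^2 + \sigma^2\left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$$

or

$$\mathrm{Var}(Y_i) = \sigma^2\left[1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$$

where the estimator of $\sigma^2$ is MSE.

Note that both your textbook and I have been using $\sigma^2$ for both $\mathrm{Var}(Y_i)$ and for $\mathrm{Var}(\varepsilon_i)$. Each is "the variance", but they are variances of different things. A better notation perhaps is $\mathrm{Var}(\varepsilon_i) = \sigma^2_{\varepsilon}$.

There is another confidence interval of potential interest between

$$\mathrm{Var}(\hat{Y}_i) = \sigma^2\left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right], \quad \text{the regression line}$$

and

$$\mathrm{Var}(Y_i) = \sigma^2\left[1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right], \quad \text{a new observation}$$

This is the confidence interval for the mean of a new sample taken at some particular value of $X_i$, where $m$ is the size of the new sample. This cannot be as narrow as the confidence interval for the regression, but should be narrower than the confidence interval for a single sample.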
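This CI is given by

$$\mathrm{Var}(\bar{Y}_i) = \sigma^2\left[\frac{1}{m} + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right], \quad \text{for a new sample at } X_i$$

Example: From the example of vial breakage regressed on number of airline transfers.

Place a confidence interval on the breakage for 3 transfers for a single new observation.

$$s^2_{Y_i} = \mathrm{MSE}\left[1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right] = 2.2\left[1 + \frac{1}{10} + \frac{(3 - 1)^2}{10}\right] = 2.2\left[1 + \frac{1}{10} + \frac{4}{10}\right] = 2.2 \times 1.5 = 3.3$$

We previously calculated the variance of the regression line at $X = 3$ as $s^2_{\hat{Y}_i} = 1.1$. Note that the variance of a single point is $s^2_{\hat{Y}_i} + s^2 = 1.1 + 2.2 = 3.3$.

$$s_{Y_i} = \sqrt{3.3} = 1.816$$

since $t_{\alpha/2,\ 8\ \text{d.f.}} = 2.306$, then for the new observation $Y_{\text{new}}$ at $X = 3$

$$P\left(\hat{Y}_{X=3} - t_{1-\alpha/2,\,n-2}\, s_{Y_i} \le Y_{\text{new}} \le \hat{Y}_{X=3} + t_{1-\alpha/2,\,n-2}\, s_{Y_i}\right) = 1 - \alpha$$

$$P(22.2 - 2.306 \times 1.816 \le Y_{\text{new}} \le 22.2 + 2.306 \times 1.816) = 1 - \alpha$$

$$P(18.011 \le Y_{\text{new}} \le 26.389) = 0.95$$

SAS will calculate confidence intervals for either the regression line (option CLM) or for individual points (option CLI), but not for a new sample. Check this against the SAS output.

Suppose we were to ship 4 cases through 3 transfers. What is the confidence interval for the mean breakage of 4 cases?

$$s^2_{\bar{Y}_i} = \mathrm{MSE}\left[\frac{1}{m} + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right] = 2.2\left[\frac{1}{4} + \frac{1}{10} + \frac{(3 - 1)^2}{10}\right] = 2.2\left[\frac{1}{4} + \frac{1}{10} + \frac{4}{10}\right] = 2.2 \times 0.75 = 1.65$$

$$s_{\bar{Y}_i} = \sqrt{1.65} = 1.2845$$

since $t_{\alpha/2,\ 8\ \text{d.f.}} = 2.306$, then for the mean $\bar{Y}_{\text{new}}$ of the 4 new cases at $X = 3$

$$P\left(\hat{Y}_{X=3} - t_{1-\alpha/2,\,n-2}\, s_{\bar{Y}_i} \le \bar{Y}_{\text{new}} \le \hat{Y}_{X=3} + t_{1-\alpha/2,\,n-2}\, s_{\bar{Y}_i}\right) = 1 - \alpha$$

$$P(22.2 - 2.306 \times 1.2845 \le \bar{Y}_{\text{new}} \le 22.2 + 2.306 \times 1.2845) = 1 - \alpha$$

$$P(19.238 \le \bar{Y}_{\text{new}} \le 25.162) = 0.95$$

The MEAN of the 4 cases falls in this range. The CI for the regression line is narrower. The CI for individual points is wider.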
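As a rough check on the arithmetic in the two examples, the three variances can be recomputed from the summary values quoted in these notes (n = 10, mean of X = 1, sum of squares of X = 10, MSE = 2.2, fitted value 22.2 at X = 3). This is only a minimal sketch in plain Python: the function names `variance_at` and `interval` are made up for illustration, and the critical value 2.306 is copied from the notes rather than computed.

```python
import math

# Summary values quoted in the notes for the breakage example.
n = 10          # number of observations
x_bar = 1.0     # mean number of transfers
sxx = 10.0      # sum of (X_i - x_bar)^2
mse = 2.2       # estimate of sigma^2
y_hat = 22.2    # fitted breakage at X = 3
t_crit = 2.306  # t critical value with 8 d.f., taken from the notes


def variance_at(x, m=None):
    """Estimated variance at x.

    m=None -> regression line; m=1 -> single new observation;
    m>1    -> mean of m new observations.
    """
    line_part = 1.0 / n + (x - x_bar) ** 2 / sxx
    extra = 0.0 if m is None else 1.0 / m
    return mse * (extra + line_part)


def interval(var):
    """Interval centred on the fitted value, using the quoted t value."""
    half_width = t_crit * math.sqrt(var)
    return (y_hat - half_width, y_hat + half_width)


var_line = variance_at(3)          # regression line
var_mean4 = variance_at(3, m=4)    # mean of 4 new cases
var_single = variance_at(3, m=1)   # single new observation

for label, var in [("line", var_line),
                   ("mean of 4", var_mean4),
                   ("single", var_single)]:
    lo, hi = interval(var)
    print(f"{label:10s} variance={var:.2f}  interval=({lo:.3f}, {hi:.3f})")
```

Setting m = 1 reproduces the single-observation variance, and as m grows the result approaches the regression-line variance, which is why the interval for the mean of m new cases sits between the other two.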
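The linear-combination results used at the start of these notes (the slope as a weighted sum of the Y values, the weights summing to zero, the sum of squared weights, and the collapse of the three-term variance of the fitted value into the short form) can also be checked numerically. The sketch below uses a small hypothetical data set invented purely for illustration; none of the numbers or names come from the course material, and variances are expressed in units of sigma squared.

```python
# Hypothetical data, invented only to illustrate the identities.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.9, 4.1, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Weights k_i = (X_i - x_bar) / sum (X_i - x_bar)^2
k = [(xi - x_bar) / sxx for xi in x]

# Slope from the usual formula and as a linear combination of the Y_i.
b1_usual = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b1_lincomb = sum(ki * yi for ki, yi in zip(k, y))

sum_k = sum(k)                       # should be 0 up to rounding
sum_k_sq = sum(ki ** 2 for ki in k)  # should equal 1 / sxx

# Var(b0), Var(b1), Cov(b0, b1), each divided by sigma^2.
var_b0 = 1.0 / n + x_bar ** 2 / sxx
var_b1 = 1.0 / sxx
cov_b0_b1 = -x_bar / sxx


def var_fit_long(xi):
    """Three-term form: Var(b0) + X^2 Var(b1) + 2 X Cov(b0, b1)."""
    return var_b0 + xi ** 2 * var_b1 + 2.0 * xi * cov_b0_b1


def var_fit_short(xi):
    """Collapsed form: 1/n + (X - x_bar)^2 / sxx."""
    return 1.0 / n + (xi - x_bar) ** 2 / sxx


print("slope (usual, linear combination):", b1_usual, b1_lincomb)
print("sum k_i:", sum_k, " sum k_i^2:", sum_k_sq, " 1/sxx:", 1.0 / sxx)
for xi in x:
    print(xi, var_fit_long(xi), var_fit_short(xi))
```

Dropping the covariance term from `var_fit_long` makes the two forms disagree whenever the mean of X is nonzero, which is the practical reason the notes insist that the two coefficients are not treated as independent.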