The Geometry of Least Squares & the Multiple Linear Model

We have already seen that the least squares fit of a simple linear model can be viewed as the projection of the n-dimensional vector y' = (y_1, ..., y_n) onto the space spanned by the vectors 1' = (1, ..., 1) and x' = (x_1, ..., x_n).

Now we consider the case where we have three variables on each subject, say (u_i, v_i, y_i), i = 1, ..., n, and we wish to find the best linear fit a + bu + cv to y, i.e. to minimize

    Σ_{i=1}^{n} (y_i − a − b u_i − c v_i)²

If we consider our data as three n-dimensional vectors y, u, and v, then the minimization of the above quantity is equivalent to the following minimization:

    || ________ ||²

So we are projecting y onto the space spanned by 1, u, and v.

What does this projection look like? What are the coefficients for u and v? To answer these questions, fill in the following derivation:

1. NOTATION:

       P_1 y = ȳ 1 = b_y(1) 1
       P_{1,x} y = P_1 y + P_{x·1} y = b_y(1) 1 + b_y(x·1) (x − x̄ 1)

   That is, P stands for projection, and its subscript tells us what space we are projecting onto. The subscript x·1 stands for the part of x which is orthogonal to 1. The b's represent the coefficients. For example, b_y(1) is the coefficient for 1 that we get when we project y onto 1, and b_y(x·1) is the coefficient for (x − x̄ 1).

2. Re-express â and b̂ from the simple linear regression in terms of these new coefficients.

3. Next consider the problem of projecting y onto the space spanned by 1, u, and v. This space is equivalent to the space spanned by the orthogonal vectors

       1, (u − ū 1), and ________

   or equivalently the space spanned by

       1, (v − v̄ 1), and ________

4. In terms of our projection notation,

       P_{1,u,v} y = P_1 y + ________ + ________
                   = P_1 y + ________ + ________

5. Convert the above two equalities into the b notation to see what the coefficients would be:

       ŷ = ȳ 1 + b_y(u·1) (u − ū 1) + b_y(v·1,u) ( ________ )
          = ȳ 1 + b_y(v·1) ( ________ ) + ( ________ )

6. Since the coefficients for u must be the same in these two equations (and likewise for v), we can choose the expression from the simpler equation.
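The punchline of the derivation, that the multiple-regression coefficient for u equals the coefficient from projecting y onto the part of u orthogonal to 1 and v (sometimes called the Frisch–Waugh–Lovell theorem), can be checked numerically. A minimal sketch with simulated data (all names and numbers below are illustrative assumptions, not from the handout):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
u = rng.normal(size=n)
v = 0.5 * u + rng.normal(size=n)           # v is correlated with u
y = 3.0 + 1.2 * u + 0.7 * v + rng.normal(size=n)
one = np.ones(n)

def coefs(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Full fit: project y onto the space spanned by 1, u, v.
a_hat, b_hat, c_hat = coefs(y, np.column_stack([one, u, v]))

# The part of u orthogonal to 1 and v (the residual of u regressed on 1 and v).
X1v = np.column_stack([one, v])
u_perp = u - X1v @ coefs(u, X1v)

# Coefficient from projecting y onto that single orthogonal direction.
b_alt = (y @ u_perp) / (u_perp @ u_perp)

print(np.isclose(b_hat, b_alt))   # True: the two coefficients agree
```

The same check works for v after orthogonalizing it against 1 and u.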
This yields:

       ŷ = constant · 1 + b_y(u·1,v) u + ________

We have found that the coefficient for u is the one from the projection of y onto the part of u which is orthogonal to v and 1. This means that when we consider the size of the coefficient, and whether it is significantly different from 0, we must keep in mind that it is the size when v and 1 are already in the equation and already being used to explain the variability in y.

Consider a concrete example, where y is the baby's birth weight, u is the mother's height, and v is the mother's weight. We find

       ŷ = 35 + 1.2 u + 0.07 v
       ŷ = 27 + 1.4 u

For every increase in height of 1 inch, the average weight of the baby increases by 1.4 ounces. But if we also know the mother's weight, then among mothers of roughly the same weight, for every increase in height of 1 inch the average weight of the baby increases by 1.2 ounces.

Why are these two coefficients different?

What about the residual sum of squares?

    Fit                                         Residual sum of squares / n
    birth weight on constant                    337 oz²
    birth weight on constant, height            324 oz² = 337 − 13
    birth weight on constant, weight            329 oz² = 337 − 8
    birth weight on constant, weight, height    322 oz² = 337 − 15

Notice that the RSS for the two-variable model is not 337 − 13 − 8. Why is this the case?

Fill in the table below for the two-variable fit using smoking and height to explain birth weight.

    Fit                                          Residual sum of squares / n
    birth weight on constant                     337 oz²
    birth weight on constant, height             324 oz² = 337 − 13
    birth weight on constant, smoking            ________
    birth weight on constant, height, smoking    ________

This time we do have that the RSS for the two-variable model is 337 − 13 − 20. Why is that the case?

What does it mean to fit birth weight to smoking?
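The geometric reason the RSS drops add in one table but not the other is orthogonality: the drop contributed by a predictor is the squared length of the projection of y onto that predictor's orthogonal part, so drops add exactly only when the predictors are orthogonal (after removing the constant). A sketch with simulated data illustrating both cases (all variables here are hypothetical, and `smoking` is explicitly orthogonalized against the constant and height so the additivity is exact):

```python
import numpy as np

def rss_per_n(y, X):
    """Residual sum of squares per observation for the fit of y on the columns of X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return (r @ r) / len(y)

rng = np.random.default_rng(1)
n = 200
one = np.ones(n)
height = rng.normal(64.0, 2.5, n)
weight = 2.0 * height + rng.normal(0.0, 10.0, n)   # correlated with height

# A predictor made exactly orthogonal to both the constant and height.
X1h = np.column_stack([one, height])
raw = rng.integers(0, 2, n).astype(float)
smoking = raw - X1h @ np.linalg.lstsq(X1h, raw, rcond=None)[0]

bw = 30.0 + 1.2 * height + 0.07 * weight - 5.0 * smoking + rng.normal(0.0, 4.0, n)

base    = rss_per_n(bw, np.column_stack([one]))
drop_h  = base - rss_per_n(bw, np.column_stack([one, height]))
drop_w  = base - rss_per_n(bw, np.column_stack([one, weight]))
drop_hw = base - rss_per_n(bw, np.column_stack([one, height, weight]))
drop_s  = base - rss_per_n(bw, np.column_stack([one, smoking]))
drop_hs = base - rss_per_n(bw, np.column_stack([one, height, smoking]))

# Correlated predictors: the joint drop is NOT the sum of the individual drops,
# because height and weight partly explain the same variability in bw.
print(np.isclose(drop_hw, drop_h + drop_w))   # False
# Orthogonal predictors: the drops add exactly.
print(np.isclose(drop_hs, drop_h + drop_s))   # True
```

This mirrors the two tables above: height and weight overlap, so 13 + 8 overstates their joint contribution, while height and smoking behave as (nearly) orthogonal directions, so 13 + 20 is exactly the joint drop.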