Stanford STATS 191 - Multiple Linear Regression


Lecture 4: Multiple Linear Regression
Nancy R. Zhang, Statistics 191, Stanford University
January 23, 2008

How does land use affect river pollution?

    Nitrogen = β0 + β1 Agr + β2 Forest + β3 Rsdntial + β4 ComIndl + error

Multiple Linear Regression

Design matrix and response:

    X = [ X11  X21  ...  Xp1 ]        y = [ y1 ]
        [ X12  X22  ...  Xp2 ]            [ y2 ]
        [ ...  ...  ...  ... ]            [ .. ]
        [ X1n  X2n  ...  Xpn ]            [ yn ]

Squared-error loss function:

    L(β) = Σ_{i=1}^n ( yi − Σ_{j=1}^p βj Xji )².

In matrix notation:

    L(β) = (y − Xβ)'(y − Xβ).

Linear Subspaces and Projections

With p predictors we have p vectors in ℝⁿ:

    Xi = (Xi1, Xi2, ..., Xin)',   i = 1, ..., p.

We denote by L(X1, ..., Xp) the linear space spanned by the vectors X1, ..., Xp:

    L(X1, ..., Xp) = { Σ_{i=1}^p ai Xi : (a1, ..., ap) ∈ ℝᵖ }.

This is a linear subspace of ℝⁿ. We use the shorthand L(X).

The dimension of L(X1, ..., Xp) equals the rank of the matrix X. The rank of a matrix is the number of linearly independent rows (equivalently, columns).

The linear map that projects any vector v ∈ ℝⁿ onto L(X) is

    PX = X (X'X)⁻¹ X'.

Projection Matrices

Thus, for any n × p matrix X of full column rank, we can construct a projection matrix PX = X(X'X)⁻¹X' that projects vectors onto the column space of X. Projection matrices enjoy some special properties:

1. PX² = PX.
2. rank(PX) = rank(X).
3. For any v ∈ L(X), PX v = v.

The orthogonal complement of L(X) is the set

    L(X)⊥ = { v ∈ ℝⁿ : X'v = 0 }.

The projection matrix onto L(X)⊥ is I − PX.

Linear Regression by Least Squares = Projection

Simple linear regression with intercept: project (y1, y2, ..., yn)' onto the span of (1, 1, ..., 1)' and (x1, x2, ..., xn)'.
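The projection-matrix properties above are easy to verify numerically. A minimal sketch with NumPy, using a hypothetical random full-rank design matrix (the matrix and its dimensions are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 3
X = rng.normal(size=(n, p))          # hypothetical full-rank design matrix

# Projection onto the column space L(X): P_X = X (X'X)^{-1} X'
P = X @ np.linalg.inv(X.T @ X) @ X.T

# Property 1: P_X is idempotent, P_X^2 = P_X
assert np.allclose(P @ P, P)
# Property 2: rank(P_X) = rank(X)
assert np.linalg.matrix_rank(P) == np.linalg.matrix_rank(X)
# Property 3: P_X v = v for any v already in L(X), e.g. v = X a
v = X @ np.array([1.0, -2.0, 0.5])
assert np.allclose(P @ v, v)
# I - P_X projects onto the orthogonal complement: residuals satisfy X'r = 0
y = rng.normal(size=n)
r = (np.eye(n) - P) @ y
assert np.allclose(X.T @ r, np.zeros(p))
```

The last check is the geometric content of least squares: the residual vector (I − PX)y is orthogonal to every column of X.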
Multiple Linear Regression

The solution to

    β̂ = argmin_{β ∈ ℝᵖ} (y − Xβ)'(y − Xβ)

can be obtained from

    ŷ = argmin_{v ∈ L(X)} (y − v)'(y − v).

The above is equivalent to finding the projection of y onto L(X):

    ŷ = PX y = X (X'X)⁻¹ X'y,

thus

    β̂ = (X'X)⁻¹ X'y.

The residuals are the projection of y onto L(X)⊥:

    r = y − ŷ = (I − PX) y = P_{X⊥} y.

The solution β̂ = (X'X)⁻¹X'y can also be obtained by setting the derivative of the least-squares loss to 0:

    L(β) = (y − Xβ)'(y − Xβ)
    L'(β̂) = −2 X'(y − Xβ̂) = 0
    ⇒ X'Xβ̂ = X'y
    ⇒ β̂ = (X'X)⁻¹ X'y.

Calculating Variances

Multivariate Gaussians: let Z ∼ N(µ, Σ), a ∈ ℝⁿ, and B an n × n matrix. Then

    a + BZ ∼ N(a + Bµ, BΣB').

With y ∼ N(Xβ, σ²I) and β̂ = (X'X)⁻¹X'y,

    E(β̂) = β,    Var(β̂) = σ² (X'X)⁻¹.

With ŷ = PX y,

    E(ŷ) = Xβ,    Var(ŷ) = σ² PX.

The diagonal entries of PX are the leverage values from last lecture.

t-tests for β̂i

    β̂ ∼ N(β, σ² (X'X)⁻¹).

As before, estimate σ² using σ̂² = SSE/(n − p). Then we can construct a t-test:

    t_{β̂i} = β̂i / s.e.(β̂i).

As before, reject the hypothesis Hi,0: βi = 0 at level α if |t_{β̂i}| > t(n − p, α/2).
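The normal-equations solution and the standard errors from Var(β̂) = σ²(X'X)⁻¹ can be sketched directly. This is an illustration on simulated data (the sample size, coefficients, and noise level are made up), checked against NumPy's own least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))                  # hypothetical design matrix
beta = np.array([2.0, 0.0, -1.0])            # illustrative true coefficients
y = X @ beta + rng.normal(scale=0.5, size=n)

# Normal equations: beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

y_hat = X @ beta_hat                         # fitted values, = P_X y
sse = np.sum((y - y_hat) ** 2)
sigma2_hat = sse / (n - p)                   # sigma^2 estimate, SSE/(n - p)
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))  # s.e.(beta_hat_i)
t_stats = beta_hat / se                      # t-statistics for H0: beta_i = 0

# Sanity check: agrees with NumPy's least-squares solver
assert np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0])
```

In practice one would use `np.linalg.lstsq` (or a QR decomposition) rather than explicitly inverting X'X, which is numerically less stable; the explicit inverse is shown here only because it mirrors the formula on the slide.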
Interpreting the β̂i's

    Y = β0 + β1 X1 + β2 X2 + ... + βp Xp

The β̂i obtained from a multiple regression are sometimes called partial regression coefficients, because each corresponds to a simple regression of Y on Xi after taking out the effects of the Xj, j ≠ i:

1. Regress Xi on {Xj : j ≠ i}; get residuals ei = Xi − X̂i.
2. Regress Y on {Xj : j ≠ i}; get residuals eY = Y − Ŷ∼i.
3. Do a simple linear regression of eY on ei; the slope is β̂i.

This gives us

    Var(β̂i) = σ² / ‖ei‖².

High correlation among the X's can "mask" each other's effects.

Goodness of Fit

Sums of squares:

    SSE = Σ_{i=1}^n (Yi − Ŷi)²
    SSR = Σ_{i=1}^n (Ŷi − Ȳ)²
    SST = Σ_{i=1}^n (Yi − Ȳ)² = SSE + SSR

    R² = SSR/SST = 1 − SSE/SST.

R = √R² is called the multiple correlation coefficient. When R² is large, a lot of the variability in Y is explained by X.

Large R² may not indicate a good model. Hypothetical scenario: n observations, n linearly independent covariates. What would you get for R²? As you add predictors to the model, R² will always increase, no matter what those predictors are!

"Goodness of Fit" Measures

R provides the following measures:

    R² = SSR/SST = 1 − SSE/SST
    Ra² = 1 − [SSE/(n − p − 1)] / [SST/(n − 1)] = 1 − [(n − 1)/(n − p − 1)] (1 − R²)

1. R² is easy to interpret: it is the proportion of the "variation" in the data explained by the model.
2. R² does not adjust for the model size, while Ra² does. When comparing models of different sizes, use Ra².
3. However, for hypothesis testing the F statistic should be used.

F-tests for R²

Assume the model has an intercept and p predictors.

    F = [SSR/p] / [SSE/(n − p − 1)]

F-distribution: if W ∼ χ²_q is independent of Z ∼ χ²_r, then

    (W/q) / (Z/r) ∼ F_{q,r}.

Why are SSR and SSE independent?
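The three-step partial-regression recipe above can be checked numerically: the slope of eY on ei reproduces the multiple-regression coefficient exactly. A small sketch on simulated data (the design and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 3))                       # hypothetical predictors
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Full multiple regression
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Partial-regression route for coefficient i = 0
others = X[:, [1, 2]]
P_others = others @ np.linalg.inv(others.T @ others) @ others.T

e_x = X[:, 0] - P_others @ X[:, 0]   # step 1: residual of X_0 on the other X's
e_y = y - P_others @ y               # step 2: residual of y on the other X's
slope = (e_x @ e_y) / (e_x @ e_x)    # step 3: simple-regression slope of e_y on e_x

assert np.allclose(slope, beta_hat[0])
```

The identity Var(β̂i) = σ²/‖ei‖² follows from the same picture: all the information about βi comes from the part of Xi not explained by the other predictors, so when the X's are highly correlated, ‖ei‖ is small and the standard error blows up.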
F-Table

    Source       Sum of Squares   d.f.        Mean Square             F
    Regression   SSR              p           MSR = SSR/p             F = MSR/MSE
    Residuals    SSE              n − p − 1   MSE = SSE/(n − p − 1)

Reject at level α if F > F(p, n − p − 1, α). This tests the hypothesis H0: β1 = β2 = ... = βp = 0.

Nested Models

Test the hypothesis that a subset of the βi's are zero:

    H0: β1 = β2 = ... = βr = 0.

That is, we have the reduced model

    RM: Y = βr+1 Xr+1 + ... + βp Xp + error

nested within the full model

    FM: Y = β1 X1 + ... + βp Xp + error.

Do X1, ..., Xr have a significant marginal effect, after adjusting for the remaining predictors?
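The standard statistic for comparing nested models (the slide text cuts off before giving it) is F = [(SSE_RM − SSE_FM)/r] / [SSE_FM/(n − p − 1)], compared to F(r, n − p − 1). A sketch on simulated data, assuming an intercept plus four predictors with the first two truly zero (all numbers here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
# Hypothetical design: intercept column plus 4 predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X @ np.array([1.0, 0.0, 0.0, 2.0, -1.0]) + rng.normal(size=n)

def sse(Xm, y):
    """Residual sum of squares from the least-squares fit of y on Xm."""
    beta = np.linalg.lstsq(Xm, y, rcond=None)[0]
    r = y - Xm @ beta
    return r @ r

# FM: all 4 predictors; RM: drop the first two (H0: beta_1 = beta_2 = 0)
sse_fm = sse(X, y)
sse_rm = sse(X[:, [0, 3, 4]], y)

r_drop = 2                      # number of coefficients set to 0 under H0
df_resid = n - X.shape[1]       # residual d.f. of the full model
F = ((sse_rm - sse_fm) / r_drop) / (sse_fm / df_resid)
# Under H0, F ~ F(r_drop, df_resid); reject at level alpha if F exceeds
# the upper-alpha quantile of that distribution.
```

Because the reduced model is a subspace of the full model, SSE_RM ≥ SSE_FM always holds; the F-test asks whether the drop in SSE is larger than chance alone would produce.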

