UF STA 6166 - Linear Regression 2

Multiple Linear Regression Model

Multiple linear regression refers to regression applications in which there is more than one independent variable, $x_1, x_2, \ldots, x_k$. A multiple linear regression model with $k$ independent variables has the equation

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon \tag{1}$$

The error term $\varepsilon$ is a random variable with mean 0 and variance $\sigma^2$. A prediction equation for this model fitted to data is

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k \tag{2}$$

where $\hat{y}$ denotes the "predicted" value computed from the equation, and $\hat{\beta}_i$ denotes an estimate of $\beta_i$. These estimates are usually obtained by the method of least squares. This means finding, among all possible values of the parameter estimates, the ones that minimize the sum of squared residuals, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. The least squares estimates yield the best fitting equation in terms of minimizing the sum of squared distances of the fitted plane to the data points.

An example of a multiple linear regression with two independent variables is given by the KWH data, but now with $x_1$ = AC and $x_2$ = DRYER. Figure 1 shows a plot of KWH versus DRYER.

[Figure 1. Plot of KWH versus DRYER.]

The plot clearly shows that KWH increases with increasing runs of the dryer. The model equation would be

$$\text{KWH} = \beta_0 + \beta_1\,\text{AC} + \beta_2\,\text{DRYER} + \varepsilon.$$

Least squares parameter estimates are

$$\hat{\beta}_0 = 8.11, \quad \hat{\beta}_1 = 5.47, \quad \hat{\beta}_2 = 13.22.$$

Computation of the estimates by hand is tedious, and infeasible for more than two independent variables. Estimates are ordinarily obtained using a regression computer program. Standard errors are also usually part of the output from a regression program.

The prediction equation is KWH = 8.11 + 5.47(AC) + 13.22(DRYER). This model ascribes 5.47 KWH to each hour of AC use, 13.22 KWH to each use of the DRYER, and 8.11 KWH to all other electrical devices. Compare this prediction equation with the one including only AC in the model, KWH = 27.85 + 5.34(AC).
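As a rough illustration of what a regression program does to obtain least squares estimates, the sketch below solves the normal equations for a model with an intercept and k independent variables, using only the Python standard library. The KWH data set is not reproduced in these notes, so the data here are hypothetical: the AC and DRYER values are invented, and the responses are generated without noise from the fitted equation above, so the fit should simply return the coefficients 8.11, 5.47 and 13.22 up to rounding error. The function names are illustrative, not taken from any package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals.

    rows: list of observations, each a list [x1, ..., xk].
    """
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    p = len(design[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical (AC hours, DRYER uses) pairs; NOT the KWH data from the notes.
hypothetical_x = [[1.5, 1], [4.5, 2], [5.0, 2], [2.0, 0],
                  [8.5, 3], [6.0, 3], [13.5, 1], [8.0, 1]]
# Noise-free responses generated from the fitted equation in the text.
hypothetical_y = [8.11 + 5.47 * ac + 13.22 * dryer for ac, dryer in hypothetical_x]

estimates = least_squares(hypothetical_x, hypothetical_y)
print([round(b, 2) for b in estimates])
```

With real data the responses would not lie exactly on a plane, and the same routine would return the coefficients of the best fitting plane in the least squares sense. Production software typically uses more numerically stable methods than solving the normal equations directly.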
The intercept estimate has changed substantially, from 27.85 to 8.11. This change occurs because KWH consumption due to DRYER usage is combined into the intercept estimate in the model that does not contain DRYER. The estimate of the coefficient on AC has changed very little, from 5.34 to 5.47. This is related to the fact that AC and DRYER usage are relatively uncorrelated. In other words, use of one is not related to use of the other. (See Figure 2.) Generally speaking, since the coefficient on DRYER is positive, if AC and DRYER were positively (negatively) correlated, then the regression coefficient on AC would be reduced (increased) when DRYER was added to the model.

[Figure 2. Plot of AC versus DRYER.]

Compare the values of predicted KWH from the two models. Previously, AC = 10 was inserted in the simple linear prediction equation to get KWH = 27.85 + 5.34(10) = 81.25. A value of DRYER must also be inserted into the multiple regression equation to get a predicted KWH value. Trying DRYER = 0, 1, and 2 gives

KWH = 8.11 + 5.47(10) + 13.22(0) = 62.81,
KWH = 8.11 + 5.47(10) + 13.22(1) = 76.03,
KWH = 8.11 + 5.47(10) + 13.22(2) = 89.25.

An analysis of variance for a multiple linear regression model with k independent variables fitted to a data set with n observations is

Source of Variation    DF       SS       MS
Regression             k        SSR      MSR
Error                  n-k-1    SSE      MSE        (3)
Total                  n-1      SSTot

The sums of squares SSR, SSE, and SSTot have the same definitions in relation to the model as in simple linear regression:

$$\text{SSR} = \sum_{j=1}^{n} (\hat{y}_j - \bar{y})^2, \quad \text{SSE} = \sum_{j=1}^{n} (y_j - \hat{y}_j)^2, \quad \text{SSTot} = \sum_{j=1}^{n} (y_j - \bar{y})^2 \tag{4}$$

Also, SSTot = SSR + SSE. The value of SSTot does not change with the model. It depends only on the values of the dependent variable y. But SSE decreases (or at least never increases) as variables are added to a model, and SSR increases by the same amount. This increase in SSR is the amount of variation due to variables in the larger model that was not accounted for by variables in the smaller model.
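The definitions in (4) and the identity SSTot = SSR + SSE can be checked numerically. The sketch below fits a simple linear regression by the usual closed-form least squares formulas and then computes the three sums of squares. The six data points are hypothetical and chosen only for illustration; they are not the KWH data, and the function names are illustrative.

```python
def simple_least_squares(x, y):
    """Closed-form least squares estimates (b0, b1) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(y, fitted):
    """Return (SSR, SSE, SSTot) as defined in equation (4)."""
    y_bar = sum(y) / len(y)
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    sstot = sum((yi - y_bar) ** 2 for yi in y)
    return ssr, sse, sstot


# Hypothetical data, for illustration only.
hypothetical_x = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
hypothetical_y = [12.0, 19.0, 27.0, 38.0, 44.0, 56.0]

b0, b1 = simple_least_squares(hypothetical_x, hypothetical_y)
fitted = [b0 + b1 * xi for xi in hypothetical_x]
ssr, sse, sstot = sums_of_squares(hypothetical_y, fitted)

print("SSR + SSE =", ssr + sse)
print("SSTot     =", sstot)
```

The two printed values should agree up to floating-point rounding. The identity depends on the fitted values coming from a least squares fit that includes an intercept; for arbitrary fitted values it generally fails.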
This increase in regression sum of squares is sometimes denoted

SSR(added variables | original variables),        (5)

where "original variables" represents the list of independent variables that were in the model prior to adding new variables, and "added variables" represents the list of variables that were added to obtain the new model. The overall SSR for the new model can be partitioned into the variation attributable to the original variables plus the variation due to the added variables that is not due to the original variables,

SSR(all variables) = SSR(original variables) + SSR(added variables | original variables).        (6)

Generally speaking, larger values of the coefficient of determination $R^2$ = SSR/SSTot indicate a better fitting model. The value of $R^2$ cannot decrease, and in practice almost always increases, as variables are added to the model. However, this does not necessarily mean that the model has actually been improved. The increase in $R^2$ can be a mathematical artifact rather than a meaningful indication of an improved model. Sometimes an adjusted $R^2$ is used to overcome this shortcoming of the usual $R^2$. Most regression computer programs report both versions of $R^2$.

The analysis of variance for the two-variable model fitted to the KWH data is

Source of Variation    DF    SS        MS
Regression             2     9299.8    4649.9
Error                  18    278.8     15.5
Total                  20    9578.6

Adding DRYER to the model effected a dramatic change in the value of SSR, which increased from 5609.7 to 9299.8. The value of SSE dropped accordingly, from 3968.9 to 278.8. The coefficient of determination is now $R^2$ = 9299.8/9578.6 = 0.97. The two variables, AC and DRYER, account for 97% of the variability in KWH consumption in the house. This is up from $R^2$ = 5609.7/9578.6 = 0.59 for the variable AC alone. The regression sum of squares, partitioned into the amount due to AC alone plus the amount due to DRYER that was not attributable to AC, is

SSR(AC and DRYER) = SSR(AC) + SSR(DRYER | AC),
9299.8 = 5609.7 + 3690.1.
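The arithmetic in the analysis of variance above can be reproduced from the reported sums of squares. The sketch below takes n = 21, which follows from the total degrees of freedom n - 1 = 20, and uses the SSR and SSE values quoted in the text for both models. The adjusted $R^2$ is computed with the common definition 1 - MSE/(SSTot/(n-1)); the notes do not give a formula, so treating this as the intended definition is an assumption. The function name `anova_summary` is illustrative.

```python
def anova_summary(n, k, ssr, sse):
    """Build ANOVA quantities for a model with k independent variables."""
    sstot = ssr + sse
    df_reg, df_err, df_tot = k, n - k - 1, n - 1
    msr = ssr / df_reg
    mse = sse / df_err
    r2 = ssr / sstot
    # Common definition of adjusted R^2 (assumed; not stated in the notes).
    adj_r2 = 1.0 - mse / (sstot / df_tot)
    return {"df_reg": df_reg, "df_err": df_err, "df_tot": df_tot,
            "ssr": ssr, "sse": sse, "sstot": sstot,
            "msr": msr, "mse": mse, "r2": r2, "adj_r2": adj_r2}


n = 21  # total DF is 20 in the table above, so n = 21

ac_only = anova_summary(n, k=1, ssr=5609.7, sse=3968.9)
ac_dryer = anova_summary(n, k=2, ssr=9299.8, sse=278.8)

# SSR(DRYER | AC): the increase in SSR when DRYER is added to the AC model.
ssr_dryer_given_ac = ac_dryer["ssr"] - ac_only["ssr"]

for name, fit in [("AC only", ac_only), ("AC and DRYER", ac_dryer)]:
    print(name, "MSE =", round(fit["mse"], 1),
          "R2 =", round(fit["r2"], 3),
          "adjusted R2 =", round(fit["adj_r2"], 3))
print("SSR(DRYER | AC) =", round(ssr_dryer_given_ac, 1))
```

Both models give the same SSTot of 9578.6, as they must, since SSTot depends only on the values of KWH.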
Thus, 3690.1 is the amount of variation due to DRYER that was not accounted for by AC.

Statistical inference about the parameters requires standard errors of the estimates. A 95% confidence interval for $\beta_i$ is

$$\hat{\beta}_i \pm t_{df,.025}\, \hat{\sigma}_{\hat{\beta}_i} \tag{7}$$

where $t_{df,.025}$ is the critical value from a t distribution with df = n-k-1, the degrees of freedom for error, and $\hat{\sigma}_{\hat{\beta}_i}$ is the standard error of $\hat{\beta}_i$. Standard errors for