WUSTL CSE 567M - Simple Linear Regression Models

This preview shows pages 1-3, 23-26, and 47-49 of 49 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

14-1 ©2008 Raj Jain, CSE567M, Washington University in St. Louis

Simple Linear Regression Models

Raj Jain
Washington University in Saint Louis
Saint Louis, MO 63130
[email protected]

These slides are available on-line at:
http://www.cse.wustl.edu/~jain/cse567-08/

14-2 Overview

1. Definition of a Good Model
2. Estimation of Model Parameters
3. Allocation of Variation
4. Standard Deviation of Errors
5. Confidence Intervals for Regression Parameters
6. Confidence Intervals for Predictions
7. Visual Tests for Verifying Regression Assumptions

14-3 Simple Linear Regression Models

- Regression Model: predicts a response for a given set of predictor variables.
- Response Variable: the variable being estimated.
- Predictor Variables: variables used to predict the response; also called predictors or factors.
- Linear Regression Models: the response is a linear function of the predictors.
- Simple Linear Regression Models: only one predictor.

14-4 Definition of a Good Model

(Figure: scatter plots of y versus x with candidate lines, labeled Good, Good, and Bad; not preserved in this preview.)

14-5 Good Model (Cont)

- Regression models attempt to minimize the vertical distance between each observation point and the model line (or curve).
- The length of this line segment is called the residual, the modeling error, or simply the error.
- The negative and positive errors should cancel out ⇒ zero overall error. Many lines satisfy this criterion.

14-6 Good Model (Cont)

- Choose the line that minimizes the sum of squares of the errors:

      ŷ = b0 + b1·x

  where ŷ is the predicted response when the predictor variable is x. The parameters b0 and b1 are fixed regression parameters to be determined from the data.
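The claim above that many lines satisfy the zero-overall-error criterion can be checked numerically. A minimal Python sketch (not part of the slides; the data points here are made up for illustration) shows that every line passing through the centroid (x̄, ȳ) has zero mean error, regardless of its slope:

```python
# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

def mean_error(b1):
    # Line through the centroid with slope b1: y = y_bar + b1*(x - x_bar).
    residuals = [y - (y_bar + b1 * (x - x_bar)) for x, y in zip(xs, ys)]
    return sum(residuals) / len(residuals)

print(mean_error(0.5))   # 0.0
print(mean_error(2.0))   # 0.0 -- a different slope, still zero mean error
```

This is why the zero-mean-error constraint alone cannot identify a unique line, and the sum-of-squares criterion is needed in addition.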
- Given n observation pairs {(x1, y1), …, (xn, yn)}, the estimated response for the ith observation is:

      ŷi = b0 + b1·xi

- The error is:

      ei = yi − ŷi

14-7 Good Model (Cont)

- The best linear model minimizes the sum of squared errors (SSE):

      SSE = Σ ei² = Σ (yi − b0 − b1·xi)²

  subject to the constraint that the mean error is zero:

      (1/n) Σ ei = 0

- This is equivalent to minimizing the variance of errors (see Exercise).

14-8 Estimation of Model Parameters

- Regression parameters that give minimum error variance are:

      b1 = (Σxy − n·x̄·ȳ) / (Σx² − n·x̄²)   and   b0 = ȳ − b1·x̄

  where x̄ = (1/n)Σx and ȳ = (1/n)Σy.

14-9 Example 14.1

- The number of disk I/Os and processor times of seven programs were measured as: (14, 2), (16, 5), (27, 7), (42, 9), (39, 10), (50, 13), (83, 20)
- For this data: n = 7, Σxy = 3375, Σx = 271, Σx² = 13,855, Σy = 66, Σy² = 828, x̄ = 38.71, ȳ = 9.43. Therefore (keeping full precision in intermediate steps):

      b1 = (3375 − 7 × 38.71 × 9.43) / (13,855 − 7 × 38.71²) = 0.2438
      b0 = 9.43 − 0.2438 × 38.71 = −0.0083

- The desired linear model is:

      CPU time = −0.0083 + 0.2438 × (number of disk I/Os)

14-10 Example 14.1 (Cont)

(Figure: scatter plot of the seven observations with the fitted line; not preserved in this preview.)

14-11 Example 14.1 (Cont)

- Error Computation

(Table of observed, predicted, and error values; not preserved in this preview.)

14-12 Derivation of Regression Parameters

- The error in the ith observation is:

      ei = yi − (b0 + b1·xi)

- For a sample of n observations, the mean error is:

      ē = (1/n) Σ ei = ȳ − b0 − b1·x̄

- Setting the mean error to zero, we obtain:

      b0 = ȳ − b1·x̄

- Substituting b0 in the error expression, we get:

      ei = (yi − ȳ) − b1·(xi − x̄)

14-13 Derivation of Regression Parameters (Cont)

- The sum of squared errors SSE is:

      SSE = Σ (yi − ȳ)² − 2·b1·Σ (xi − x̄)(yi − ȳ) + b1²·Σ (xi − x̄)²

14-14 Derivation (Cont)

- Differentiating this equation with respect to b1 and equating the result to zero:

      d(SSE)/d(b1) = −2·Σ (xi − x̄)(yi − ȳ) + 2·b1·Σ (xi − x̄)² = 0

- That is,

      b1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)² = (Σxy − n·x̄·ȳ) / (Σx² − n·x̄²)
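As a numeric check of the closed-form estimates derived above, a short Python sketch (not part of the slides) fits the disk-I/O / CPU-time data of Example 14.1:

```python
# Disk I/O counts (x) and CPU times (y) of seven programs, from Example 14.1.
data = [(14, 2), (16, 5), (27, 7), (42, 9), (39, 10), (50, 13), (83, 20)]
n = len(data)
sx  = sum(x for x, _ in data)        # Σx  = 271
sy  = sum(y for _, y in data)        # Σy  = 66
sxy = sum(x * y for x, y in data)    # Σxy = 3375
sxx = sum(x * x for x, _ in data)    # Σx² = 13,855
x_bar, y_bar = sx / n, sy / n        # 38.71, 9.43

# Closed-form least-squares estimates (slide 14-8).
b1 = (sxy - n * x_bar * y_bar) / (sxx - n * x_bar ** 2)
b0 = y_bar - b1 * x_bar
print(f"CPU time = {b0:.4f} + {b1:.4f} * (disk I/Os)")
# -> CPU time = -0.0083 + 0.2438 * (disk I/Os)
```

Keeping full precision in the intermediate sums, rather than the rounded x̄ and ȳ shown on the slide, reproduces the model stated in Example 14.1.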
14-15 Allocation of Variation

- If the predictor x is ignored, the best estimate of the response is its mean ȳ, so the error variance without regression equals the variance of the response:

      (1/n) Σ (yi − ȳ)²

14-16 Allocation of Variation (Cont)

- The sum of squared errors without regression would be:

      SST = Σ (yi − ȳ)²

- This is called the total sum of squares (SST). It is a measure of y's variability and is called the variation of y. SST can be computed as follows:

      SST = Σ yi² − n·ȳ² = SSY − SS0

  where SSY is the sum of squares of y (Σy²), and SS0 is the sum of squares of ȳ, equal to n·ȳ².

14-17 Allocation of Variation (Cont)

- The difference between SST and SSE is the sum of squares explained by the regression, called SSR:

      SSR = SST − SSE,   or   SST = SSR + SSE

- The fraction of the variation that is explained determines the goodness of the regression and is called the coefficient of determination, R²:

      R² = SSR / SST = (SST − SSE) / SST

14-18 Allocation of Variation (Cont)

- The higher the value of R², the better the regression: R² = 1 ⇒ perfect fit; R² = 0 ⇒ no fit.
- Coefficient of determination = {correlation coefficient (x, y)}²
- Shortcut formula for SSE:

      SSE = Σy² − b0·Σy − b1·Σxy

14-19 Example 14.2

- For the disk I/O-CPU time data of Example 14.1:

      SST = 828 − 66²/7 = 205.71
      SSE = 828 − b0·(66) − b1·(3375) = 5.87   (using the unrounded b0 and b1)
      R² = (205.71 − 5.87) / 205.71 = 0.9715

- The regression explains 97% of CPU time's variation.

14-20 Standard Deviation of Errors

- Since errors are obtained after calculating two regression parameters from the data, the errors have n − 2 degrees of freedom.
- SSE/(n − 2) is called the mean squared error (MSE).
- Standard deviation of errors = square root of MSE.
- SSY has n degrees of freedom since it is obtained from n independent observations without estimating any parameters.
- SS0 has just one degree of freedom since it can be computed simply from ȳ.
- SST has n − 1 degrees of freedom, since one parameter (ȳ) must be calculated from the data before SST can be computed.

14-21 Standard Deviation of Errors (Cont)

- SSR, which is the difference between SST and SSE, has the remaining one degree of freedom.
- Overall:

      SSY = SS0 + SSR + SSE
      n   =  1  +  1  + (n − 2)

- Notice that the degrees of freedom add just the way the sums of squares do.

14-22 Example 14.3

- For the disk I/O-CPU data of Example 14.1, the degrees of freedom of the sums are:

      SSY = SS0 + SSR + SSE
      7   =  1  +  1  +  5

- The mean squared error is:

      MSE = SSE / (n − 2) = 5.87 / 5 = 1.17

- The standard deviation of errors is:

      se = √1.17 = 1.08
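The quantities in Examples 14.2 and 14.3 can all be reproduced from the sums given in Example 14.1. A Python sketch (not part of the slides) computing the allocation of variation and the standard deviation of errors, using unrounded intermediate values throughout:

```python
import math

# Disk I/O / CPU-time data from Example 14.1.
data = [(14, 2), (16, 5), (27, 7), (42, 9), (39, 10), (50, 13), (83, 20)]
n = len(data)
sx  = sum(x for x, _ in data)        # Σx  = 271
sy  = sum(y for _, y in data)        # Σy  = 66
sxy = sum(x * y for x, y in data)    # Σxy = 3375
sxx = sum(x * x for x, _ in data)    # Σx² = 13,855
syy = sum(y * y for _, y in data)    # SSY = Σy² = 828

# Least-squares parameters (slide 14-8).
b1 = (sxy - sx * sy / n) / (sxx - sx * sx / n)
b0 = sy / n - b1 * sx / n

# Allocation of variation (slides 14-16 to 14-19).
ss0 = sy * sy / n                    # SS0 = n·ȳ²
sst = syy - ss0                      # total variation of y
sse = syy - b0 * sy - b1 * sxy       # shortcut formula for SSE (slide 14-18)
ssr = sst - sse                      # variation explained by the regression
r2  = ssr / sst                      # coefficient of determination

# Degrees of freedom add the way the sums of squares do (slide 14-21):
# SSY (n) = SS0 (1) + SSR (1) + SSE (n-2).
assert n == 1 + 1 + (n - 2)

# Standard deviation of errors (slides 14-20 and 14-22).
mse = sse / (n - 2)                  # mean squared error
s_e = math.sqrt(mse)                 # standard deviation of errors
print(f"SST={sst:.2f} SSE={sse:.2f} R^2={r2:.4f} MSE={mse:.3f} s_e={s_e:.3f}")
# -> SST=205.71 SSE=5.87 R^2=0.9715 MSE=1.174 s_e=1.083
```

The slight differences from the rounded slide values (1.17 and 1.08) come only from display rounding; the fractions of variation and the degree-of-freedom bookkeeping match the slides exactly.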

