UMD BMGT 230 - Notes

Multiple Regression Analysis (MLR)
MLR is an extension of SLR for investigating how a response y is affected by several independent variables x_1, ..., x_k.
Population model: y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \varepsilon, where the error term \varepsilon \sim N(0, \sigma).
The estimated regression equation \hat{y} = b_0 + b_1 x_1 + \dots + b_k x_k is found by minimizing the sum of squared residuals (the least squares method); b_0, b_1, ..., b_k are estimates of \beta_0, \beta_1, ..., \beta_k.

R-Squared
R^2 describes the relative improvement from using the prediction equation instead of the sample mean \bar{y} to predict y.
  SST = \sum (y_i - \bar{y})^2 (sum of squares total) measures variation of the responses about the mean \bar{y}.
  SSE = \sum (y_i - \hat{y}_i)^2 (sum of squared errors) measures spread about the regression equation.
  SSR = SST - SSE (regression sum of squares) measures the variation explained by the regression model.
  R^2 = SSR / SST = 1 - SSE / SST tells us what percent of the variation in the responses is explained by the regression model.
Properties of R^2:
1. R^2 is between 0 and 1.
2. R^2 = 1 when all residuals are 0.
3. R^2 = 0 when each \hat{y}_i = \bar{y} (the model explains none of the variation).
4. R^2 gets larger, or at least stays the same, whenever an independent variable is added to the multiple regression model.
5. R^2 does not depend on the units of measurement.

Hypothesis Test for an Individual Coefficient \beta_i
1. Hypotheses: H_0: \beta_i = 0 vs. H_a: \beta_i \neq 0.
2. Test statistic: t = b_i / SE(b_i).
3. P-value: two-tail probability from the t distribution of values more extreme than the observed test statistic; the t distribution has df = n - (# of parameters) = n - k - 1.
4. Conclusion: compare the p-value to the significance level \alpha and, if a decision is needed, reject H_0 when the p-value \leq \alpha.
Confidence interval for \beta_i: estimate \pm margin of error, i.e. b_i \pm t^* SE(b_i) with df = n - k - 1.

Overall F Test and the ANOVA Table
Hypotheses: H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0 vs. H_a: at least one \beta_j \neq 0.
An estimate for \sigma is the regression standard error s = \sqrt{MSE} = \sqrt{SSE / (n - k - 1)}.
ANOVA table (k = number of predictor variables):
  Source               df           Sum of Squares   Mean Square               F Stat        P-Value
  Model (Regression)   k            SSR              MSR = SSR / k             F = MSR/MSE   right-tail area
  Error (Residual)     n - k - 1    SSE              MSE = SSE / (n - k - 1)
  Total                n - 1        SST
The F statistic compares statistical models that have been fitted to a data set. Under H_0 it follows an F distribution with (k, n - k - 1) degrees of freedom. The p-value is the area to the right of the observed F; it is always one-sided. Conclusion: if the p-value \leq \alpha, reject H_0; if the p-value > \alpha, fail to reject H_0. (A Python sketch at the end of this regression section shows where these quantities appear in software output.)
Adjusted R^2 = 1 - (SSE / (n - k - 1)) / (SST / (n - 1)), which, unlike R^2, can decrease when an unhelpful predictor is added.

Conditions for the Multiple Regression Model (LINE)
1. Linearity: look at scatterplots of y vs. each x_i to see if they are linear.
2. Independence: assume the observations come from an SRS.
3. Normality: make a histogram and a normal quantile (QQ) plot of the residuals.
4. Equal Spread: residual plots are spread equally far from zero across all fitted values.

Simple Linear Regression (SLR) and Inference for SLR
The data in a scatterplot are a random sample from a population that may exhibit a linear relationship between x and y. Different sample, different plot. In the population the relationship is linear.
Population model: y = \beta_0 + \beta_1 x + \varepsilon, with \varepsilon \sim N(0, \sigma).
The sample data fit the estimated regression equation \hat{y} = b_0 + b_1 x.
Data = fit + residual, and the residuals sum to 0.
Linear regression assumes equal spread: the variance of y is the same for all values of x.
Conditions for SLR: 1. Linearity, 2. Independence, 3. Normality, 4. Equal Spread.

Estimating the Parameters
The least squares regression line is the best estimate of the true population regression line. The population standard deviation \sigma for y at any given value of x represents the spread of the normal distribution of the \varepsilon_i around the mean response \mu_y. The standard error about the regression line is s = \sqrt{\sum e_i^2 / (n - 2)}.

Confidence Interval for the Regression Parameters
Estimating the regression parameters \beta_0, \beta_1 is a case of one-sample inference with unknown population variance, so we rely on the t distribution with n - 2 degrees of freedom; the interval is b_i \pm t^* SE(b_i).

Hypothesis Test for the Slope
H_0: \beta_1 = 0 (x and y are not linearly related) vs. H_a: \beta_1 \neq 0 (x and y are linearly related).
Test statistic: t = b_1 / SE(b_1), which follows t_{n-2} under H_0. P-value: the sum of the two tail areas beyond the observed t. The conclusion step is the same as usual.

Confidence and Prediction Intervals
Confidence interval for the mean response \mu_y at a given x: \hat{y} \pm t^* SE(\hat{\mu}).
Predicting an individual response uses a prediction interval, \hat{y} \pm t^* SE(\hat{y}); it is wider because it must also cover the scatter of individual observations around the mean response.

Unusual Observations
Outliers stand out by having either (1) a large residual or (2) a large distance from \bar{x} (high leverage). A high-leverage point is influential if omitting it changes the slope of the regression model. An influential point is similar to a high-leverage point and does not necessarily have a large residual.
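To see where these quantities show up in software, here is a minimal sketch in Python (assuming NumPy and statsmodels are available; the data, coefficient values, and variable names are invented for illustration, not taken from the course). It fits a two-predictor least squares model and prints R^2, adjusted R^2, the overall F statistic with its p-value, the per-coefficient t statistics, p-values, and confidence intervals, and a mean-response confidence interval versus an individual-response prediction interval at a new point.

```python
# Illustrative sketch only: made-up data, statsmodels assumed installed.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 2, n)   # true model plus N(0, sigma) error

X = sm.add_constant(np.column_stack([x1, x2]))       # adds the intercept column
fit = sm.OLS(y, X).fit()                             # least squares estimates b0, b1, b2

print(fit.rsquared, fit.rsquared_adj)                # R^2 and adjusted R^2
print(fit.fvalue, fit.f_pvalue)                      # overall F statistic and its p-value
print(fit.tvalues, fit.pvalues)                      # t = b_i / SE(b_i) and two-tail p-values
print(fit.conf_int(alpha=0.05))                      # 95% CIs: b_i +/- t* SE(b_i)

# Confidence interval for the mean response and prediction interval for an
# individual response at a hypothetical new point (x1, x2) = (5, 2):
new = sm.add_constant(np.array([[5.0, 2.0]]), has_constant='add')
pred = fit.get_prediction(new).summary_frame(alpha=0.05)
print(pred[['mean_ci_lower', 'mean_ci_upper', 'obs_ci_lower', 'obs_ci_upper']])
```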
Normally Distributed Populations
When a variable in a population is normally distributed, the sampling distribution of \bar{x} for all possible samples of size n is also normally distributed.
Central Limit Theorem: if a population has mean \mu and standard deviation \sigma, then for a large enough sample (n \geq 25, with n \geq 40 preferable) the sampling distribution of \bar{x} is approximately normal with mean \mu and standard deviation \sigma / \sqrt{n}.

Confidence Interval for a Population Mean
With \sigma known: \bar{x} \pm z^* \sigma / \sqrt{n}. If \sigma is unknown, estimate it with the sample standard deviation s (a good estimate if n is large) and use \bar{x} \pm t^* s / \sqrt{n}, where t^* comes from the t distribution with df = n - 1. (A short Python sketch of this interval follows below.)
Robustness: the t procedures are robust to small deviations from normality, meaning the results will not be affected much. The factors that matter are (1) random sampling and (2) outliers and skewness. Specifically:
1. When n < 15, the data must be close to normal and without outliers.
2. When 15 \leq n < 40, mild skewness is acceptable, but no outliers.
3. When n \geq 40, t statistics are valid even with strong skewness.

Comparing Two Groups
Two populations, 1 and 2; take an SRS from each population (Sample 1 and Sample 2). An estimate of \mu_1 - \mu_2 is \bar{x}_1 - \bar{x}_2, with standard error \sqrt{s_1^2 / n_1 + s_2^2 / n_2}.
Hypothesis Tests for Two Population Means: H_0: \mu_1 = \mu_2 (equivalently \mu_1 - \mu_2 = 0) vs. H_a: \mu_1 \neq \mu_2, \mu_1 > \mu_2, or \mu_1 < \mu_2 (equivalently \mu_1 - \mu_2 \neq 0, > 0, or < 0). Test statistic: t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2 / n_1 + s_2^2 / n_2}.
Confidence Interval for \mu_1 - \mu_2: (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{s_1^2 / n_1 + s_2^2 / n_2}. (A two-sample sketch follows the one-sample sketch below.)
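As a quick illustration of the one-sample t interval above, here is a minimal Python sketch (SciPy and NumPy assumed available; the sample values are made up for illustration):

```python
# Illustrative sketch only: t confidence interval for a mean when sigma is unknown.
import numpy as np
from scipy import stats

sample = np.array([12.1, 9.8, 11.4, 10.7, 13.2, 10.1, 11.9, 12.5, 9.6, 11.0])
n = len(sample)
xbar = sample.mean()
s = sample.std(ddof=1)                 # sample standard deviation estimates sigma
se = s / np.sqrt(n)                    # estimated standard error of xbar

tstar = stats.t.ppf(0.975, df=n - 1)   # critical value for a 95% interval, df = n - 1
print(xbar - tstar * se, xbar + tstar * se)

# Equivalent shortcut:
print(stats.t.interval(0.95, df=n - 1, loc=xbar, scale=se))
```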

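For the two-group comparison, the sketch below runs the unpooled (Welch) two-sample t test and builds the matching interval for \mu_1 - \mu_2 by hand. The data are invented, and the conservative choice df = min(n_1 - 1, n_2 - 1) is one common textbook option rather than something specified in these notes; software usually uses the Welch-Satterthwaite df instead.

```python
# Illustrative sketch only: comparing two population means from independent SRSs.
import numpy as np
from scipy import stats

g1 = np.array([23.1, 19.8, 21.4, 22.7, 20.2, 24.0, 21.8])
g2 = np.array([18.9, 20.1, 17.4, 19.6, 18.2, 20.8])

# H0: mu1 = mu2 vs. Ha: mu1 != mu2; equal_var=False gives the unpooled (Welch) test.
t_stat, p_value = stats.ttest_ind(g1, g2, equal_var=False)
print(t_stat, p_value)

# Confidence interval for mu1 - mu2 built from the same pieces:
diff = g1.mean() - g2.mean()
se = np.sqrt(g1.var(ddof=1) / len(g1) + g2.var(ddof=1) / len(g2))
df = min(len(g1) - 1, len(g2) - 1)     # conservative df choice (assumption, see note above)
tstar = stats.t.ppf(0.975, df=df)
print(diff - tstar * se, diff + tstar * se)
```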
