
11. Multiple Regression

Contents:
• Parameter interpretation
• Prediction equation
• Example: Mental impairment study
• Predicted values and residuals
• Comments
• Graphics for multiple regression
• Multiple correlation and R²
• Example: Mental impairment predicted by life events and SES
• Properties of R and R²
• Inference for multiple regression
• Collective influence of explanatory variables
• Properties of the F distribution
• Example: Is mental impairment independent of life events and SES?
• Inferences for individual regression coefficients (Need all predictors in model?)
• Example: Effect of SES on mental impairment, controlling for life events
• A caution: "Overlapping variables" (multicollinearity)
• Modeling interaction between predictors
• Simplest interaction model: introduce cross-product terms for predictors
• Comments about the interaction model
• Comparing two regression models

From "Comparing two regression models": Test statistic with df1 = 1, df2 = 36; P-value = . We cannot reject H0 at the usual significance levels (such as 0.05), so the simpler model is adequate. Note: since the null hypothesis has only one parameter, the F test statistic is the square of t = b3/se for testing H0: β3 = 0. The t test also gives P-value = , for Ha: β3 ≠ 0.

11. Multiple Regression
• y — response variable; x1, x2, …, xk — a set of explanatory variables. In this chapter, all variables are assumed to be quantitative.
• Multiple regression equation (population): E(y) = α + β1x1 + β2x2 + …
+ βkxk

Parameter Interpretation
• α = E(y) when x1 = x2 = … = xk = 0.
• β1, β2, …, βk are called partial regression coefficients. Controlling for the other predictors in the model, there is a linear relationship between E(y) and x1 with slope β1.
• i.e., consider the case of k = 2 explanatory variables, E(y) = α + β1x1 + β2x2. If x1 goes up 1 unit with x2 held constant, the change in E(y) is
[α + β1(x1 + 1) + β2x2] − [α + β1x1 + β2x2] = β1.

Prediction equation
• With sample data, software finds the "least squares" estimates of the parameters by minimizing SSE = sum of squared prediction errors (residuals) = Σ(observed y − predicted y)².
• Denote the sample prediction equation by ŷ = a + b1x1 + b2x2 + … + bkxk.

Example: Mental impairment study
• y = mental impairment (summarizes extent of psychiatric symptoms, including aspects of anxiety and depression, based on questions in the "Health opinion survey," with possible responses hardly ever, sometimes, often)
• x1 = life events score (composite measure of the number and severity of life events in the previous 3 years)
• x2 = socioeconomic status (composite index based on occupation, income, and education)
• Data set (n = 40) at www.stat.ufl.edu/~aa/social/data.html and p. 327 of text. Other predictors in the study, not used here, included age, marital status, gender, and race.
• Bivariate regression analyses give prediction equations (software output omitted).
• Correlation matrix (software output omitted).
• The prediction equation for the multiple regression analysis is ŷ = 28.23 + 0.103x1 − 0.097x2. Predicted mental impairment:
• increases by 0.103 for each 1-unit increase in life events, controlling for SES.
• decreases by 0.097 for each 1-unit increase in SES, controlling for life events.
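The partial-effect interpretation of the fitted coefficients can be checked directly in a few lines of code. This is a minimal sketch using only the fitted equation above (the function name `predicted_impairment` is ours, not from the text; no real data is needed):

```python
def predicted_impairment(life_events, ses):
    """Prediction equation from the mental impairment study:
    y-hat = 28.23 + 0.103*x1 - 0.097*x2."""
    return 28.23 + 0.103 * life_events - 0.097 * ses

# Partial effect of life events, controlling for SES: a 1-unit increase
# in x1 changes y-hat by b1 = 0.103 at ANY fixed value of x2.
for ses in (0, 50, 100):
    diff = predicted_impairment(46, ses) - predicted_impairment(45, ses)
    print(round(diff, 3))  # 0.103 each time

# Likewise, the partial effect of SES, controlling for life events:
print(round(predicted_impairment(45, 56) - predicted_impairment(45, 55), 3))  # -0.097
```

The loop makes the "controlling for" idea concrete: the change in the predicted value per unit of x1 is the same no matter where x2 is held fixed.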
(e.g., it decreases by 9.7 when SES goes from its minimum of 0 to its maximum of 100, which is relatively large since the sample standard deviation of y is 5)
• Can we compare the estimated partial regression coefficients to determine which explanatory variable is "most important" in the predictions?
• These estimates are unstandardized and so depend on the units of measurement.
• "Standardized coefficients" presented in multiple regression output refer to the partial effect of a standard deviation increase in a predictor, keeping the other predictors constant (Sec. 11.8).
• In bivariate regression, the standardized coefficient equals the correlation. In multiple regression, standardized coefficients relate algebraically to "partial correlations" (Sec. 11.7).

Predicted values and residuals
• One subject in the data file has y = 33, x1 = 45 (near mean), x2 = 55 (near mean). This subject has predicted mental impairment ŷ = 28.23 + 0.103(45) − 0.097(55) = 27.53 (near the mean).
• The prediction error (residual) is 33 − 27.53 = 5.47; i.e., this person has mental impairment 5.47 higher than predicted given his/her values of life events and SES.
• SSE for this prediction equation is smaller than SSE for either bivariate model, or for any other linear equation with predictors x1, x2.

Comments
• Partial effects in multiple regression refer to controlling for the other variables in the model, so they differ from effects in bivariate models, which ignore all other variables.
• The partial effect of x1 (controlling for x2) is the same as the bivariate effect of x1 when the correlation between x1 and x2 is 0 (as is true in most designed experiments).
• The partial effect of a predictor in this multiple regression model is identical at all fixed values of the other predictors in the model. Example: with ŷ = 28.23 + 0.103x1 − 0.097x2, at x2 = 0, ŷ = 28.23 + 0.103x1; at x2 = 100, ŷ = 18.53 + 0.103x1.
• This parallelism means that the model assumes no interaction between predictors in their effects on y (i.e., the effect of x1 does not depend on the value of x2).
• The model is inadequate if, in reality, the effect of x1 changes with the value of x2 (graph omitted).
• The model E(y) = α + β1x1 + β2x2 + … + βkxk is equivalently expressed as y = α + β1x1 + β2x2 + …
+ βkxk + ε, where ε = y − E(y) = "error," having E(ε) = 0, is the population analog of the residual e = y − ŷ.

Graphics for multiple regression
• Scatterplot matrix: a scatterplot for each pair of variables.
• Partial regression plots: one plot for each predictor, showing its partial effect controlling for the other predictors.
• Example: with two predictors, show the partial effect of x1 on y (i.e., controlling for x2) by using the residuals after (1) regressing y on x2 and (2) regressing x1 on x2. The partial regression plot is a scatterplot with the residuals from regressing y on x2 on the vertical axis and the residuals from regressing x1 on x2 on the horizontal axis. The prediction equation for these points has the same slope as the effect of x1 in the prediction equation for the multiple regression model.

Multiple correlation and R²
• How well do the explanatory variables in the model predict y, using the prediction equation?
• The multiple correlation, denoted by R, is the correlation between the observed y-values and the predicted values from the prediction equation; i.e., it is the ordinary correlation between y and an artificial variable whose values for the n subjects in the sample are the predicted values from the prediction equation.
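Both the partial-regression-plot property and the definition of R are easy to verify numerically. The sketch below uses numpy's least-squares solver on made-up toy data (not the real study data; the generating coefficients merely echo the fitted equation above): the slope from regressing the y-residuals on the x1-residuals matches b1 from the multiple regression, and R is just the ordinary correlation between y and ŷ.

```python
import numpy as np

# Hypothetical toy data, NOT the real mental impairment data set.
rng = np.random.default_rng(0)
n = 40
x1 = rng.uniform(0, 100, n)
x2 = rng.uniform(0, 100, n)
y = 28.23 + 0.103 * x1 - 0.097 * x2 + rng.normal(0, 5, n)

def ls_fit(X, y):
    """Least-squares coefficients for design matrix X (intercept column included)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

ones = np.ones(n)

# Multiple regression of y on x1 and x2.
a, b1, b2 = ls_fit(np.column_stack([ones, x1, x2]), y)
y_hat = a + b1 * x1 + b2 * x2

# Partial regression plot for x1: residuals of y on x2 (vertical axis)
# against residuals of x1 on x2 (horizontal axis).
c0, c1 = ls_fit(np.column_stack([ones, x2]), y)
res_y = y - (c0 + c1 * x2)
d0, d1 = ls_fit(np.column_stack([ones, x2]), x1)
res_x1 = x1 - (d0 + d1 * x2)

# Slope of the residual-on-residual regression equals b1.
slope = ls_fit(np.column_stack([ones, res_x1]), res_y)[1]
print(bool(np.isclose(slope, b1)))  # True

# Multiple correlation: ordinary correlation between observed y and y-hat.
R = np.corrcoef(y, y_hat)[0, 1]
print(round(float(R), 3))
```

The slope agreement is exactly the property the text states for partial regression plots; it holds for any data set, not just this simulated one.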


UF STATISTICS 101 - Multiple Regression
