UGA STAT 4210 - Chapter 13

Chapter 13 – Multiple Regression (MLR)

Goals:
- write and understand the MLR model for 2 or more predictors
- identify advantages of MLR over SLR
- interpret regression parameters (βs)
- recognize that parameters can be interpreted only while holding the other predictors constant
- explain how the multiple correlation (and multiple R²) describe how well the set of predictors predicts the response variable y
- write hypotheses for testing whether y is linearly independent of all explanatory variables in the model
- use the F test and its p-value to draw appropriate conclusions about the overall model
- use t tests and confidence intervals to draw conclusions about each predictor variable's effect on the response
- interpret indicator variables
- use indicator variables in multiple regression
- (use logistic regression for binary response variables)

With ANOVA, we considered the two-way ANOVA case, where we had two categorical predictors (e.g., excavation region and type of sherd, or origin and type of vehicle) for our continuous response.

Multiple regression is the same idea in the regression setting: what if, instead of having only a single quantitative predictor, we have several variables to tie to the response? In that case, we use multiple linear regression (MLR) methods to understand the relationships between those predictors and the response, y.

§13.1 Using Several Variables to Predict a Response

In SLR, our population (hypothesized) model was

	μ_{y|x} = α + βx

This is a bivariate (two-variable) model, consisting of the variables x and y.

With two predictors, it becomes a matter of adding the appropriate parameter and subscript:

	μ_{y|x1,x2} = α + β1x1 + β2x2

Our sample model, or prediction equation, would be

	ŷ = a + b1x1 + b2x2

In general, say we have k predictors x1, x2, …, xk that we would like to use to understand our response, y.
In that case, our population model and prediction equation would, respectively, be

	μ_{y|x1,x2,…,xk} = α + β1x1 + β2x2 + … + βkxk

	ŷ = a + b1x1 + b2x2 + … + bkxk

We can see that a model with k predictors has k + 1 parameters (the k slopes plus 1 intercept). This will be important later.

Like our SLR prediction equation, this is a least squares (LS) equation: it has the smallest sum of squared residuals (SSE) of all possible choices of coefficients.

To find this equation, as before, we should start with a plot of our data. For MLR we use what is called a matrix plot or scatterplot matrix: a matrix of scatterplots that includes a plot of the response variable against each predictor (and, if you so choose, a plot of each predictor against the others).

Example: Birthweight Data

Recall the birthweight data from Chapter 3. Those data recorded the weight of each baby (in grams), the weight of the mother at conception (in pounds), and the mother's age (in years). We are interested in modeling babies' birth weights as a function of mothers' weight and age. A matrix plot allows us to visually investigate all possible bivariate relationships in the dataset: we can see not only how individual predictors relate to the response but, if requested, also look for relationships among the predictors.

Just as the matrix plot breaks the visual representation into bivariate plots of the response vs. each predictor, that is how we must interpret the MLR coefficients: each predictor taken one at a time.

Let's continue with the birthweight example.

Example: In Chapter 3 we originally used mother's weight to predict babies' birth weights. But we now also know the mothers' ages, and perhaps age is a useful predictor of baby weight in addition to mother's weight.

We could fit a separate SLR model for each predictor: one using mothers' weight and another using mothers' age. However, that would mean two separate tests, which would inflate the overall Type I error rate ("galloping alpha"), among other issues.
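The least-squares idea can be checked numerically. The sketch below uses synthetic data (not the course's birthweight data) with made-up coefficients: it builds the design matrix, fits ŷ = a + b1x1 + b2x2 with np.linalg.lstsq, and confirms that perturbing the fitted coefficients only increases the SSE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a known (hypothetical) model: y = 3 + 2*x1 - 1*x2 + noise
n = 100
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 3 + 2 * x1 - 1 * x2 + rng.normal(0, 1, n)

# Design matrix: a column of 1s (for the intercept a) plus one column per predictor
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # coef = [a, b1, b2]

def sse(c):
    """Sum of squared residuals for candidate coefficients c."""
    return np.sum((y - X @ c) ** 2)

# The LS coefficients have the smallest SSE of all candidate coefficients
print(sse(coef) <= sse(coef + np.array([0.1, 0.0, 0.0])))  # True
```

The same construction works for any number of predictors: add one column to X per predictor, and the fit returns the k + 1 estimates (intercept plus k slopes).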
What we really want to know is whether the two variables, considered together, provide information about babies' birth weights.

We do this by putting both variables in the model at the same time. Our resulting prediction equation is

	ŷ = 2215.7604 + 4.1790 x1 + 8.0214 x2

where
	ŷ is predicted birth weight, in grams
	x1 is weight of mother at conception, in lb
	x2 is age of mother, in years.

What do these estimates mean?

a: the average weight of a baby born to a woman who is 0 years old and weighs 0 lb at conception is predicted to be 2215.76 g. Of course, this is meaningless, because these values are physical impossibilities. This is a classic case of extrapolation: 0 is nowhere near the range of our data for either predictor.

b1: when the mother's weight at conception increases by 1 lb and age is held constant, the predicted average birth weight increases by 4.1790 g.

b2: for all mothers of the same weight, when age increases by one year the predicted average birth weight increases by 8.0214 g.

Notice: we interpret regression coefficients
- one at a time
- with all other predictors held constant.

If we allow the other predictors to vary, we cannot tell whether the mean response is changing because of the predictor we are interested in (say, age) or because of the others! So we must hold the others constant to know that the effect on the mean response is due only to the predictor of interest.

NB: these interpretations are identical to the interpretations of a main effect in ANOVA, which was the change in mean response due to one factor with the levels of the other factor held fixed.
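The "one at a time, holding the others constant" interpretation falls straight out of the prediction equation. A quick check using the fitted birthweight coefficients above (plain arithmetic, no model fitting needed; the specific values 130 lb and age 25 are arbitrary illustration choices):

```python
# Fitted coefficients from the birthweight prediction equation above
a, b1, b2 = 2215.7604, 4.1790, 8.0214

def yhat(x1, x2):
    """Predicted birth weight (g) from mother's weight (lb) and age (yr)."""
    return a + b1 * x1 + b2 * x2

# Holding age constant at 25, a 1 lb increase in mother's weight
# changes the prediction by exactly b1 = 4.1790 g
diff = yhat(131, 25) - yhat(130, 25)
print(round(diff, 4))  # 4.179
```

The difference is b1 no matter which age we hold constant; if we let age change at the same time, the change in ŷ would mix the b1 and b2 effects.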
Therefore, our MLR regression slopes are analogous to main effects in ANOVA.

§13.2 Multiple Correlation and R²

Just as we extended the model from SLR to MLR, we need a way to understand the strength of association and the fit of the model when we have multiple predictors.

In SLR, we used the bivariate correlation coefficient, r, to describe the direction and strength of the linear association between y and x. In multiple regression, with several predictor variables, we can't use r in that way: there are too many relationships to describe. Instead, we use the multiple correlation.

Definition: the multiple correlation, R, is the correlation between the observed and predicted responses (y and ŷ).

Note: the multiple correlation is between predicted and observed responses, not between the response and any (or all) of the predictors; that is because correlation is a bivariate statistic and we need to describe a model that has multiple variables.

Because R is the correlation between observed responses and their least-squares predictions, it falls between 0 and 1: unlike r, it is never negative.
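This definition can be checked numerically. A sketch on synthetic data (the model and coefficients here are made up for illustration): fit the model, compute ŷ, correlate it with y, and note that R² matches the familiar proportion-of-variation-explained quantity 1 − SSE/TSS.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with two predictors (hypothetical model)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 0.5 * x1 + 2 * x2 + rng.normal(size=n)

# Least-squares fit and predicted responses
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef

# Multiple correlation R: the ordinary (bivariate) correlation of y with y_hat
R = np.corrcoef(y, y_hat)[0, 1]

# R**2 agrees with 1 - SSE/TSS, the proportion of variation in y explained
sse = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
print(np.isclose(R ** 2, 1 - sse / tss))  # True
```

Note that R is a single bivariate correlation even though the model has several predictors, which is exactly why it (rather than r) is the right summary for MLR.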

