
Automated Regression Modeling

• Descriptive vs. Predictive Regression Models
• Four common automated modeling procedures
  • Forward Modeling
  • Backward Modeling
  • Forward Stepwise Modeling
  • All-subsets Regression
• Problems & Difficulties with automated modeling

Regression Model Building -- we've talked about two kinds...

Descriptive Modeling
• explication of the relationships between the set of predictors and the criterion variable (90-95%)
• completeness is the goal -- often requiring the combination of information from simple correlations and various multiple regression models

Predictive Modeling
• we are going to compute y' values to make decisions about people (5-10%)
• efficiency is the goal -- we want the "best model" with the "fewest predictors"

Interpretations of simple correlations, of full models, and of comparisons of nested and non-nested models are commonly combined to yield descriptive modeling -- remember, the interest is in completely explicating the patterns of relationship between the criterion variable and the predictors. Theory often plays a large part, directing the variables we choose and the models we compare.

When interested in predictive modeling, we want one model that we will use to compute y' values for future individuals. Theory can help, cost analysis is often important, and collinearity is to be avoided (because it reduces efficiency).

The various "automated regression procedures" were designed for predictive model building (not for descriptive modeling). The four most commonly used "automated" procedures are...

• Forward Inclusion -- start with the best predictor and add predictors to get a "best model"
• Backward Deletion -- start with the full model and delete predictors to get a "best model"
• Forward Stepwise Inclusion -- a combination of the first two
• All-subsets Regression -- literally getting all 1-, 2-, 3-, ..., k-predictor models for comparison

Forward Modeling

Step 1 -- the first predictor in the model is the "best single predictor."
Select the predictor with the numerically largest simple correlation with the criterion -- if it is a significant correlation.

    r_y,x1  vs.  r_y,x2  vs.  r_y,x3  vs.  r_y,x4

By using this procedure we are sure that the initial model "works."

Step 2 -- the next predictor in the model is the one that will "contribute the most" -- with two equivalent definitions:

1. Add that predictor which gives the 2-predictor model (including the predictor from Step 1) with the numerically largest R² -- if the R² is significant and significantly larger than the r² from the first step (here, suppose x3 entered at Step 1).

    R²_y.x3,x1  vs.  R²_y.x3,x2  vs.  R²_y.x3,x4

2. Add to the model that predictor with the highest semi-partial correlation with the criterion, controlling that predictor for the predictor already in the model -- if the semi-partial is significant.

    r_y(x1.x3)  vs.  r_y(x2.x3)  vs.  r_y(x4.x3)

By using this procedure we are sure the 2-predictor model "works" and "works better than the 1-predictor model."

All subsequent steps -- the next predictor in the model is the one that will "contribute the most" -- with two equivalent definitions:

1. Add that predictor which gives the model (including the predictors already selected) with the numerically largest R² -- if the R² is significant and significantly larger than the R² from the previous step (here, x3 and x2 are already in the model).

    R²_y.x3,x2,x1  vs.  R²_y.x3,x2,x4

2. Add to the model that predictor with the highest semi-partial correlation with the criterion, controlling that predictor for the predictors already in the model -- if the semi-partial is significant.

    r_y(x1.x3,x2)  vs.  r_y(x4.x3,x2)

By using this procedure, we are sure that each model "works" and "works better than the one before it."

When to quit? When no additional predictor will significantly increase the R² (the same as when no multiple semi-partial is significant).
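To make the forward procedure concrete, here is a minimal sketch in Python. None of the specifics come from the notes: the statsmodels/scipy calls, the pandas DataFrame df, the column names, and the .05 cutoff are illustrative assumptions. At each step the function adds the candidate giving the numerically largest R² and quits when the R² increment stops being significant -- per the notes, the same decision as testing the semi-partial correlation.

import statsmodels.api as sm
from scipy import stats

def forward_inclusion(df, y, candidates, alpha=0.05):
    """Forward-inclusion sketch: repeatedly add the candidate predictor that
    gives the largest R-squared, keeping it only if the increment over the
    previous model is significant (F-test with 1 numerator df).
    df is assumed to be a pandas DataFrame holding y and the candidates."""
    selected, r2_prev = [], 0.0
    while len(selected) < len(candidates):
        # Find the candidate that yields the numerically largest R-squared.
        best = None
        for x in candidates:
            if x in selected:
                continue
            m = sm.OLS(df[y], sm.add_constant(df[selected + [x]])).fit()
            if best is None or m.rsquared > best[1]:
                best = (x, m.rsquared, m)
        x, r2_new, m = best
        # Test the R-squared increment; at Step 1 (r2_prev = 0) this reduces
        # to the test of the simple correlation.
        F = (r2_new - r2_prev) / ((1.0 - r2_new) / m.df_resid)
        if stats.f.sf(F, 1, m.df_resid) >= alpha:
            break                    # when to quit: no candidate adds significantly
        selected.append(x)
        r2_prev = r2_new
    return selected

With the notes' four predictors, forward_inclusion(df, 'y', ['x1', 'x2', 'x3', 'x4']) would return the retained predictors in the order they entered the model.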
Difficulties with the forward inclusion model...
• The major potential problem is "over-inclusion" -- a predictor that contributes to a smaller (earlier) model fails to continue to contribute as the model gets larger (with increased collinearity), but the predictor stays in the model.
• Fairly small "variations" in the correlation matrix can lead to very different final models -- models often differ on two "splits" of the same sample.
• The resulting model may not be the "best" -- there may be another model with the same number of predictors but a larger R², etc.
All of these problems are exacerbated by increased collinearity!!

Backward Deletion

Step 1 -- start with the full model (all predictors) -- if the R² is significant. Consider the regression weights of this model.

Step 2 -- remove from the model that predictor that "contributes the least."
Delete the predictor with the largest p-value associated with its regression (b) weight -- if that p-value is greater than .05. (The idea is that the predictor with the largest p-value is the one least likely to be contributing to the model in the population.)

    b_x1 (p=.08)  vs.  b_x2 (p=.02)  vs.  b_x3 (p=.02)  vs.  b_x4 (p=.27)

By using this procedure, we know that each model works as well as the previous one (the R² is numerically, but not statistically, smaller).

On all subsequent steps -- the next predictor dropped from the model is the one with the largest p-value for its regression weight, provided that p-value is non-significant.

    b_x1 (p=.21)  vs.  b_x2 (p=.14)  vs.  b_x3 (p=.012)

When to quit? When all the predictors remaining in the model are contributing to the model.

Difficulties with the backward deletion model...
• The major potential problem is "under-inclusion" -- a predictor that is deleted from a larger (earlier) model would contribute to a smaller model, but isn't "re-included."
• Fairly small "variations" in the correlation matrix can lead to very different final models -- models often differ on two "splits" of the same sample.
• The resulting model may not be the "best" -- there may be another model with the same number of predictors but a larger R², etc.
All of these problems are exacerbated by increased collinearity!!
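For comparison, here is a matching sketch of backward deletion under the same illustrative assumptions as before (statsmodels, a pandas DataFrame df, a .05 cutoff -- none of these specifics come from the notes). The full model is fit first, the weakest b weight is dropped if its p-value is non-significant, and the model is refit until every remaining predictor contributes.

import statsmodels.api as sm

def backward_deletion(df, y, predictors, alpha=0.05):
    """Backward-deletion sketch: start from the full model and repeatedly drop
    the predictor whose b weight has the largest non-significant p-value.
    df is assumed to be a pandas DataFrame holding y and the predictors."""
    kept = list(predictors)
    while kept:
        m = sm.OLS(df[y], sm.add_constant(df[kept])).fit()
        pvals = m.pvalues.drop('const')   # p-values for the b weights only
        worst = pvals.idxmax()            # the predictor "contributing the least"
        if pvals[worst] <= alpha:         # everyone left is contributing: quit
            break
        kept.remove(worst)
    return kept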
Forward Stepwise Modeling

Step 1 -- the first predictor in the model is the "best single predictor" (same as the forward inclusion model).
Select the predictor with the numerically largest simple correlation with the criterion -- if it is a significant correlation.
By using this procedure we are sure that the initial model "works."

Step 2 -- the next predictor in the model is the one that will "contribute the most" -- with two equivalent definitions (same as the forward inclusion model):

1. The 2-predictor model (including the first predictor) with the numerically largest R² -- if the R² is significant and significantly larger than the r² from the first step.

2. Add to the model that predictor with the highest semi-partial correlation with the criterion, controlling that predictor for the predictor already in the model -- if the semi-partial is significant.
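The notes describe forward stepwise modeling as "a combination of the first two." One common way to realize that combination -- sketched below under the same illustrative assumptions as the earlier snippets, not taken from the notes -- is to follow each forward entry with a backward re-check, so a predictor that stops contributing once later predictors enter (the over-inclusion problem noted above) can be removed again.

import statsmodels.api as sm
from scipy import stats

def forward_stepwise(df, y, candidates, alpha_enter=0.05, alpha_remove=0.05):
    """Forward-stepwise sketch: a forward-inclusion entry step followed by a
    backward-deletion re-check of the predictors already in the model."""
    selected, r2_prev, seen = [], 0.0, set()
    while len(selected) < len(candidates):
        if frozenset(selected) in seen:   # guard against cycling between models
            break
        seen.add(frozenset(selected))
        # Forward part: candidate giving the numerically largest R-squared.
        best = None
        for x in candidates:
            if x in selected:
                continue
            m = sm.OLS(df[y], sm.add_constant(df[selected + [x]])).fit()
            if best is None or m.rsquared > best[1]:
                best = (x, m.rsquared, m)
        x, r2_new, m = best
        F = (r2_new - r2_prev) / ((1.0 - r2_new) / m.df_resid)
        if stats.f.sf(F, 1, m.df_resid) >= alpha_enter:
            break                         # no candidate contributes: quit
        selected.append(x)
        # Backward part: drop any earlier entry that no longer contributes.
        while selected:
            m = sm.OLS(df[y], sm.add_constant(df[selected])).fit()
            pvals = m.pvalues.drop('const')
            if pvals.max() <= alpha_remove:
                break
            selected.remove(pvals.idxmax())
        r2_prev = m.rsquared if selected else 0.0
    return selected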

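Finally, all-subsets regression, the fourth procedure listed at the top, replaces the sequential search entirely: every 1-, 2-, ..., k-predictor model is fit and compared. A brute-force sketch under the same illustrative assumptions (ranking by R² is just one possible comparison criterion):

import itertools
import statsmodels.api as sm

def all_subsets(df, y, predictors):
    """All-subsets sketch: fit every 1-, 2-, ..., k-predictor model and return
    (R-squared, subset) pairs, largest R-squared first."""
    results = []
    for size in range(1, len(predictors) + 1):
        for subset in itertools.combinations(predictors, size):
            m = sm.OLS(df[y], sm.add_constant(df[list(subset)])).fit()
            results.append((m.rsquared, subset))
    return sorted(results, reverse=True)

With k = 4 predictors this is 15 models; the count roughly doubles with every added predictor, which is the practical cost of the all-subsets approach.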