Therapy Data in SAS
SAS Code Defining g-1 Dummy-Coded Predictor Variables
SPSS Code Defining g-1 Dummy-Coded Predictor Variables
Test of Therapy with Dummy-Coded Predictors
SAS Output Testing Effect of Therapy with Dummy-Coded Predictors
SPSS Output Testing Effect of Therapy with Dummy-Coded Predictors
SAS Code Defining g-1 Effects-Coded Predictor Variables
SPSS Code Defining g-1 Effects-Coded Predictor Variables
SAS Test of Therapy with Effects-Coded Predictors
SPSS Test of Therapy with Effects-Coded Predictors
SAS Output Testing Effect of Therapy with Effects-Coded Predictors
SPSS Output Testing Effect of Therapy with Effects-Coded Predictors
SAS Code Defining g-1 Contrast-Coded Predictor Variables
SPSS Code Defining g-1 Contrast-Coded Predictor Variables
SAS Test of Therapy with Contrast-Coded Predictors
SAS Output Testing Effect of Therapy with Contrast-Coded Predictors
SPSS Output Testing Effect of Therapy with Contrast-Coded Predictors

DUMMY CODING
    Creating the g-1 Dummy-Coded Predictor Variables
    Partialling the Dummy-Coded Predictors
    Recovering Condition Means from the Regression Equation
EFFECTS CODING
    Creating the g-1 Effects-Coded Predictor Variables
    Partialling the Effects-Coded Predictors
    Recovering Condition Means from the Regression Equation
CONTRAST CODING
    Creating the g-1 Contrast-Coded Predictor Variables
    Partialling the Contrast-Coded Predictors
    Recovering Condition Means from the Regression Equation

Course: Multiple Regression    Topic: Regression Diagnostics

REPRESENTING NOMINAL PREDICTOR VARIABLES

As we have previously discussed, MRC is flexible in that it can be used to examine the effects of multiple predictor variables on a dependent variable. Those multiple predictor variables need not be measured on quantitative scales. Some or all of the predictor variables can be measured on a nominal scale, in which the levels of the nominal scale differ qualitatively and not necessarily quantitatively.
For example, we can compare differences among males and females, different countries, or different conditions of an experiment on some dependent measure. Typically, differences among the levels of a nominal variable are assessed with analysis of variance (ANOVA). However, given that ANOVA can be conceptualized using the general linear model and as comparisons among linear models, it should come as no surprise that ANOVA is a special case of regression. Consequently, we can analyze nominal variables with the linear model of regression.

The trick to analyzing a nominal variable in regression is to assign numeric values to represent the levels of the nominal variable. Recall that the regression parameters (i.e., B) represent the amount by which the dependent variable changes per unit change in the predictor variable. Because regression equations don’t “understand” names (e.g., “males,” “females,” “drug,” or “placebo”), we need to represent the names with numeric values so that the regression equation can estimate the amount by which the dependent variable changes with shifts from one level of the nominal variable (e.g., male) to another level (e.g., female). Furthermore, when the nominal variable has more than two levels, multiple comparisons among the levels of the nominal variable are necessary to extract all of the information from the variable. In particular, a variable with G levels requires g-1 comparisons. Each of the g-1 comparisons will be represented by a separate predictor variable in the regression analysis. For example, a two-level variable (e.g., sex) can be completely explained by one comparison (e.g., male versus female) and requires a single predictor variable.
A three-level variable, however (e.g., Drug A, Drug B, Drug C), can be completely explained by two orthogonal comparisons and requires two predictor variables in the regression equation.

There are three systems for numerically coding the levels of a nominal variable that produce meaningful and directly interpretable regression parameters that estimate differences among the levels of a nominal variable. The three coding systems, which are dummy coding, effects coding, and contrast coding, produce the same results for the overall model (i.e., R² and F-value). Consequently, when the g-1 predictors for the G-level nominal variable are treated as a set to represent the nominal variable, the three coding systems result in the same conclusion regarding the overall effect of the nominal variable. The coding systems, however, test different hypotheses (or address different questions) about comparisons among the levels of the nominal variable. Consequently, the three coding systems produce different estimated regression parameters (i.e., B) and statistical tests of those parameters (t- and p-values) for the g-1 predictors. (In the special case in which the nominal variable has only two levels, e.g., male and female, the effects coding and contrast coding systems are identical. Furthermore, the p-value for the test of the regression parameter generated by the latter two systems will be equal to the p-value for the test generated by the dummy coding system. The estimated parameter value (i.e., B) will differ, however, because the dummy coding system represents the difference between the two levels of the nominal variable differently than does the effects/contrast coding system.)

A DATA SET

To facilitate our exploration of the three coding systems we will use the following hypothetical example and data. Imagine we are interested in the efficacy of two forms of therapy, smiling and exercise, for treating depression.
We randomly assign depressed patients to receive smiling therapy, exercise therapy, or no therapy and, several weeks later, measure their depression level (assume the scale ranges from 1 to 8, with higher numbers indicating more severe depression). The bogus data are as follows:

        No Therapy    Smiling Therapy    Exercise Therapy
            7                2                  3
            7                1                  1
            6                1                  2
            6                2                  -
            -                2                  -

n           4                5                  3
mean        6.50             1.60               2.00
s           0.58             0.55               1.00

The data were constructed with unequal sample sizes to demonstrate that the regression analysis of the nominal variable can be conducted regardless of whether there are an equal number of observations in the levels of the nominal variable. (Although the unequal sample sizes do not pose a problem for the analysis, we should keep in mind that the reason for the unequal samples could pose a threat to our
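
The claim that the three coding systems agree on the overall model but differ in their individual parameters can be checked directly on the therapy data. The following is a minimal sketch in Python rather than the SAS or SPSS used in the course. The choice of "No Therapy" as the dummy reference group and the effects-coding base group, the two particular contrasts (no therapy versus the average of the two therapies, and smiling versus exercise), and all names such as `CODINGS`, `solve`, and `fit` are illustrative assumptions, not taken from the notes.

```python
# Depression scores from the therapy data table, keyed by condition.
DATA = {
    "none": [7, 7, 6, 6],
    "smiling": [2, 1, 1, 2, 2],
    "exercise": [3, 1, 2],
}

# Each scheme assigns every group its values on the g-1 = 2 predictors.
# Reference/base group and the specific contrasts are assumptions for illustration.
CODINGS = {
    "dummy": {"none": (0, 0), "smiling": (1, 0), "exercise": (0, 1)},
    "effects": {"none": (-1, -1), "smiling": (1, 0), "exercise": (0, 1)},
    "contrast": {"none": (2 / 3, 0), "smiling": (-1 / 3, 1 / 2), "exercise": (-1 / 3, -1 / 2)},
}


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(coding):
    """Ordinary least squares of score on an intercept plus the two coded predictors."""
    rows, y = [], []
    for group, scores in DATA.items():
        for score in scores:
            rows.append([1.0, *coding[group]])
            y.append(float(score))
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum(
        (yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(rows, y)
    )
    return b, 1 - ss_resid / ss_total


RESULTS = {name: fit(coding) for name, coding in CODINGS.items()}
for name, (b, r2) in RESULTS.items():
    print(name, [round(v, 3) for v in b], round(r2, 4))
```

Under these assumptions, all three codings should report the same R² (about .935 for these data), while the B values differ. With dummy coding the intercept is the mean of the reference group (6.50) and each slope is that therapy's mean minus the reference mean. With effects and contrast coding the intercept is the unweighted mean of the three condition means.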