REPRESENTING NOMINAL PREDICTOR VARIABLES

Nominal Variables
- Predictors in a regression equation do not have to be quantitative variables
- Predictors can be nominal variables, e.g.,
  - Male vs. female
  - American, German, Japanese, Italian persons
  - Different conditions of an experiment

Analysis Strategy for Nominal Variables
- Typically the effect of a nominal variable is assessed with ANOVA
- ANOVA is a special case of linear regression
- We can assess nominal variables with OLS regression

Nominal Variable in Regression
- Beta assesses change in DV per unit change in IV
- Need to represent the levels of the nominal variable with numbers to estimate the amount the DV changes with shifts from one level of a nominal variable to another level
  - Regression doesn’t “understand” male or female
- Multiple comparisons are necessary to extract all of the information from a nominal variable with more than 2 levels
- A variable with g levels requires g-1 comparisons
- Each of the g-1 comparisons is represented by a separate predictor

Three Coding Systems
- Dummy Coding
- Effects Coding
- Contrast Coding

Similarity Among Coding Systems
- When the g-1 predictors are simultaneously included in the regression equation, the 3 coding systems produce the same results for the overall model (i.e., R² and F-value)
- So, when the g-1 predictors are treated as a set to represent the nominal variable, the 3 coding systems produce the same conclusion regarding the omnibus effect of the nominal variable

Difference Among Coding Systems
- Each coding system tests different hypotheses regarding the comparisons among the levels of the nominal variable
- So, the coding systems produce different estimated regression parameters (B) for the g-1 predictors and different statistical tests of those predictors (i.e., t)

(When the nominal variable has only 2 levels, the effects coding and contrast coding systems are identical, and both result in the same statistical test, i.e., t- and p-value, of the regression parameter as does dummy coding.
However, the value of the regression parameter will differ between dummy coding and contrast/effects coding.)

A Data Set
- Randomly assign depressed patients to one of three therapy conditions. Subsequently, assess depression (1-8, higher numbers = greater depression)

          No Therapy   Smiling Therapy   Exercise Therapy
              7               2                 3
              7               1                 1
              6               1                 2
              6               2                 -
              -               2                 -
  Mean       6.50            1.60              2.00
  n           4               5                 3

- Unequal n is not a problem for the regression analysis (may compromise causal inference)

Data In SAS

To Compare ANOVA & Regression
A one-factor ANOVA indicated that the omnibus effect of therapy on depression was significant, F(2, 9) = 64.97, p = .0001. Two orthogonal contrasts revealed that depression was greater in the no-therapy condition than in the mean of the smiling and exercise therapy conditions, F(1, 9) = 123.4, p = .0001, and that the latter two conditions did not differ, F(1, 9) = 0.64, p = .4433.

  Therapy: F(2, 9) = 64.97, p = .0001
  No-therapy vs. Smiling & Exercise: F(1, 9) = 123.4, p = .0001
  Smiling vs. Exercise: F(1, 9) = 0.64, p = .4433

Dummy Coding
- One of the g levels of the nominal variable is treated as a reference level
- The g-1 predictors compare the other levels to the reference level (only when the g-1 predictors are fully partialled)
E.g., Therapy has 3 levels, so 2 predictors are needed
- Treat no-therapy as the reference level
  predictor 1: smiling vs. no-therapy
  predictor 2: exercise vs. no-therapy

Creating g-1 Dummy-Coded Predictors
- Each participant receives either 0 or 1 on each of the g-1 predictors
- Receive a 1 if in the condition being compared to the reference level
- Receive a 0 if not in the condition being compared to the reference level
- Receive a 0 if in the condition that serves as the reference level

Dummy Coding as a Function of Condition

  Condition     X1 (d_smil)   X2 (d_exer)
  No-therapy         0             0
  Smiling            1             0
  Exercise           0             1

Unpartialled & Partialled Predictors
- Unpartialled X1 compares smiling with “not-smiling” (i.e., a weighted mean of exercise and no-therapy)
- Unpartialled X2 compares exercise with “not-exercise” (i.e., a weighted mean of smiling and no-therapy)
- X1 and X2 contain redundant information and are correlated
- Partialling X2 from X1
removes “exercise” information and creates a comparison of smiling with no-therapy
- Partialling X1 from X2 removes “smiling” information and creates a comparison of exercise with no-therapy
- Partialling gives each dummy-coded predictor its unique meaning

Creating Dummy-Coded Predictors in SAS

Testing Therapy in SAS
- The set of g-1 predictors contains the effect of Therapy
  depression = d_smil d_exer   vs.   depression = (no predictors)
- The difference between the two models yields the effect of Therapy

  proc reg;
    model depress = d_smil d_exer;
  run;

SAS Output
- Model comparison reveals the same result as the ANOVA
  Omnibus effect of Therapy: F(2, 9) = 64.97, p = .0001

Interpretation of Regression Parameters

  Condition    No-therapy   Smiling   Exercise
  Mean            6.50        1.60      2.00

Depression = 6.50 - 4.90(d_smil) - 4.50(d_exer)
- Y-intercept: mean of the reference level
- B for d_smil = -4.90 (difference between smiling and no-therapy)
  Mean level of depression is 4.9 points lower in the smiling-therapy condition than in the no-therapy condition (i.e., 1.60 - 6.50 = -4.90)
- B for d_exer = -4.50 (difference between exercise and no-therapy)
  Mean level of depression is 4.5 points lower in the exercise-therapy condition than in the no-therapy condition (i.e., 2.00 - 6.50 = -4.50)

Recovering Condition Means from Model
- Plug the 0/1 codes for each condition into the model
- No-therapy is coded 0 for d_smil and 0 for d_exer
  Depression = 6.50 - 4.90(0) - 4.50(0) = 6.50
- Smiling therapy is coded 1 for d_smil and 0 for d_exer
  Depression = 6.50 - 4.90(1) - 4.50(0) = 1.60
- Exercise therapy is coded 0 for d_smil and 1 for d_exer
  Depression = 6.50 - 4.90(0) - 4.50(1) = 2.00

sr² for Dummy-Coded Predictors
- The unique interpretation of each dummy-coded predictor is obtained by partialling it from the other
- Therefore, the sr² values come from a simultaneous regression, and they do not sum to the R² of the full model
- Because the g-1 predictors are a set (i.e., Therapy), it doesn’t make sense to enter them hierarchically (they can be entered as a set in a hierarchical model to partial out the effects of other variables)
- sr² for d_smil indicates the proportion of variability
in depression that is explained by the difference between smiling therapy and no-therapy

Effects Coding
- A partialled effects-coded predictor compares the mean of a given level of the nominal variable with the unweighted mean of the means of all levels of the nominal variable
  E.g., $\bar{X}_{smiling} - \frac{\bar{X}_{smiling} + \bar{X}_{exercise} + \bar{X}_{no\text{-}therapy}}{3}$
- Derived from an ANOVA framework in which the “effect”
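
As a cross-check on the dummy-coding slides earlier in these notes, the sketch below builds the two 0/1 predictors from the therapy data, fits the regression by ordinary least squares, and recovers the condition means by plugging the codes back in. It is written in plain Python rather than SAS, which is a substitution made here for illustration. The names `scores`, `dummy_code`, `solve`, and `fit_ols` are hypothetical helpers, not part of the original analysis, and the normal-equations solver is only one simple way to obtain the coefficients.

```python
# Depression scores by therapy condition (from the data set in these notes).
scores = {
    "none": [7, 7, 6, 6],
    "smiling": [2, 1, 1, 2, 2],
    "exercise": [3, 1, 2],
}


def dummy_code(condition):
    """Return (d_smil, d_exer), with no-therapy as the reference level."""
    return (1 if condition == "smiling" else 0,
            1 if condition == "exercise" else 0)


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Design matrix: an intercept column plus the two dummy-coded predictors.
X, y = [], []
for condition, values in scores.items():
    for value in values:
        X.append([1, *dummy_code(condition)])
        y.append(value)

b0, b_smil, b_exer = fit_ols(X, y)
print(f"intercept = {b0:.2f}, d_smil = {b_smil:.2f}, d_exer = {b_exer:.2f}")

# Recover each condition mean by plugging its 0/1 codes into the model.
predicted = {
    c: b0 + b_smil * dummy_code(c)[0] + b_exer * dummy_code(c)[1]
    for c in scores
}
print({c: round(m, 2) for c, m in predicted.items()})
```

Because both predictors are entered together, each coefficient is the partialled comparison with the reference level: the intercept equals the no-therapy mean, and the two slopes equal the smiling and exercise differences from it, matching the parameter values reported on the interpretation slide.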