Group comparisons in logit and probit using predicted probabilities¹

J. Scott Long
Indiana University
June 25, 2009

Abstract

The comparison of groups in regression models for binary outcomes is complicated by an identification problem inherent in these models. Traditional tests of the equality of coefficients across groups confound the magnitude of the regression coefficients with residual variation. If the amount of residual variation differs between groups, the test can lead to incorrect conclusions (Allison 1999). Allison proposes a test for the equality of regression coefficients that removes the effect of group differences in residual variation by adding the assumption that the regression coefficients for some variables are identical across groups. In practice, a researcher is unlikely to have either empirical or theoretical justification for this assumption, in which case Allison's test can also lead to incorrect conclusions. An alternative approach, suggested here, uses predicted probabilities. Since predicted probabilities are unaffected by residual variation, tests of the equality of predicted probabilities across groups can be used for group comparisons without assuming the equality of the regression coefficients of some variables. Using predicted probabilities requires researchers to think differently about comparing groups. With tests of the equality of regression coefficients, a single test lets the researcher conclude easily whether the effects of a variable are equal across groups.
Testing the equality of predicted probabilities requires multiple tests since group differences in predictions vary with the levels of the variables in the model. A researcher must examine group differences in predictions at multiple levels of the variables, often requiring more complex conclusions on how groups differ in the effect of a variable.

¹ I thank Paul Allison, Ken Bollen, Rafe Stolzenberg, Pravin Trivedi, and Rich Williams for their comments.

1 Overview

The comparison of groups is fundamental to research in many areas, and tests comparing groups have received a great deal of attention. Chow's (1960) paper, declared a "citation classic" (Garfield 1984), provides a general framework for group comparisons in the linear regression model. Suppose that we are comparing the effect of $x$ on $y$ for women and men, where $\beta_x^w$ and $\beta_x^m$ are the coefficients of interest. If $H_0\colon \beta_x^w = \beta_x^m$ is rejected, we conclude that the effect of $x$ differs for men and women. This approach to testing group differences can be applied to many types of regression models, as shown by Liao (2002). Allison (1999) points out a critical problem when this test is used in models such as logit or probit. For these models, standard tests can lead to incorrect conclusions since they confound the magnitude of the regression coefficients with the amount of residual variation. Allison proposes a test that removes the effect of residual variation by assuming that the coefficients for at least one independent variable are the same in both groups. Unfortunately, a researcher might lack sufficient theoretical or empirical information to justify such an assumption.
Making an ad hoc decision that some regression coefficients are equal can lead to incorrect conclusions.² Tests of predicted probabilities provide an alternative approach for comparing groups that is unaffected by group differences in residual variation and does not require assumptions about the equality of regression coefficients for some variables. Group comparisons are made by testing the equality of predicted probabilities at different values of the independent variables.

This paper begins by reviewing why standard tests of the equality of regression coefficients across groups are inappropriate in some types of models. I then show why predicted probabilities are unaffected by residual variation and present a test of the equality of predicted probabilities that can be used for group comparisons of effects in models such as logit and probit. To illustrate this approach, I begin with a model that includes a single independent variable, in which case all information about group differences can be shown in a simple graph. I extend this approach to models with multiple independent variables, which requires more complex analysis due to the nonlinearity of the models.

² Williams (2009) raises other concerns with this test that are discussed below.

2 Model identification in logit and probit

Discussions of the logit and probit model often note that the slope coefficients are only identified up to a scale factor (Maddala 1983:23). This lack of identification is why standard tests of the equality of regression coefficients across groups should not be used. To see how identification causes a problem in group comparisons and why predicted probabilities are not affected by this problem, consider how these models are derived using an underlying latent variable (see Long 1997: 40-50 for a full derivation). Suppose that the latent $y^*$ is linearly related to an observed $x$ through the structural model

$$y^* = \beta_0 + \beta_1 x + \varepsilon$$

where I use a single independent variable for simplicity.
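
The scale-factor point can be checked numerically with a minimal sketch. It assumes normally distributed errors and the usual latent-variable rule that the binary outcome equals 1 when the latent variable exceeds 0. The function names `normal_cdf` and `prob_y1` and all parameter values are hypothetical, chosen only for illustration. Under these assumptions, multiplying the intercept, the slope, and the error standard deviation by the same factor should leave the probability of observing a 1 unchanged.

```python
import math


def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_y1(x, b0, b1, sigma):
    """Pr(y = 1 | x) for y* = b0 + b1*x + e, e ~ N(0, sigma^2), y = 1 if y* > 0.

    Assumption for this sketch: errors are normal, so the probability is
    the standard normal CDF evaluated at (b0 + b1*x) / sigma.
    """
    return normal_cdf((b0 + b1 * x) / sigma)


# Hypothetical parameter values, used only for illustration.
b0, b1, sigma = -6.0, 1.0, 1.0
scale = 2.0  # rescale coefficients and error s.d. by the same factor

for x in (4.0, 6.0, 8.0):
    p_original = prob_y1(x, b0, b1, sigma)
    p_rescaled = prob_y1(x, scale * b0, scale * b1, scale * sigma)
    print(f"x = {x:4.1f}  original = {p_original:.4f}  rescaled = {p_rescaled:.4f}")
```

The two columns agree at every value of x because only the ratio of the coefficients to the error standard deviation enters the probability. The coefficients themselves therefore cannot be separated from the scale of the errors.
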
The latent $y^*$ is linked to an observed, binary $y$ by the measurement equation

$$y = \begin{cases} 1 & \text{if } y^* > 0 \\ 0 & \text{if } y^* \le 0 \end{cases} \tag{1}$$

If $y^*$ is greater than 0, $y$ is observed as 1. Otherwise, $y$ is observed as 0. For example, when a person's propensity to be in the labor force exceeds 0, she joins the labor force (i.e., $y = 1$). If her propensity is at or below 0, she is not in the labor force (i.e., $y = 0$).

According to equation 1, the probability of $y = 1$ is the proportion of the distribution of $y^*$ that is above 0 at a given value of $x$:

$$\Pr(y = 1 \mid x) = \Pr(y^* > 0 \mid x) \tag{2}$$

Substituting $y^* = \beta_0 + \beta_1 x + \varepsilon$ and rearranging terms, the probability can be written in terms of the distribution of the errors:

$$\Pr(y = 1 \mid x) = \Pr(\varepsilon \le \beta_0 + \beta_1 x \mid x)$$

[Figures 1 & 2 about here]

Figure 1 illustrates the relationship between the structural model for $y^*$ and the probability of $y$ for specific values of the parameters:

$$\begin{aligned} y^* &= \beta_0 + \beta_1 x + \varepsilon \\ &= -6 + 1x + \varepsilon \end{aligned} \tag{3}$$

Assume that $\varepsilon$ is normally distributed with mean 0 and variance $\sigma^2 = 1$, which are the