
# Logistic Regression

Neuendorf

## Assumptions

1. Metric (interval/ratio) data for 2+ IVs, and dichotomous (binomial; 2-value), categorical/nominal data for a single DV. Bear in mind that other types of IVs are allowed when they have been dummy or otherwise coded (SPSS will actually convert categorical IVs to dummies automatically(!) for this procedure only, if it's indicated by the user in the "Categorical" section of [Binary] Logistic Regression).
2. Predicts the odds of an event occurring (see Addendum 1), which is based on the probability of that event occurring. Precisely, the odds of an event occurring are:

   Odds = (prob. of event occurring) / (prob. of event not occurring) = P / (1 - P)

3. A "nonlinear" (specifically an S-shaped, or sigmoidal, curve) relationship between the IVs and the DV; however, this represents a linear relationship between the logit (the natural log of the odds of the dependent occurring or not) and the set of IVs. See Addendum 2 for an illustration that compares probabilities, odds, and the logit.
4. Uses a maximum-likelihood rather than a least-squares statistical model. In least squares, we select the regression coefficients that result in the smallest sum of squared differences between the observed and the predicted values of the DV. In maximum likelihood, the coefficients that make our observed results "most likely" are selected.
5. Residuals follow a binomial rather than a normal distribution. Normality of variables is not a stringent requirement.
6. Does not assume homoscedasticity.
7. Assumes that there is little or no multicollinearity (however, SPSS will not assess this in the [binary] Logistic Regression procedure).

## Decisions to make

1. [Hierarchical] blocks vs. a simultaneous model
2. Forced entry ("Enter") vs. stepwise entry of IVs (several options are available for both "Forward" and "Backward"; right-click in SPSS for more info)

## Statistics

1. Goodness-of-fit indicators for the overall model ("omnibus tests"):
   A. -2 Log Likelihood (-2LL). This statistic is integral to the notion of the "maximum likelihood" analytical model; it compares the likelihood function of the final model with that of a baseline model (i.e., one in which there is no knowledge about the IVs). The values for -2LL are often large and, much like eigenvalues, don't make objective sense. The -2LL is a cumulative measure across all cases, and its size is therefore highly dependent on n. A small value for -2LL indicates a good fit. A "perfect fit" by the model (called a "saturated model") will result in a likelihood of 1.0 and a -2LL of 0. In SPSS, chi-squares test the significance of changes in -2LL; we like these chi-squares to be significant.

   B. "R²-like" measures: Cox & Snell, Nagelkerke, and pseudo-R² (this last one must be hand-calculated; see Hair p. 362). These statistics use the log likelihoods of the baseline and final models to construct a "reduction-in-error" proportion, much like R².

   C. Hosmer & Lemeshow chi-square: tests the predictive accuracy of the model by splitting the sample into deciles (ten groups) on the basis of the probability that DV = 1, for the purposes of constructing a chi-square table. Here, a non-significant chi-square indicates a good model fit; in this case, the actual and predicted values on the DV do not differ significantly.

2. Score statistic: tests the significance of parameter estimates computed via maximum-likelihood methods. The test assesses whether a given IV relates significantly to the DV, and is based on the behavior of the log-likelihood function at the point where the tested parameter is zero. SPSS presents this statistic for variables "not in the equation" at that point.

3. Logistic coefficients (B's): these are unstandardized coefficients that correspond to the b's in multiple regression (i.e., the unstandardized partial regression coefficients). The overall equation is:

   (a) Logit = ln(Odds) = B0 + B1X1 + B2X2 + B3X3
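As a quick numeric sketch of equation (a), the following Python snippet computes the logit from the coefficients, then converts it back to odds and a probability (the coefficient and IV values are invented for illustration):

```python
import math

# Hypothetical coefficients and IV values for equation (a): B0 + B1X1 + B2X2 + B3X3
b0, b1, b2, b3 = -1.5, 0.8, 0.3, -0.4
x1, x2, x3 = 2.0, 1.0, 0.5

logit = b0 + b1 * x1 + b2 * x2 + b3 * x3  # linear in the IVs
odds = math.exp(logit)                    # Odds = e^Logit
p = odds / (1 + odds)                     # back to a probability (0 to 1)

print(round(logit, 2))  # 0.2
print(round(odds, 2))   # 1.22
print(round(p, 2))      # 0.55
```

Note that the S-shaped (sigmoidal) relationship from Assumption 3 appears in the last step: the logit is linear in the IVs, but the conversion to a probability is nonlinear.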
   A. The Wald statistic tests each B (rather than a t-test, the typical test of a "b" or Beta in multiple regression). Wald has a chi-square distribution. Its formula is:

      Wald = (B / SE_B)²

   B. Exp(B): the exponentiated B, the result of transforming both sides of equation (a) above so that the left side is now straight odds:

      (b) Odds = e^(B0) · e^(B1X1) · e^(B2X2) · e^(B3X3)

      Each Exp(B) indicates a decreased or increased odds of occurrence of the DV. A value of <1.0 indicates lowered odds as a result of that IV, and a value of >1.0 indicates enhanced odds as a result of that IV. For example, an Exp(B) of 1.20 would indicate that an increase of one unit in that IV would result in a predicted increase in the odds of 20%. An Exp(B) of .75 would indicate a predicted decrease in the odds of 25%. (Note that the coefficients are partials: each effect assumes that all other IVs are held constant (controlled for).)

4. Classification analysis: as in discriminant analysis, we obtain an overall "hit" rate (% of cases correctly classified). We can also get casewise diagnostics, as in discriminant analysis, and a one-dimensional classification plot.

## Addendum 1: Terminology for use with logistic regression

- Probability = P = the probability of an event occurring (range: 0 to 1)
- Odds = P / (1 - P) = the ratio of the probability of an event occurring to the probability of the event not occurring (range: 0 to positive infinity)
- Odds ratio = Odds1 / Odds2 = [P1 / (1 - P1)] / [P2 / (1 - P2)] = the ratio of two odds
- Logit = ln(Odds) = the predicted logged odds (range: negative infinity to positive infinity)

NOTE: The Hair et al. book calls the "odds" the "odds ratio." This runs counter to the use of the terms by Pampel (2000) and by Hosmer and Lemeshow (2000). This handout uses the terms as they are presented by Pampel and by Hosmer and Lemeshow.

## Addendum 2: Illustration of the relationship between probabilities, odds, and ln(odds)

| P | .01 | .1 | .2 | .3 | .4 | .5 | .6 | .7 | .8 | .9 | .99 |
|---|-----|----|----|----|----|----|----|----|----|----|-----|
| 1 - P | .99 | .9 | .8 | .7 | .6 | .5 | .4 | .3 | .2 | .1 | .01 |
| Odds | .01 | .111 | .25 | .429 | .667 | 1 | 1.5 | 2.33 | 4 | 9 | 99 |
| Logit | -4.60 | -2.20 | -1.39 | -.847 | -.405 | 0 | .405 | .847 | 1.39 | 2.20 | 4.60 |

## References

Hosmer, D. W., & Lemeshow, S. (2000). *Applied logistic regression* (2nd ed.). New York: John Wiley & Sons.

Menard, S. (1995). *Applied logistic regression analysis*. Thousand Oaks, CA: Sage Publications.

Pampel, F. C. (2000). *Logistic regression: A primer*. Thousand Oaks, CA: Sage Publications.

[NOTE: Even though the Pampel book says it's a primer, it's rather highly mathematical; it is good for understanding odds ratios and probabilities. Menard uses SPSS and ...]
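As a worked illustration of the Wald statistic and the Exp(B) interpretation from the Statistics section above, here is a small Python sketch (the coefficient B and its standard error are invented; B = 0.1823 was chosen because e^0.1823 ≈ 1.20, matching the handout's 20% example):

```python
import math

def wald(b, se_b):
    """Wald statistic for a logistic coefficient: (B / SE_B) squared."""
    return (b / se_b) ** 2

def exp_b(b):
    """Exp(B): the multiplicative change in the odds for a one-unit increase in the IV."""
    return math.exp(b)

# Hypothetical coefficient: B = 0.1823 with SE = 0.06.
b, se = 0.1823, 0.06
print(round(wald(b, se), 2))  # 9.23 (compared against a chi-square distribution)
print(round(exp_b(b), 2))     # 1.2  -> a one-unit increase raises the odds by about 20%
```

A Wald value this large would be significant at 1 df, which is why SPSS reports both the Wald statistic and Exp(B) for each IV in the equation.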