Penn EPID 521 - Regression Methods for binary outcomes - D868340


Course: EPID 521 - Statistical Methods for Epidemiologic Research

Pages 20


EP 521 Spring 2007, Vol II, Part 2
Copyright © 2006, Trustees of the University of Pennsylvania

Regression Methods for binary outcomes (logistic regression)

§4.1 Background
§4.2 Logistic regression – properties of the model
§4.3 Logistic regression – Use of the model
§4.4 Likelihoods and likelihood ratios
§4.5 Likelihood ratios: Background and Common Uses in Epidemiology – [Advanced Material]

§4.1 Background

This section derives the next regression model we use. Like linear (ordinary least squares) regression, this model is a “generalized linear model”. Because the outcome is binary, the model is more complex to describe and requires a more advanced algorithm to fit. Logistic regression is common in epidemiology because it is the principal regression model for the analysis of binary outcome data.

This section is somewhat complex and technical, with a great deal of mathematical notation. It introduces the next Part (Part 3), which covers the application of this model using software (Stata) that handles the complex math. But to understand what statistical software is doing and, more importantly, to understand how to interpret the results, this technical chapter is necessary.

§4.2 Logistic regression – properties of the model

1. Binary outcome
2. Continuous or discrete predictors
3. Outcome related to predictors by a “link” (another word for an equation)
4. Parameter estimates (for predictors) can be tested for statistical significance
5. Two or more parameters can be tested at once
6. Applicable to both prospective and retrospective data
7. Coefficients (of predictors) have an easy interpretation: $\hat{B} = \ln(OR)$, i.e., $e^{\hat{B}} = OR$
8. Easy to compare adjusted and unadjusted ORs (A adjusted for B and B adjusted for A in one model)
9. Easy to do formal tests of interactions
10. Predicted probabilities, sensitivity (Sn), specificity (Sp), and ROC areas are available in standard software packages (Stata, SAS, S-Plus)

Comparisons with ordinary least squares regression (OLS)

Similarities – OLS and logistic

Regression coefficients ($\hat{\beta}$'s) are adjusted for the effect of other predictors in the model.
Parameters are partial regression coefficients, i.e., they reflect the partial effect of one predictor variable when the other predictors included in the model are held constant.
Both models can rely on a least squares fitting algorithm.
Both are “generalized linear models”.

Differences

OLS (linear) regression applies to continuous outcomes.
Logistic regression applies to binary (dichotomous) outcomes.
OLS uses the identity link: $E(y) = \alpha + \beta x$
Logistic uses the logit link: $\operatorname{logit}(\hat{y}) = \alpha + \beta x$
Logistic regression must be fit iteratively using a weighted least squares algorithm.

Why logistic regression?

The logistic function takes the form:

$$f(z) = \frac{1}{1 + e^{-z}} = \frac{e^{z}}{1 + e^{z}}$$

It translates a measure ($z$) that ranges from $-\infty$ to $+\infty$ into one ($f(z)$) that is bounded by $[0, 1]$.

[Figure: the logistic function, f(z) on the vertical axis (ticks at 0.5 and 1) against z on the horizontal axis (ticks at -5, 0 and 5).]

$$f(-\infty) = \frac{1}{1 + e^{-(-\infty)}} = \frac{1}{1 + e^{\infty}} = \frac{1}{\infty} = 0$$

$$f(+\infty) = \frac{1}{1 + e^{-\infty}} = \frac{1}{1 + 1/e^{\infty}} = \frac{1}{1 + 0} = 1$$

$$f(0) = \frac{1}{1 + e^{-0}} = \frac{1}{1 + 1} = \frac{1}{2}$$

(Note: $a^{0} = 1$)

So, $0 \le f(z) \le 1$ (like proportions or risks) while $-\infty \le z \le +\infty$.

So, predicted values for $f(z)$ always fall between 0 and 1 (not true for other models), while the range of $z$ is infinite.

Note from the figure that as $z$ moves from $-\infty$ to $+\infty$, $f(z)$ increases: slowly at first, then rapidly, then slowly again.

“Risk” ($f(z)$) is minimal until some threshold is reached, then rises rapidly over intermediate values, then remains high.
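The limits and the S-shape described above can be checked numerically. The following is a minimal Python sketch, not part of the course materials; the function name `logistic` and the sample values of z are chosen here purely for illustration.

```python
import math


def logistic(z: float) -> float:
    """Logistic function f(z) = 1 / (1 + e^(-z)) = e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equal form e^z / (1 + e^z)
    # so that math.exp is never called with a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Illustrative grid of z values (an assumption, matching the figure's range).
for z in (-5, -2, 0, 2, 5):
    print(f"z = {z:+d}   f(z) = {logistic(z):.4f}")
```

Both algebraic forms of f(z) appear in the code because they are equal for every z; switching between them by sign is only a numerical convenience and does not change the model.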
This shape might apply to a number of disease conditions.

§4.3 Use of the logistic model – a somewhat more formal presentation

A linear model takes the form:

$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$ (Eq 4.1)

where there are $k$ predictors in the model, and $z$ has the range $-\infty$ to $+\infty$.

This notation can be simplified by representing all of the products of b’s and x’s by “matrix” notation $X\beta$. So, we can write:

$$z = X\beta$$

You will see this notation often in discussions of multivariable models.

As we have seen, the logistic function has the form:

$$f(z) = \frac{1}{1 + e^{-z}} = \frac{e^{z}}{1 + e^{z}} = \frac{e^{X\beta}}{1 + e^{X\beta}},$$ (Eq 4.2)

because $z = X\beta$.

We use the logistic “link” for logistic regression. One can describe the probability of disease developing in an initially disease-free person within a defined period (e.g., 6 months) with a set of covariates $X$, where $X$ represents $x_1, x_2, \dots, x_k$, as follows:

$$\Pr(D = 1 \mid X = x) = \frac{e^{X\beta}}{1 + e^{X\beta}},$$ (Eq 4.3)

where $D = 1$ means getting the disease, and $\beta$ represents the “vector” (or group) of parameters that we must estimate. The notation $X = x$ means that the predictors take on particular values denoted by lower case $x$. There is one b for each x contained in the matrix $X$.

In a simple case with only one predictor (exposure, for example), we have the notation:

$$\Pr(D = 1 \mid X = x_1) = \frac{e^{b_0 + b_1 x_1}}{1 + e^{b_0 + b_1 x_1}},$$ (Eq 4.4)

With 3 predictors, this would be

$$\frac{\exp(b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3)}{1 + \exp(b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3)}$$ (Eq 4.5)

There are two other forms of notation for this model of the probability of an event (D). First, there is a shorthand:

$$\Pr(x_1) = \frac{e^{b_0 + b_1 x_1}}{1 + e^{b_0 + b_1 x_1}},$$ (Eq 4.6)

Or, in the general matrix notation, there is:

$$\Pr(X) = \frac{e^{X\beta}}{1 + e^{X\beta}},$$ (Eq 4.7)

Second, there is the simplified notation that has been recommended by Rothman and Greenland.
$$\operatorname{expit}(b_0 + b_1 x_1) = \frac{e^{b_0 + b_1 x_1}}{1 + e^{b_0 + b_1 x_1}},$$ (Eq 4.8)

Another simplification is

$$\Pr(x_1) = \frac{e^{b_0 + b_1 x_1}}{1 + e^{b_0 + b_1 x_1}} = \frac{1}{1 + e^{-(b_0 + b_1 x_1)}}$$

These notations all represent expected probabilities $E(D)$ of an outcome $D$ from the logistic model.

To estimate probabilities (risks) from the parameter estimates obtained from the logistic model, we need:

(1) a follow-up study (prospective cohort), and
(2) specific values for $x_1, x_2, \dots, x_k$

For prospective studies, to estimate relative risk (RR), we need to estimate the baseline (reference group) risk
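
As a rough illustration of how the single-predictor form (Eq 4.4) turns parameter estimates into risks, the Python sketch below computes the predicted risk for an unexposed (x1 = 0) and an exposed (x1 = 1) person. The coefficient values b0 = -2.0 and b1 = 0.7 and the helper names `expit`, `predicted_risk` and `odds` are hypothetical, chosen only for this example; they do not come from the course data or from Stata output.

```python
import math
from typing import Sequence


def expit(z: float) -> float:
    """expit(z) = e^z / (1 + e^z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predicted_risk(b0: float, b: Sequence[float], x: Sequence[float]) -> float:
    """Pr(D = 1 | X = x) with linear predictor z = b0 + sum of b_j * x_j."""
    if len(b) != len(x):
        raise ValueError("need exactly one coefficient per predictor")
    z = b0 + sum(bj * xj for bj, xj in zip(b, x))
    return expit(z)


def odds(p: float) -> float:
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical estimates, for illustration only.
b0, b1 = -2.0, 0.7

risk_unexposed = predicted_risk(b0, [b1], [0.0])  # baseline (reference group) risk
risk_exposed = predicted_risk(b0, [b1], [1.0])

odds_ratio = odds(risk_exposed) / odds(risk_unexposed)  # equals exp(b1)
risk_ratio = risk_exposed / risk_unexposed  # depends on the baseline risk

print(f"baseline risk = {risk_unexposed:.4f}")
print(f"exposed risk  = {risk_exposed:.4f}")
print(f"odds ratio    = {odds_ratio:.4f}   exp(b1) = {math.exp(b1):.4f}")
print(f"risk ratio    = {risk_ratio:.4f}")
```

In this sketch the odds ratio depends only on b1, whereas the risk ratio also changes with b0, which is one way to see why a baseline risk is needed before a relative risk can be reported.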
