REGRESSION ON AN INDICATOR VARIABLE

In this technique, the dependent variable (Y) is an indicator and takes a value of either 0 or 1. This is called a binary response variable.

[Figure: Response (0 or 1) plotted against the Independent Variable(s)]

Examples: any two categories, any binomial or binary variable
   a) Success-failure, gender (male-female), mortality, presence-absence, pass-fail, etc.

The result of a simple linear regression is a slope and an intercept, which produce a predicted value that ranges from 0 to 1 over most of the range of X.

The slope $b_1$ can be interpreted as the change in the probability of obtaining a 1 per unit of X, and the predicted value is the probability of obtaining a 1 at some particular value of X.

Problems with regression on indicator variables

1) Nonnormal errors: given that $\varepsilon_i = Y_i - \beta_0 - \beta_1 X_i$, then
      When $Y_i = 1$, then $\varepsilon_i = 1 - \beta_0 - \beta_1 X_i$
      When $Y_i = 0$, then $\varepsilon_i = -\beta_0 - \beta_1 X_i$
   At any given $X_i$ the error can take only these two values, so it cannot be normally distributed.

2) Nonconstant errors
   Let $P(Y_i = 1) = \pi_i$ and $P(Y_i = 0) = 1 - \pi_i$
   then $E(Y_i) = 1(\pi_i) + 0(1 - \pi_i) = \pi_i = \beta_0 + \beta_1 X_i$
   and $\sigma^2_{Y_i} = E[(Y_i - E(Y_i))^2] = (1 - \pi_i)^2 \pi_i + (0 - \pi_i)^2 (1 - \pi_i)$
      $= \pi_i (1 - \pi_i) = E(Y_i)(1 - E(Y_i))$
   Finally, $\mathrm{Var}(\varepsilon_i) = \mathrm{Var}(Y_i)$, since $\varepsilon_i = Y_i - \pi_i$ and $\pi_i$ is a constant,
   so $\sigma^2_{\varepsilon_i} = \pi_i (1 - \pi_i) = E(Y_i)(1 - E(Y_i)) = (\beta_0 + \beta_1 X_i)(1 - \beta_0 - \beta_1 X_i)$
   and the variance is a function of $X_i$.

3) Constraints on the response function
   If the function is fitted with a line, at some point the predicted value will be < 0 or > 1.
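The three problems listed above can be checked numerically. The following is a minimal sketch, not part of the original notes: the coefficients `beta0` and `beta1` and the X values are hypothetical, chosen only so that the fitted line leaves the 0 to 1 range within the values shown.

```python
import numpy as np

# Hypothetical coefficients of a linear model for E(Y) (illustration only).
beta0, beta1 = 0.1, 0.08

def linear_prob(x):
    """E(Y) = pi = beta0 + beta1 * x under the straight-line model."""
    return beta0 + beta1 * x

def possible_errors(x):
    """The only two values the error can take at x: one for Y = 1, one for Y = 0."""
    pi = linear_prob(x)
    return 1.0 - pi, 0.0 - pi

def error_variance(x):
    """Var(eps) = (1 - pi)^2 * pi + (0 - pi)^2 * (1 - pi), which equals pi * (1 - pi)."""
    pi = linear_prob(x)
    return (1.0 - pi) ** 2 * pi + (0.0 - pi) ** 2 * (1.0 - pi)

for x in np.array([0.0, 2.0, 5.0, 8.0, 12.0]):
    pi = linear_prob(x)
    e1, e0 = possible_errors(x)
    print(f"X={x:5.1f}  predicted={pi:6.3f}  errors=({e1:6.3f}, {e0:6.3f})  "
          f"variance={error_variance(x):7.4f}  inside [0,1]: {0.0 <= pi <= 1.0}")
```

With these assumed values the variance changes with X, and at the largest X the predicted value exceeds 1, where the variance formula no longer yields a valid (non-negative) variance.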
As a probability, the true value must be between 0 and 1, so we must place some restraint on the predicted value.

So we would like to find a function that solves some of these problems. We might also expect a curve instead of a simple straight line, and we would like a curve that can go from 0 to 1 (asymptotically).

Several sigmoid possibilities have been considered, especially
   a) Logistic (symmetric)
   b) Cumulative normal distribution (Probit analysis)

This version of the logistic has several advantages,

   $E(Y_i) = \dfrac{\exp(\beta_0 + \beta_1 X_i)}{1 + \exp(\beta_0 + \beta_1 X_i)}$

particularly that it can be readily linearized by the transformation

   $\pi' = \log_e\left(\dfrac{\pi}{1 - \pi}\right)$

This is called a LOGIT transformation, and $\pi'$ is called a logit mean response.

We can then fit $\hat{\pi}'_i = b_0 + b_1 X_i$ and we should closely approximate the logistic.

The logistic can also be fitted directly with nonlinear techniques.

A similar, but more difficult and less flexible, transformation exists for the cumulative normal distribution, and is called a PROBIT transformation.

Weighting to improve variance: the logit only linearizes the logistic function; it does not cure the nonhomogeneous variance problem.

The logit, $\pi' = \log_e\left(\dfrac{\pi}{1 - \pi}\right)$, is estimated by $p'_i = \log_e\left(\dfrac{p_i}{1 - p_i}\right)$, where $p_i$ is the observed proportion of 1s among the $n_i$ observations at $X_i$.

The variance of $p'_i$ is

   $\mathrm{Var}(p'_i) = \dfrac{1}{n_i \pi_i (1 - \pi_i)}$

which is estimated by

   $s^2_{p'_i} = \dfrac{1}{n_i p_i (1 - p_i)}$

We could therefore weight by $w_i = n_i p_i (1 - p_i)$ in order to homogenize the variance.

Notes:
1) Logits are readily extendible to multiple regression.
2) Logistic regression has many applications. One common application in the biological sciences is the calculation of the dose needed to cause mortality. However, small doses cause small mortalities and large doses cause large mortalities. We therefore calculate an $LD_{50}$, which is the "lethal dose for 50% mortality". For example, given the equation below

   $\hat{\pi}'_i = b_0 + b_1 X_i = -2.64 + 0.673 \cdot \text{dose}$

the $LD_{50}$ is given by

   $\hat{\pi}'_{50} = \log_e\left(\dfrac{0.50}{1 - 0.50}\right) = 0$
   $0 = -2.64 + 0.673 \cdot \text{dose}_{50}$
   $\text{dose}_{50} = 2.64 / 0.673 = 3.923$, or a dose of about
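The logit transformation, the weights $w_i = n_i p_i (1 - p_i)$, and the $LD_{50}$ calculation described in these notes can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the procedure used to obtain the equation in the notes: the dose levels, group sizes, and death counts are hypothetical, and the function names `weighted_logit_fit` and `ld50` are invented for the example. Only the coefficients -2.64 and 0.673 come from the notes.

```python
import numpy as np

def logit(p):
    """Logit transformation: log(p / (1 - p)); undefined for p = 0 or p = 1."""
    return np.log(p / (1.0 - p))

def weighted_logit_fit(x, n, successes):
    """Weighted least squares fit of logit(p_i) = b0 + b1 * x_i,
    using the weights w_i = n_i * p_i * (1 - p_i)."""
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    p = np.asarray(successes, dtype=float) / n   # observed proportions p_i
    y = logit(p)                                 # observed logits p'_i
    w = n * p * (1.0 - p)                        # inverse of the estimated variance of p'_i
    x_bar = np.sum(w * x) / np.sum(w)            # weighted means
    y_bar = np.sum(w * y) / np.sum(w)
    b1 = np.sum(w * (x - x_bar) * (y - y_bar)) / np.sum(w * (x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1

def ld50(b0, b1):
    """Dose at which the fitted logit is 0, i.e. predicted mortality is 50%."""
    return -b0 / b1

# Hypothetical grouped dose-mortality data (not from the notes).
doses = [1, 2, 3, 4, 5, 6, 7]
n_per_dose = [40, 40, 40, 40, 40, 40, 40]
deaths = [4, 9, 15, 21, 29, 33, 37]

b0_hat, b1_hat = weighted_logit_fit(doses, n_per_dose, deaths)
print("fitted intercept and slope:", b0_hat, b1_hat)
print("LD50 for the hypothetical data:", ld50(b0_hat, b1_hat))

# The equation given in the notes: logit = -2.64 + 0.673 * dose
print("LD50 from the notes' equation:", round(ld50(-2.64, 0.673), 3))
```

Every observed proportion must lie strictly between 0 and 1 for the logit to be defined, which is one reason the notes mention that the logistic can also be fitted directly with nonlinear techniques.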