LSU EXST 7034 - REGRESSION ON AN INDICATOR VARIABLE

REGRESSION ON AN INDICATOR VARIABLE  In this technique, the dependent variable (Y) is an indicator, and takes avalue of either 0 or 1. This is called a binary response variableIndependent Variable(s)Response01Examples any two categories, any binomial or binary variable a) Success-failure, Gender (Male-female), mortality, presence-absence,pass-fail, etc. The results of a simple linear regression is a slope and intercept which willproduce a predicted value which ranges from 0 to 1 over most of therange of X This b can be interpreted as a probability of obtaining a 1 per unit of X, and"the predicted value is the probability of obtaining a 1 at some particularvalue of X.Problems with regression on indicator variables 1) Nonnormal errors : given that = Y X , then%""33!"3 When Y = 1, then = 1 X33!"3%"" When Y = 0, then = X33!"3%"" 2) Nonconstant errors Let P(Y =1) = and P(Y =0) = 133 3 311 then E(Y ) = 1( ) + 0(1 ) = = X33 33!"3111"" and = E[Y E(Y )] = (1 ) + (0 ) (1 )511112222]33 33 333 = (1 ) = E(Y )(1 E(Y ))1133 3 3 finally, Var( ) = Var(Y ), since = Y and is a constant%%11333333 so = (1 )51 12%333 = E(Y )(1 E(Y )) = ( X )(1 X )33!"3!"3"" "" and the variance is a function of X3 3) Constraints on the response function If the function is fitted with a line, at some point the predicted value will be<0 or >1. 
As a probability, the true value must be between 0 and 1, so we must place some restraint on the predicted value.

So we would like to find a function which solves some of these problems. We might also expect a curve instead of a simple straight line, and we would like a curve that can go from 0 to 1 (asymptotically).

Several sigmoid possibilities have been considered, especially
  a) Logistic (symmetric)
  b) Cumulative normal distribution (Probit analysis)

This version of the logistic has several advantages,
     $E(Y_i) = \pi_i = \dfrac{\exp(\beta_0 + \beta_1 X_i)}{1 + \exp(\beta_0 + \beta_1 X_i)}$
particularly that it can be readily linearized by the transformation
     $\pi' = \log_e\left(\dfrac{\pi}{1 - \pi}\right)$
This is called a LOGIT transformation, and $\pi'$ is called a logit mean response. We can then fit
     $\hat{\pi}'_i = b_0 + b_1 X_i$
and we should closely approximate the logistic.

The logistic can also be fitted directly with nonlinear techniques.

A similar, but more difficult and less flexible, transformation exists for the cumulative normal distribution, and is called a PROBIT transformation.

Weighting to improve variance: the logit only linearizes the logistic function; it does not cure the nonhomogeneous variance problem.
     The logit, $\pi' = \log_e\left(\dfrac{\pi}{1 - \pi}\right)$, is estimated by $p'_i = \log_e\left(\dfrac{p_i}{1 - p_i}\right)$,
     where $p_i$ is the observed proportion of 1s among the $n_i$ observations at $X_i$.
     The variance of $p'_i$ is (approximately, for large $n_i$)
          $\mathrm{Var}(p'_i) = \dfrac{1}{n_i \pi_i (1 - \pi_i)}$
     which is estimated by
          $s^2_{p'_i} = \dfrac{1}{n_i p_i (1 - p_i)}$
     We could therefore weight by $w_i = n_i p_i (1 - p_i)$ in order to homogenize the variance.

Notes:
1) Logits are readily extendible to multiple regression.
2) Logistic regression has many applications. One common application in the biological sciences is the calculation of the dose needed to cause mortality. However, small doses cause low mortality and large doses cause high mortality. We therefore calculate an $LD_{50}$, which is the "lethal dose for 50% mortality". For example, given the equation below
     $\hat{\pi}'_i = b_0 + b_1 X_i = -2.64 + 0.673 \cdot \text{dose}$
   the $LD_{50}$ is given by
     $\hat{\pi}'_{50} = \log_e\left(\dfrac{0.50}{1 - 0.50}\right) = 0$
     $0 = -2.64 + 0.673 \cdot \text{dose}_{50}$
     $\text{dose}_{50} = \dfrac{2.64}{0.673} = 3.923$, or a dose of about
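
The logit transformation, the weights $w_i = n_i p_i (1 - p_i)$, and the $LD_{50}$ arithmetic can be sketched in Python. This is an illustrative sketch under stated assumptions, not the procedure used in the course: the function names are hypothetical, and the grouped data (doses 1 to 7, 50 subjects per dose) are invented, with proportions generated directly from the example equation so that the weighted fit should return the coefficients -2.64 and 0.673.

```python
import numpy as np

def logit(p):
    """Logit transformation: log_e(p / (1 - p))."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def logistic(b0, b1, x):
    """Logistic response: exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    z = b0 + b1 * np.asarray(x, dtype=float)
    return np.exp(z) / (1.0 + np.exp(z))

def weighted_logit_fit(x, n, p):
    """Weighted least squares fit of logit(p) on x, weights n*p*(1-p)."""
    w = n * p * (1.0 - p)
    # np.polyfit applies its weights to the unsquared residuals, so
    # sqrt(w) is passed in order to minimise sum(w * residual**2).
    b1, b0 = np.polyfit(x, logit(p), 1, w=np.sqrt(w))
    return b0, b1

def dose_for_mortality(b0, b1, fraction=0.5):
    """Dose at which the fitted logit equals logit(fraction)."""
    return float((logit(fraction) - b0) / b1)

# Hypothetical grouped data built from the example equation in the notes.
dose = np.arange(1.0, 8.0)
n = np.full(dose.size, 50.0)           # assumed subjects per dose
p = logistic(-2.64, 0.673, dose)       # assumed observed proportions

b0_hat, b1_hat = weighted_logit_fit(dose, n, p)
ld50 = dose_for_mortality(b0_hat, b1_hat)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}, LD50 = {ld50:.3f}")
```

Because logit(0.5) = 0, the $LD_{50}$ reduces to $-b_0 / b_1$. Passing a different `fraction` gives other lethal doses from the same fitted line, for example `dose_for_mortality(b0_hat, b1_hat, 0.9)`.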

