UW-Madison STAT 572 - Logistic Regression


Logistic Regression
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin—Madison
February 14, 2008

Outline: The Big Picture; Generalized Linear Models (model components, link functions, deviance); Logistic Regression (example, binomial distribution)

The Big Picture

In all of the linear models we have seen so far this semester, the response variable has been modeled as a normal random variable:

    (response) = (fixed parameters) + (normal random effects and error)

For many data sets, this model is inadequate. For example, if the response variable is categorical with two possible responses, it makes no sense to model the outcome as normal. Likewise, if the response is always a small positive integer, its distribution is not well described by a normal distribution.

Generalized linear models (GLMs) are an extension of linear models to model non-normal response variables. We will study logistic regression for binary response variables, and additional models in Chapter 6.

Generalized Linear Models

A standard linear model has the following form:

    y = β1·1 + β2·x2 + ··· + βk·xk + e,    e_i ~ N(0, σ²)

The mean, or expected value, of the response is

    E[y] = β1·1 + β2·x2 + ··· + βk·xk

We will use the notation η = β1·1 + β2·x2 + ··· + βk·xk to represent this linear combination of explanatory variables. In a standard linear model, E[y] = η. In a GLM, there is a link function g between η and the mean of the response variable:

    g(E[y]) = η

For standard linear models, the link function is the identity function, g(y) = y.

Link Functions

It is usually clearer to consider the inverse of the link function:

    E[y] = g⁻¹(η)

The mean of a distribution is usually either a parameter of the distribution or a function of its parameters, and that is what this inverse function expresses. When the response variable is binary (with values coded as 0 or 1), the mean is simply E[y] = P{y = 1}. A useful function for this case is

    E[y] = P{y = 1} = e^η / (1 + e^η)

Notice that this quantity is always between 0 and 1. The corresponding link function is called the logit function, g(x) = log(x / (1 − x)), and regression under this model is called logistic regression.

Deviance

In standard linear models, we estimate the parameters by minimizing the sum of the squared residuals, which is equivalent to finding the parameters that maximize the likelihood. In a GLM we also fit parameters by maximizing the likelihood. The deviance is equal to negative two times the log-likelihood, up to an additive constant, so estimation is equivalent to finding the parameter values that minimize the deviance.

Logistic Regression

Logistic regression is a natural choice when the response variable is categorical with two possible outcomes. Pick one outcome to be a "success", where y = 1. We want a model that estimates the probability of "success" as a function of the explanatory variables. Using the inverse logit function, the probability of success has the form

    P{y = 1} = e^η / (1 + e^η)

We estimate the parameters so that this probability is high for cases where y = 1 and low for cases where y = 0.

Example

In surgery, it is desirable to give enough anesthetic so that patients do not move when an incision is made. It is also desirable not to use much more anesthetic than necessary. In an experiment, patients are given different concentrations of anesthetic. The response variable is whether or not they move at the time of incision, 15 minutes after receiving the drug.

Data

    Concentration   0.8    1.0    1.2    1.4    1.6    2.5
    Move              6      4      2      2      0      0
    No move           1      1      4      4      4      2
    Total             7      5      6      6      4      2
    Proportion     0.14   0.20   0.67   0.67   1.00   1.00

(Proportion is the proportion of patients with no movement; note 1/7 ≈ 0.14.) We analyze the data in R with glm twice, once using the raw data and once using summarized counts.

Binomial Distribution

Logistic regression is related to the binomial distribution. If there are several observations with the same explanatory variable values, then the individual responses can be added up, and the sum has a binomial distribution. Recall that the binomial distribution has parameters n and p, moments μ = np and σ² = np(1 − p), and probability distribution

    P(X = x) = C(n, x) · p^x · (1 − p)^(n − x)

Logistic regression is in the "binomial family" of GLMs.

R with Raw Data

```r
> ane = read.table("anesthetic.txt", header = T)
> str(ane)
'data.frame': 30 obs. of 3 variables:
 $ movement: Factor w/ 2 levels "move","noMove": 2 1 2 1 1 2 2 1 2 1 ...
 $ conc    : num 1 1.2 1.4 1.4 1.2 2.5 1.6 0.8 1.6 1.4 ...
 $ nomove  : int 1 0 1 0 0 1 1 0 1 0 ...
> aneRaw.glm = glm(nomove ~ conc, data = ane,
+     family = binomial(link = "logit"))
> library(arm)
arm (Version 1.1-1, built: 2008-1-13)
> display(aneRaw.glm, digits = 3)
glm(formula = nomove ~ conc, family = binomial(link = "logit"),
    data = ane)
            coef.est coef.se
(Intercept) -6.469   2.418
conc         5.567   2.044
---
n = 30, k = 2
residual deviance = 27.8, null deviance = 41.5 (difference = 13.7)
```

Fitted Model

The fitted model is

    η = −6.469 + 5.567 × (concentration)

and

    P{No move} = e^η / (1 + e^η)

Plot of Relationship

[Figure: the fitted logistic curve of P(no move) against concentration (0.8 to 2.5), with the observed proportions at each concentration overlaid as points.]

Second Analysis

```r
> noCounts = c(1, 1, 4, 4, 4, 2)
> total = c(7, 5, 6, 6, 4, 2)
> prop = noCounts/total
> concLevels = c(0.8, 1, 1.2, 1.4, 1.6, 2.5)
> ane2 = data.frame(noCounts, total, prop, concLevels)
> aneTot.glm = glm(prop ~ concLevels, data = ane2,
+     family = binomial, weights = total)
> display(aneTot.glm)
glm(formula = prop ~ concLevels, family = binomial, data = ane2,
    weights = total)
            coef.est coef.se
(Intercept) -6.47    2.42
concLevels   5.57    2.04
---
n = 6, k = 2
residual deviance = 1.7, null deviance = 15.4 (difference = 13.7)
```
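The logit link and its inverse from the slides can be checked numerically. This is a minimal Python sketch of those two formulas (it is not part of the original R session):

```python
import math

def logit(p):
    # link function: g(p) = log(p / (1 - p))
    return math.log(p / (1 - p))

def inv_logit(eta):
    # inverse link: maps any real eta into the interval (0, 1)
    return math.exp(eta) / (1 + math.exp(eta))

# the inverse link undoes the link: g^{-1}(g(p)) = p
for p in (0.1, 0.5, 0.9):
    assert abs(inv_logit(logit(p)) - p) < 1e-12
```

Because e^η / (1 + e^η) is strictly between 0 and 1 for every real η, any linear predictor produces a valid probability.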
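The binomial probability formula and moments on the slides can be evaluated with Python's standard library. The count 4-of-6 below comes from the data table at concentration 1.2; the value p = 0.5 is only an illustrative choice, not an estimate from the slides:

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 6, 0.5
mu = n * p                 # mean: mu = n p
var = n * p * (1 - p)      # variance: sigma^2 = n p (1 - p)

# at concentration 1.2 the table shows 4 "no move" outcomes out of 6;
# if p were 0.5, the chance of exactly 4 would be C(6,4) / 2^6
print(binom_pmf(4, n, p))  # 15/64 = 0.234375
```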
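Plugging the reported coefficient estimates (−6.469 and 5.567) into the inverse logit reproduces the fitted curve at the observed concentrations. A sketch (the helper name `p_no_move` is mine, not from the slides):

```python
import math

def p_no_move(conc, b0=-6.469, b1=5.567):
    # fitted P{no move} = e^eta / (1 + e^eta), using the
    # coefficient estimates reported by glm on the slides
    eta = b0 + b1 * conc
    return math.exp(eta) / (1 + math.exp(eta))

for c in (0.8, 1.0, 1.2, 1.4, 1.6, 2.5):
    print(f"conc {c}: fitted P(no move) = {p_no_move(c):.3f}")
```

The fitted probabilities rise from roughly 0.12 at concentration 0.8 to essentially 1 at 2.5, broadly matching the observed proportions in the data table.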
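The residual deviance of the grouped fit can be verified by hand. For grouped binomial counts, the deviance compares the fitted model to a saturated model that matches each group's proportion exactly; the formula below is the standard grouped binomial deviance (it is not shown on the slides), applied to the slides' data and coefficients:

```python
import math

# grouped data from the slides: (concentration, total n, "no move" count y)
data = [(0.8, 7, 1), (1.0, 5, 1), (1.2, 6, 4),
        (1.4, 6, 4), (1.6, 4, 4), (2.5, 2, 2)]
b0, b1 = -6.469, 5.567     # fitted coefficients from the slides

def term(obs, exp):
    # obs * log(obs / exp), with the convention 0 * log(0) = 0
    return 0.0 if obs == 0 else obs * math.log(obs / exp)

dev = 0.0
for conc, n, y in data:
    eta = b0 + b1 * conc
    p = math.exp(eta) / (1 + math.exp(eta))
    # deviance contribution of one binomial group
    dev += 2 * (term(y, n * p) + term(n - y, n * (1 - p)))

print(round(dev, 1))  # ~1.7, the residual deviance reported by glm
```

The raw-data fit reports a different residual deviance (27.8) only because its saturated model has one term per patient rather than per group; the deviance difference from the null model (13.7) is the same either way.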

