EXST7034 Regression Techniques Geaghan Logistic regression Diagnostics Page 1

1    **********************************************************;
2    *** Logistic Regression - Disease outbreak example ***;
3    *** NKNW table 14.3 (Appendix C3) ***;
4    *** Study of a disease outbreak from a mosquito-borne ***;
5    *** disease within two sectors of a city. ***;
6    **********************************************************;
7
8    dm'log;clear;output;clear';
9    options nodate nocenter nonumber ps=512 ls=132 nolabel;
10   ODS HTML style=minimal rs=none body='C:\Geaghan\Current\EXST7034\Fall2005\SAS\DiseaseOutbreak01.html' ;
NOTE: Writing HTML Body file: C:\Geaghan\Current\EXST7034\Fall2005\SAS\DiseaseOutbreak01.html
11
12   TITLE1 'Logistic Regression - NKNW Example 14.3';
13   data Disease; infile cards missover;
14      input case Age Status1 Status2 sector Disease;
15      * Status classes are Upper (0, 0), Middle (1, 0) and Lower (0, 1);
16      status = 'Upper ';
17      if status1 eq 1 then status = 'Middle';
18      if status2 eq 1 then status = 'Lower';
19      label case = 'case number'
20            age = 'Patients age'
21            status = 'Socioeconomic status upper, middle and lower'
22            disease = 'Disease present = 1';
23   Cards;

NOTE: The data set WORK.DISEASE has 98 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.02 seconds

122  ;
123
124  proc logistic data=Disease DESCENDING alpha=0.05;
125     TITLE2 'Logistic regression on Disease data (with Status1 and Status2)';
126     model Disease = Age Status1 Status2 Sector;
127  run;

NOTE: PROC LOGISTIC is modeling the probability that Disease=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 98 observations read from the data set WORK.DISEASE.
NOTE: The PROCEDURE LOGISTIC printed page 1.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.25 seconds
      cpu time            0.04 seconds

128
129  proc logistic data=Disease DESCENDING alpha=0.05;
130     class status;
131     TITLE2 'Logistic regression on Disease data (with CLASS Status - default)';
132     model Disease = Age Status Sector;
133  run;

NOTE: PROC LOGISTIC is modeling the probability that Disease=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 98 observations read from the data set WORK.DISEASE.
NOTE: The PROCEDURE LOGISTIC printed page 2.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.19 seconds
      cpu time            0.03 seconds

134
135  proc logistic data=Disease DESCENDING alpha=0.05;
136     class status / param = glm;
137     TITLE2 'Logistic regression on Disease data (with CLASS Status - GLM)';
138     model Disease = Age Status Sector / lackfit RSQ iplots;
139     output out=next1 PREDICTED=yhat Lower=lcl95 Upper=ucl95 dfbetas=_ALL_
140            resdev=resdev difdev=difdev;
141  run;

NOTE: PROC LOGISTIC is modeling the probability that Disease=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 98 observations read from the data set WORK.DISEASE.
NOTE: The data set WORK.NEXT1 has 98 observations and 19 variables.
NOTE: The PROCEDURE LOGISTIC printed pages 3-4.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.23 seconds
      cpu time            0.10 seconds

The three PROC LOGISTIC analyses differ in only one essential regard, the coding of socioeconomic status, and all results are the same except the intercept and the status parameter estimates. (The third run also requests extra output: the LACKFIT, RSQ and IPLOTS options and an OUTPUT data set.) There are three socioeconomic levels and many different ways of setting up the dummy variables; several are explored below. Make sure you know what you are estimating.

• The first version uses the status dummy variables provided with the data set, where the two variables Status1 and Status2 take the following values for the three socioeconomic levels: Upper (0, 0), Middle (1, 0) and Lower (0, 1). Upper is therefore the reference level.
• I created a class variable STATUS with the three levels coded as “Lower”, “Middle” and “Upper”. In the second PROC LOGISTIC, STATUS was placed in the CLASS statement, which uses the PROC LOGISTIC default effect coding (PARAM=EFFECT).
• In the third PROC LOGISTIC, for which the full analysis is provided, the option “/ param = glm;” was requested on the CLASS statement. GLM coding creates one 0/1 indicator variable for each of the three levels.

Logistic Regression - NKNW Example 14.3
Logistic regression on Disease data (with Status1 and Status2)

The LOGISTIC Procedure

              Analysis of Maximum Likelihood Estimates

                          Standard          Wald
Parameter   DF  Estimate     Error    Chi-Square    Pr > ChiSq
Intercept    1   -2.3127    0.6426       12.9545        0.0003
Age          1    0.0297    0.0135        4.8535        0.0276
Status1      1    0.4088    0.5990        0.4657        0.4950
Status2      1   -0.3051    0.6041        0.2551        0.6135
sector       1    1.5746    0.5016        9.8543        0.0017

              Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits
Age           1.030       1.003      1.058
Status1       1.505       0.465      4.868
Status2       0.737       0.226      2.408
sector        4.829       1.807     12.907

Logistic Regression - NKNW Example 14.3
Logistic regression on Disease data (with CLASS Status - default)

The LOGISTIC Procedure

       Class Level Information

                     Design
Class     Value     Variables
status    Lower       1     0
          Middle      0     1
          Upper      -1    -1

Note the coding of the dummy variables: Lower and Middle each have their own design variable, and Upper, the last level in sort order, is coded -1 on both. The status estimates below are therefore deviations of Lower and Middle from the unweighted average of the three levels, not direct contrasts with Upper.

              Analysis of Maximum Likelihood Estimates

                              Standard          Wald
Parameter       DF  Estimate     Error    Chi-Square    Pr > ChiSq
Intercept        1   -2.2782    0.5195       19.2314        <.0001
Age              1    0.0297    0.0135        4.8535        0.0276
status Lower     1   -0.3397    0.3690        0.8471        0.3574
status Middle    1    0.3742    0.3662        1.0439        0.3069
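In the Status1/Status2 run, each odds ratio is the exponentiated estimate, exp(b), and the 95% Wald limits are exp(b - z*SE) and exp(b + z*SE) with z the 0.975 normal quantile (about 1.96, since alpha=0.05). As a quick check, the Python sketch below (not part of the SAS program; the helper name `wald_odds_ratio` is hypothetical) recomputes the Odds Ratio Estimates table from the printed, rounded estimates and standard errors. PROC LOGISTIC works with unrounded values, so the last printed digit can differ slightly.

```python
import math
from statistics import NormalDist

# Estimates and standard errors as printed for the Status1/Status2 model
# (rounded to 4 decimals, so results can differ from SAS in the last digit).
estimates = {
    "Age": (0.0297, 0.0135),
    "Status1": (0.4088, 0.5990),
    "Status2": (-0.3051, 0.6041),
    "sector": (1.5746, 0.5016),
}


def wald_odds_ratio(beta, se, alpha=0.05):
    """Return (odds ratio, lower, upper) using a Wald interval on the logit scale."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    for name, (beta, se) in estimates.items():
        point, lower, upper = wald_odds_ratio(beta, se)
        print(f"{name:8s} {point:7.3f} {lower:7.3f} {upper:7.3f}")
```

For example, exp(1.5746) = 4.829 says the estimated odds of disease are about 4.8 times higher for a one-unit increase in the sector variable, holding age and status fixed. Note that the intervals are symmetric on the logit scale, not on the odds-ratio scale.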
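The effect-coded status estimates from the second run can be recovered from the reference-coded Status1/Status2 estimates, which is a useful way to see what each parameterization estimates. With Upper as the reference, the logits for the three levels (other covariates at zero) are b0 for Upper, b0 + b(Status1) for Middle and b0 + b(Status2) for Lower; effect coding reports their unweighted mean as the intercept and each level's deviation from that mean as its effect. The Python sketch below (an illustration, not part of the course SAS code; `reference_to_effect` is a hypothetical helper) does this arithmetic with the printed values, assuming the Age and Sector coefficients are unchanged between codings, as the output shows for Age.

```python
# Reference-coded (Upper = (0, 0)) estimates printed for the Status1/Status2 run.
REF_INTERCEPT = -2.3127
REF_STATUS = {"Upper": 0.0, "Middle": 0.4088, "Lower": -0.3051}


def reference_to_effect(intercept, level_coefs):
    """Convert reference-cell estimates for one CLASS variable to effect coding.

    level_coefs maps each level to its reference-coded coefficient
    (0.0 for the reference level). Returns (effect_intercept, effects),
    where the effects sum to zero across levels.
    """
    # Logit for each level with every other covariate held at zero.
    level_logits = {lvl: intercept + b for lvl, b in level_coefs.items()}
    # Effect coding's intercept is the unweighted mean of the level logits.
    mean_logit = sum(level_logits.values()) / len(level_logits)
    # Each effect is that level's deviation from the mean.
    effects = {lvl: logit - mean_logit for lvl, logit in level_logits.items()}
    return mean_logit, effects


if __name__ == "__main__":
    b0, effects = reference_to_effect(REF_INTERCEPT, REF_STATUS)
    print(f"Intercept {b0:8.4f}")
    for level in ("Lower", "Middle", "Upper"):
        print(f"{level:9s} {effects[level]:8.4f}")
```

With the printed values it returns an intercept of about -2.2781 (SAS prints -2.2782 from unrounded estimates), Lower about -0.3397 and Middle about 0.3742, plus the Upper effect that SAS does not print, about -0.0346. Differences between levels do not depend on the coding: Middle minus Upper is 0.4088, the Status1 estimate.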