EXST7034 Regression Techniques    Geaghan    Logistic regression Diagnostics

1    **********************************************************;
2    *** Logistic Regression - Disease outbreak example     ***;
3    *** NKNW table 14.3 (Appendix C3)                      ***;
4    *** Study of a disease outbreak from a mosquito-borne  ***;
5    *** disease within two sectors of a city.              ***;
6    **********************************************************;
7
8    dm'log;clear;output;clear';
9    options nodate nocenter nonumber ps=512 ls=132 nolabel;
10   ODS HTML style=minimal rs=none body='C:\Geaghan\Current\EXST7034\Fall2005\SAS\DiseaseOutbreak01.html' ;
NOTE: Writing HTML Body file: C:\Geaghan\Current\EXST7034\Fall2005\SAS\DiseaseOutbreak01.html
11
12   TITLE1 'Logistic Regression - NKNW Example 14.3';
13   data Disease; infile cards missover;
14      input case Age Status1 Status2 sector Disease;
15      label case    = 'case number'
16            age     = 'Patients age'
17            status  = 'Socioeconomic status upper, middle and lower'
18            disease = 'Disease present = 1';
19   * Status classes are upper (0, 0), Middle (1, 0) and Lower (0, 1);
20   Cards;

NOTE: Variable status is uninitialized.
NOTE: The data set WORK.DISEASE has 98 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.03 seconds

119  ;
120  ods html;
121  ods graphics on;
NOTE: ODS Statistical Graphics will require a SAS/GRAPH license when it is declared production.
122
123  proc logistic data=Disease DESCENDING alpha=0.05;
124     TITLE2 'Logistic regression on Disease data';
125     model Disease = Age Status1 Status2 Sector / lackfit RSQ iplots;
126     output out=next1 PREDICTED=yhat Lower=lcl95 Upper=ucl95 dfbetas=_ALL_
127            resdev=resdev difdev=difdev;
128  run;

NOTE: PROC LOGISTIC is modeling the probability that Disease=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
WARNING: Statistical graphics displays created with ODS are experimental in this release.
NOTE: There were 98 observations read from the data set WORK.DISEASE.
NOTE: The data set WORK.NEXT1 has 98 observations and 17 variables.
NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.
NOTE: The PROCEDURE LOGISTIC printed page 1.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           4.77 seconds
      cpu time            3.12 seconds


Logistic Regression - NKNW Example 14.3
Logistic regression on Disease data

The LOGISTIC Procedure

               Model Information
Data Set                      WORK.DISEASE
Response Variable             Disease
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read     98
Number of Observations Used     98

          Response Profile
Ordered                    Total
  Value    Disease     Frequency
      1    1                  31
      2    0                  67

Probability modeled is Disease=1.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

        Model Fit Statistics
                            Intercept
              Intercept           and
Criterion          Only    Covariates
AIC             124.318       111.054
SC              126.903       123.979
-2 Log L        122.318       101.054

R-Square   0.1950     Max-rescaled R-Square   0.2736

Model fit statistics

1) Akaike Information Criterion
   $AIC = -2\log(L) + 2p$
   where Log(L) is the log likelihood and p is the number of parameters

2) Schwarz Criterion
   $SC = -2\log(L) + p\,\log\left(\sum_j f_j\right)$

3) -2log L
   $-2\sum_{i=1}^{n}\left[Y_i\log_e(\hat{\pi}_i) + (1-Y_i)\log_e(1-\hat{\pi}_i)\right]$
   This is analogous to the SSE in regression and is given in SAS as the “-2 Log L”. Two models (full and reduced) can be compared by calculating the difference in “-2 Log L” for both models. This difference follows a chi square distribution with a d.f. equal to the difference in d.f. for the two models.

4) Generalized R2
   $R^2 = 1 - \left(\frac{L(0)}{L(\theta)}\right)^{2/n}$
   where L(0) is the intercept only model. Since this value reaches its maximum of less than 1 for discrete models an adjustment has been proposed. This is called the Max-rescaled Rsquare in SAS.
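As a rough check on the four definitions above, the reported fit statistics can be recomputed from the two "-2 Log L" values alone. The sketch below is not part of the original SAS run. It assumes unit frequencies, so that the sum of the f_j equals the 98 observations used, and it takes the maximum attainable R-square to be 1 - L(0)^(2/n), which is how the max-rescaled value is usually defined. The function and variable names are illustrative only.

```python
import math

# Values copied from the Model Fit Statistics table in the SAS output.
n_obs = 98                 # number of observations used
neg2logl_null = 122.318    # -2 Log L, intercept only
neg2logl_full = 101.054    # -2 Log L, intercept and covariates
p_null, p_full = 1, 5      # intercept only; intercept + Age, Status1, Status2, Sector


def aic(neg2logl, p):
    """AIC = -2 log(L) + 2p."""
    return neg2logl + 2 * p


def sc(neg2logl, p, n):
    """SC = -2 log(L) + p log(sum of f_j); assumes every f_j = 1, so the sum is n."""
    return neg2logl + p * math.log(n)


def generalized_r2(neg2logl_0, neg2logl_1, n):
    """1 - (L(0)/L(theta))^(2/n), rewritten in terms of the -2 Log L values."""
    return 1.0 - math.exp(-(neg2logl_0 - neg2logl_1) / n)


def max_r2(neg2logl_0, n):
    """Assumed upper bound of the generalized R-square: 1 - L(0)^(2/n)."""
    return 1.0 - math.exp(-neg2logl_0 / n)


# Full vs. reduced (intercept-only) comparison: difference in -2 Log L.
lr_chisq = neg2logl_null - neg2logl_full
lr_df = p_full - p_null

r2 = generalized_r2(neg2logl_null, neg2logl_full, n_obs)
r2_rescaled = r2 / max_r2(neg2logl_null, n_obs)

print(f"AIC  null={aic(neg2logl_null, p_null):.3f}  full={aic(neg2logl_full, p_full):.3f}")
print(f"SC   null={sc(neg2logl_null, p_null, n_obs):.3f}  full={sc(neg2logl_full, p_full, n_obs):.3f}")
print(f"LR chi-square={lr_chisq:.3f} on {lr_df} d.f.")
print(f"R-square={r2:.4f}  max-rescaled R-square={r2_rescaled:.4f}")
```

The difference in "-2 Log L" computed here is the same quantity SAS reports as the Likelihood Ratio chi-square in the global null hypothesis test, with d.f. equal to the four added covariates.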
Max-rescaled R-square: $\tilde{R}^2 = R^2 / R^2_{\max}$

      Testing Global Null Hypothesis: BETA=0
Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        21.2635     4        0.0003
Score                   20.4067     4        0.0004
Wald                    16.6437     4        0.0023

Wald : used to test individual parameter estimates and to place confidence intervals. It is based on a large sample assumption of asymptotic normality.

Chi-square Test
   $\hat{\beta}_i^2 / Var(\hat{\beta}_i) = \left[\hat{\beta}_i / Stderr(\hat{\beta}_i)\right]^2$

Confidence interval (for the odds ratio $e^{\beta_i}$)
   $P\left(e^{\hat{\beta}_i - 1.96\,S_{\hat{\beta}_i}} \le e^{\beta_i} \le e^{\hat{\beta}_i + 1.96\,S_{\hat{\beta}_i}}\right) = 0.95$

            Analysis of Maximum Likelihood Estimates
                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1     -2.3127      0.6426       12.9545        0.0003
Age           1      0.0297      0.0135        4.8535        0.0276
Status1       1      0.4088      0.5990        0.4657        0.4950
Status2       1     -0.3051      0.6041        0.2551        0.6135
sector        1      1.5746      0.5016        9.8543        0.0017

          Odds Ratio Estimates
             Point          95% Wald
Effect    Estimate      Confidence Limits
Age          1.030       1.003       1.058
Status1      1.505       0.465       4.868
Status2      0.737       0.226       2.408
sector       4.829       1.807      12.907

Association of Predicted Probabilities and Observed Responses
Percent Concordant     77.5    Somers' D    0.554
Percent Discordant     22.1    Gamma        0.556
Percent Tied            0.3    Tau-a        0.242
Pairs                  2077    c            0.777

Association of Predicted
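The Wald chi-square column and the odds ratio limits in the tables above follow directly from the printed estimates and standard errors. The sketch below is an arithmetic check only, not part of the original SAS program. It uses the rounded four-decimal values from the listing and the 1.96 multiplier from the notes, so agreement with the SAS columns is approximate. The names `estimates`, `wald_chisq`, and `odds_ratio_ci` are illustrative.

```python
import math

# (estimate, standard error) copied from "Analysis of Maximum Likelihood Estimates".
estimates = {
    "Intercept": (-2.3127, 0.6426),
    "Age":       (0.0297, 0.0135),
    "Status1":   (0.4088, 0.5990),
    "Status2":   (-0.3051, 0.6041),
    "sector":    (1.5746, 0.5016),
}


def wald_chisq(beta, se):
    """Wald chi-square: [beta / Stderr(beta)]^2, with 1 d.f."""
    return (beta / se) ** 2


def odds_ratio_ci(beta, se, z=1.96):
    """Point estimate and 95% Wald limits for the odds ratio e^beta."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


for name, (beta, se) in estimates.items():
    chisq = wald_chisq(beta, se)
    if name == "Intercept":
        # SAS does not report an odds ratio for the intercept.
        print(f"{name:<10} Wald chi-square={chisq:8.4f}")
    else:
        point, lower, upper = odds_ratio_ci(beta, se)
        print(f"{name:<10} Wald chi-square={chisq:8.4f}  "
              f"OR={point:.3f}  95% limits=({lower:.3f}, {upper:.3f})")
```

Small differences from the listing (for example, Age gives 4.84 here against 4.8535 in SAS) come from rounding in the printed estimates, since SAS computes from unrounded values.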

