PSU STAT 504 - Modeling Longitudinal Data with GEE - D408095

Course Stat 504- Analysis of Discrete Data

Stat 504, Lecture 26

Modeling Longitudinal Data with GEE

Example. Data analyzed by Hedeker and Gibbons (1997). A randomized trial for schizophrenia:
• 312 patients received drug therapy; 101 received placebo
• measurements at weeks 0, 1, 3, 6, but some subjects have missing data due to dropout
• outcome: severity of illness (1=normal, ..., 7=extremely ill)

[Figure: "spaghetti plot" of response curves for all subjects; sev versus week]
[Figure: responses for drug patients; sev versus week]
[Figure: responses for placebo patients; sev versus week]
[Figure: average for each group (placebo, drug) at each time point; sev versus week]
[Figure: the same averages plotted versus the square root of week; sev versus sqrt(week)]

As shown by the plot against the square root of week, the average trajectories for the placebo and drug groups appear to be approximately linear in sqrt(week).

At baseline (week 0), the two groups have very similar averages. This makes sense. In a randomized trial, the groups are initially just a random division of the subjects; there should be no "treatment" effect because the treatment hasn't yet started. If there were a difference at baseline, it would lead us to believe that the randomization was not carried out properly.

Let's fit a model for mean response with
• an intercept,
• a main effect for group,
• a main effect for sqrt(week), and
• an interaction between group and sqrt(week).

This allows the two groups to have different intercepts and slopes. Because the intercepts are defined as the average responses at week 0, we expect that the main effect for group (i.e. the difference in intercepts) will be small.

How can we fit this model, taking into account the fact that the multiple observations for a subject are correlated?

Generalized Estimating Equations (GEE). First introduced by Liang and Zeger (1986); see also Diggle, Liang and Zeger (1994).
Instead of attempting to model the within-subject covariance structure, treat it as a nuisance and simply model the mean response. In this framework, the covariance structure doesn't need to be specified correctly for us to get reasonable regression coefficients and standard errors.

First we examine the method of "independence estimating equations," which incorrectly assumes that the observations within a subject are independent.

Independence estimating equations (IEE)

The data for a single subject $i$ measured at occasions $j = 1, 2, \ldots, n_i$:
$y_i = (y_{i1}, y_{i2}, \ldots, y_{i,n_i})^T$
$y_{ij}$ = discrete or continuous response
$X_i$ = $n_i \times p$ matrix of covariates

Let us suppose that the mean response $E(y_{ij}) = \mu_{ij}$ is related to the covariates by a link function,
$g(\mu_i) = X_i \beta$,
and let $\Delta_i$ be the diagonal matrix of variances
$\Delta_i = \mathrm{Diag}[\mathrm{Var}(y_{ij})]$.
Unless the observations within a subject are independent,
$\Delta_i \neq \mathrm{Cov}(y_i)$.
But if $\Delta_i$ were correct, we could stack up the $y_i$'s and estimate $\beta$ using techniques for generalized linear models.

How bad is it to pretend that $\Delta_i$ is correct?

Let $\hat{\beta}$ be the estimate that assumes observations within a subject are independent (from ordinary linear regression, logistic regression, etc.)
• If $\Delta_i$ is correct, then $\hat{\beta}$ is asymptotically unbiased and efficient
• If $\Delta_i$ is not correct, then $\hat{\beta}$ is still asymptotically unbiased but no longer efficient
  - The 'naive' standard error for $\hat{\beta}$, obtained from the naive estimate of $\mathrm{Cov}(\hat{\beta})$,
    $\hat{\sigma}^2 (X^T \hat{W} X)^{-1}$,
    may be very misleading (here, $X$ is the matrix of stacked $X_i$'s and $\hat{W}$ is the diagonal matrix of final weights, if any)
  - consistent standard errors for $\hat{\beta}$ are still possible using the sandwich estimator (sometimes called the 'robust' or 'empirical' estimator)

Sandwich estimator

The sandwich estimator was first proposed by Huber (1967) and White (1980); Liang and Zeger (1986) applied it to longitudinal data:
$(X^T \hat{W} X)^{-1} \left[ \sum_i X_i^T (y_i - \hat{\mu}_i)(y_i - \hat{\mu}_i)^T X_i \right] (X^T \hat{W} X)^{-1}$
• provides a good estimate of $\mathrm{Cov}(\hat{\beta})$ in large samples (several hundred subjects or more) regardless of the true form of $\mathrm{Cov}(y_i)$
• in smaller samples it could be rather noisy, so 95% intervals obtained by ±2 SE's could suffer from undercoverage

When within-subject correlations are not strong, Zeger (1988) suggests that the use of IEE with the sandwich estimator is highly efficient.

Example. The data from the schizophrenia trial.
Column 1: subject ID
Column 2: group (0=placebo, 1=drug)
Column 3: week (0, 1, 3, 6)
Column 4: severity of illness (1, ..., 7)

    1103 1 0 5.5
    1103 1 1 3.0
    1103 1 3 2.5
    1103 1 6 4.0
    1104 1 0 6.0
    1104 1 1 3.0
    1104 1 3 1.5
    1104 1 6 2.5
    - lines omitted -
    9315 1 0 6.0
    9315 1 1 6.0
    9315 1 3 5.0
    9315 1 6 5.5
    9316 0 0 5.5
    9316 0 1 6.0
    9316 0 3 6.5
    9316 0 6 6.0

SAS program for fitting model by independence estimating equations:

    options nocenter nodate nonumber linesize=72;
    /* read the four columns and derive sqrt(week) */
    data schiz;
      infile "d:\jls\stat504\lectures\schiz.dat";
      input id drug week y;
      sqrtweek = sqrt(week);
    run;
    /* normal model, identity link, independence working correlation */
    proc genmod data=schiz;
      class id;
      model y = drug sqrtweek drug*sqrtweek;
      repeated subject=id / type=ind modelse;
    run;

What's going on?
• The model statement without any additional options tells SAS to apply a normal error distribution with constant variance. The default link function for the normal model is the identity link.
• The repeated statement tells PROC GENMOD to fit the GEE with an independence correlation structure (type=ind). The observations are grouped by the class variable named in subject= (here, id). The option modelse tells SAS to print out model-based SE's along with those from the sandwich.

Together, these two statements specify an estimation procedure equivalent to ML under an ordinary linear regression model; in other words, the resulting estimates are simply OLS. What is the advantage of using PROC GENMOD here?
The advantage is that the standard errors are computed using the sandwich; if the sample is sufficiently large, then the SE's are going to be reasonable even if the assumptions of independence and constant variance are wrong.

Let's look at some results.

    The GENMOD Procedure

    Model Information
    Data Set             WORK.SCHIZ
    Distribution         Normal
    Link Function        Identity
    Dependent Variable   y
    Observations Used    1500
    Missing Values       152

    Class Level Information
    Class   Levels   Values
    id      413      1103 1104 1105 1106 1107 1108 1109 1110 1111 1113
                     1114 1115 1118 1124 1129 1136 1140 1301 1302 1303
                     1304 1305 1306 1307
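
To make the sandwich computation concrete, here is a minimal sketch in Python with numpy (not SAS, and not what PROC GENMOD runs internally). It fits the same mean model by OLS, i.e. IEE with an identity link and W = I, and forms both the naive and the sandwich covariance estimates from the formulas above. The function name iee_sandwich, the simulated data, the coefficient values used to generate it, and the n - p divisor for the residual variance are all assumptions made for illustration; the schizophrenia data are not used.

```python
import numpy as np

def iee_sandwich(X, y, ids):
    """OLS fit (IEE, identity link, W = I) with naive and sandwich covariances.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    ids = np.asarray(ids)
    n, p = X.shape

    bread = np.linalg.inv(X.T @ X)       # (X^T W X)^{-1} with W = I
    beta = bread @ X.T @ y               # OLS estimate
    resid = y - X @ beta                 # y - mu_hat

    # Naive covariance: sigma^2 (X^T W X)^{-1}; the n - p divisor is an assumption.
    sigma2 = resid @ resid / (n - p)
    cov_naive = sigma2 * bread

    # Middle of the sandwich: sum over subjects of X_i^T r_i r_i^T X_i.
    meat = np.zeros((p, p))
    for s in np.unique(ids):
        rows = ids == s
        u = X[rows].T @ resid[rows]
        meat += np.outer(u, u)

    cov_sandwich = bread @ meat @ bread
    return beta, cov_naive, cov_sandwich

# Simulated stand-in data: 200 subjects at weeks 0, 1, 3, 6, with a random
# subject effect so that observations within a subject are correlated.
rng = np.random.default_rng(0)
n_subj = 200
weeks = np.array([0.0, 1.0, 3.0, 6.0])
ids = np.repeat(np.arange(n_subj), weeks.size)
drug = np.repeat(rng.integers(0, 2, n_subj), weeks.size).astype(float)
sqrtweek = np.tile(np.sqrt(weeks), n_subj)
subj_effect = np.repeat(rng.normal(0.0, 1.0, n_subj), weeks.size)
noise = rng.normal(0.0, 0.5, ids.size)
y = 5.0 - 0.3 * sqrtweek - 0.6 * drug * sqrtweek + subj_effect + noise

# Columns: intercept, drug, sqrtweek, drug*sqrtweek (same mean model as above).
X = np.column_stack([np.ones_like(y), drug, sqrtweek, drug * sqrtweek])

beta, cov_naive, cov_sandwich = iee_sandwich(X, y, ids)
se_naive = np.sqrt(np.diag(cov_naive))
se_sandwich = np.sqrt(np.diag(cov_sandwich))
print("estimates   :", beta)
print("naive SE    :", se_naive)
print("sandwich SE :", se_sandwich)
```

The point estimates are ordinary least squares in both cases; only the standard errors change. When observations within a subject are correlated, the two sets of SE's can differ noticeably, which is the comparison the modelse option is meant to expose.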
