PSU STAT 504 - Modeling Longitudinal Data with GEE

Stat 504, Lecture 26

Modeling Longitudinal Data with GEE

Example. Data analyzed by Hedeker and Gibbons (1997). A randomized trial for schizophrenia:

• 312 patients received drug therapy; 101 received placebo
• measurements at weeks 0, 1, 3, 6, but some subjects have missing data due to dropout
• outcome: severity of illness (1 = normal, ..., 7 = extremely ill)

[Figure: "Spaghetti plot" of response curves for all subjects; x-axis: week, y-axis: sev]

[Figure: Responses for drug patients; x-axis: week, y-axis: sev]

[Figure: Responses for placebo patients; x-axis: week, y-axis: sev]

[Figure: Average for each group (placebo, drug) at each time point; x-axis: week, y-axis: sev]

[Figure: Same plot versus square root of week; x-axis: sqrt(week), y-axis: sev]

As shown by the last plot above, the average trajectories for the placebo and drug groups appear to be approximately linear when plotted against the square root of week.

At baseline (week 0), the two groups have very similar averages. This makes sense. In a randomized trial, the groups are initially just a random division of the subjects; there should be no "treatment" effect because the treatment hasn't yet started. If there were a difference at baseline, it would lead us to believe that the randomization was not carried out properly.

Let's fit a model for mean response with

• an intercept,
• a main effect for group,
• a main effect for √week, and
• an interaction between group and √week.

This allows the two groups to have different intercepts and slopes. Because the intercepts are defined as the average responses at week 0, we expect that the main effect for group (i.e., the difference in intercepts) will be small.

How can we fit this model, taking into account the fact that the multiple observations for a subject are correlated?

Generalized Estimating Equations (GEE). First introduced by Liang and Zeger (1986); see also Diggle, Liang and Zeger (1994).
Instead of attempting to model the within-subject covariance structure, treat it as a nuisance and simply model the mean response. In this framework, the covariance structure doesn't need to be specified correctly for us to get reasonable regression coefficients and standard errors.

First we examine the method of "independence estimating equations," which incorrectly assumes that the observations within a subject are independent.

Independence estimating equations (IEE)

The data for a single subject $i$ measured at occasions $j = 1, 2, \ldots, n_i$:

$y_i = (y_{i1}, y_{i2}, \ldots, y_{i,n_i})^T$
$y_{ij}$ = discrete or continuous response
$X_i$ = $n_i \times p$ matrix of covariates

Let us suppose that the mean response $E(y_{ij}) = \mu_{ij}$ is related to the covariates by a link function,

$g(\mu_i) = X_i \beta,$

and let $\Delta_i$ be the diagonal matrix of variances

$\Delta_i = \mathrm{Diag}[\mathrm{Var}(y_{ij})].$

Unless the observations within a subject are independent,

$\Delta_i \neq \mathrm{Cov}(y_i).$

But if $\Delta_i$ were correct, we could stack up the $y_i$'s and estimate $\beta$ using techniques for generalized linear models.

How bad is it to pretend that $\Delta_i$ is correct?

Let $\hat\beta$ be the estimate that assumes observations within a subject are independent (from ordinary linear regression, logistic regression, etc.)

• If $\Delta_i$ is correct, then $\hat\beta$ is asymptotically unbiased and efficient
• If $\Delta_i$ is not correct, then $\hat\beta$ is still asymptotically unbiased but no longer efficient
  - The 'naive' standard error for $\hat\beta$, obtained from the naive estimate of $\mathrm{Cov}(\hat\beta)$,
    $\hat\sigma^2 (X^T \hat W X)^{-1},$
    may be very misleading (here, $X$ is the matrix of stacked $X_i$'s and $\hat W$ is the diagonal matrix of final weights, if any)
  - consistent standard errors for $\hat\beta$ are still possible using the sandwich estimator (sometimes called the 'robust' or 'empirical' estimator)

Sandwich estimator

The sandwich estimator was first proposed by Huber (1967) and White (1980); Liang and Zeger (1986) applied it to longitudinal data:

$(X^T \hat W X)^{-1} \left[ \sum_i X_i^T (y_i - \hat\mu_i)(y_i - \hat\mu_i)^T X_i \right] (X^T \hat W X)^{-1}$

• provides a good estimate of $\mathrm{Cov}(\hat\beta)$ in large samples (several hundred subjects or
more) regardless of the true form of $\mathrm{Cov}(y_i)$
• in smaller samples it could be rather noisy, so 95% intervals obtained by ±2 SE's could suffer from undercoverage

When within-subject correlations are not strong, Zeger (1988) suggests that the use of IEE with the sandwich estimator is highly efficient.

Example. The data from the schizophrenia trial.

Column 1: subject ID
Column 2: group (0 = placebo, 1 = drug)
Column 3: week (0, 1, 3, 6)
Column 4: severity of illness (1, ..., 7)

    1103 1 0 5.5
    1103 1 1 3.0
    1103 1 3 2.5
    1103 1 6 4.0
    1104 1 0 6.0
    1104 1 1 3.0
    1104 1 3 1.5
    1104 1 6 2.5
    - lines omitted -
    9315 1 0 6.0
    9315 1 1 6.0
    9315 1 3 5.0
    9315 1 6 5.5
    9316 0 0 5.5
    9316 0 1 6.0
    9316 0 3 6.5
    9316 0 6 6.0

SAS program for fitting model by independence estimating equations:

    options nocenter nodate nonumber linesize=72;
    /* read the four columns and add sqrt(week) as a covariate */
    data schiz;
      infile "d:\jls\stat504\lectures\schiz.dat";
      input id drug week y;
      sqrtweek = sqrt(week);
    run;
    /* normal model, identity link, independence working correlation */
    proc genmod data=schiz;
      class id;
      model y = drug sqrtweek drug*sqrtweek;
      repeated subject=id / type=ind modelse;
    run;

What's going on?

• The model statement without any additional options tells SAS to apply a normal error distribution with constant variance. The default link function for the normal model is the identity link.
• The repeated statement tells PROC GENMOD to fit the GEE with an independence correlation structure (type=ind). The observations are grouped by the class variable id, which is named as the subject (subject=id). The option modelse tells SAS to print out model-based SE's along with those from the sandwich.

Together, these two statements specify an estimation procedure equivalent to ML under an ordinary linear regression model; in other words, the resulting estimates are simply OLS. What is the advantage of using PROC GENMOD here?
The advantage is that the standard errors are computed using the sandwich; if the sample is sufficiently large, then the SE's are going to be reasonable even if the assumptions of independence and constant variance are wrong.

Let's look at some results.

    The GENMOD Procedure

    Model Information
    Data Set             WORK.SCHIZ
    Distribution         Normal
    Link Function        Identity
    Dependent Variable   y
    Observations Used    1500
    Missing Values       152

    Class Level Information
    Class   Levels   Values
    id      413      1103 1104 1105 1106 1107 1108 1109 1110 1111 1113
                     1114 1115 1118 1124 1129 1136 1140 1301 1302 1303
                     1304 1305 1306 1307
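
To make the naive and sandwich covariance formulas concrete, here is a minimal Python sketch of IEE for the normal model with identity link, where the weight matrix is the identity and the fit is plain OLS. It is not the lecture's SAS code and is not meant to reproduce the GENMOD output. The helper names (matmul, transpose, inverse, iee_sandwich) are hypothetical, the n - p divisor for the variance estimate is an assumption of this sketch, and the demonstration uses only the three drug-group subjects visible in the data listing, with a reduced design (intercept and sqrt(week)), because the excerpt shows just one placebo subject. With three subjects the standard errors are far too noisy to interpret; the point is only the mechanics.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small p x p)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def iee_sandwich(ids, X, y):
    """OLS under working independence; returns (beta, naive cov, sandwich cov)."""
    n, p = len(y), len(X[0])
    Xt = transpose(X)
    bread = inverse(matmul(Xt, X))                      # (X'X)^{-1}
    beta = [row[0] for row in matmul(bread, matmul(Xt, [[v] for v in y]))]
    resid = [yi - sum(b * x for b, x in zip(beta, xi)) for xi, yi in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (n - p)        # assumed divisor: n - p
    naive = [[sigma2 * v for v in row] for row in bread]
    # Per-subject score s_i = X_i'(y_i - mu_i); the "meat" is the sum of s_i s_i'.
    scores = {}
    for sid, xi, r in zip(ids, X, resid):
        s = scores.setdefault(sid, [0.0] * p)
        for k in range(p):
            s[k] += xi[k] * r
    meat = [[sum(s[a] * s[b] for s in scores.values()) for b in range(p)]
            for a in range(p)]
    robust = matmul(matmul(bread, meat), bread)
    return beta, naive, robust

# (id, week, severity) for the drug-group subjects shown in the listing above.
rows = [
    (1103, 0, 5.5), (1103, 1, 3.0), (1103, 3, 2.5), (1103, 6, 4.0),
    (1104, 0, 6.0), (1104, 1, 3.0), (1104, 3, 1.5), (1104, 6, 2.5),
    (9315, 0, 6.0), (9315, 1, 6.0), (9315, 3, 5.0), (9315, 6, 5.5),
]
ids = [r[0] for r in rows]
X = [[1.0, math.sqrt(r[1])] for r in rows]              # intercept, sqrt(week)
y = [r[2] for r in rows]

beta, naive, robust = iee_sandwich(ids, X, y)
naive_se = [math.sqrt(naive[k][k]) for k in range(len(beta))]
robust_se = [math.sqrt(robust[k][k]) for k in range(len(beta))]
print("coefficients:", [round(b, 3) for b in beta])
print("naive SE:    ", [round(s, 3) for s in naive_se])
print("sandwich SE: ", [round(s, 3) for s in robust_se])
```

The naive column plays the role of the model-based SE's that the modelse option requests, and the sandwich column plays the role of the empirical SE's. The residuals are summed within each subject before the outer product is formed, which is what lets the sandwich pick up within-subject correlation that the naive formula ignores.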

