Stat 544, Lecture 22: Quasilikelihood and GEE

Quasilikelihood. Suppose that we observe responses $y_1, y_2, \ldots, y_N$ which we want to relate to covariates. The ordinary linear regression model is
$$y_i \sim N(x_i^T\beta, \sigma^2),$$
where $x_i$ is a vector of covariates, $\beta$ is a vector of coefficients to be estimated, and $\sigma^2$ is the error variance. Now let's generalize this model in two ways:

• Introduce a link function for the mean,
$$E(y_i) = \mu_i, \qquad g(\mu_i) = x_i^T\beta.$$
We could also write this as
$$E(y_i) = \mu_i(\beta), \tag{1}$$
where the covariates and the link become part of the function $\mu_i$. For example, in a loglinear model we would have $\mu_i = \exp(x_i^T\beta)$.

• Allow for heteroscedasticity, so that
$$\mathrm{Var}(y_i) = V_i. \tag{2}$$
In many cases $V_i$ will depend on the mean $\mu_i$, and therefore on $\beta$, and possibly on additional unknown parameters (e.g. a scale factor). For example, in a traditional overdispersed loglinear model, we would have $V_i = \sigma^2\mu_i = \sigma^2\exp(x_i^T\beta)$.

A maximum-likelihood estimate for $\beta$ under this model could be computed by a Fisher scoring procedure. ML estimates have two nice theoretical properties: they are approximately unbiased and highly efficient. Interestingly, the asymptotic theory underlying these properties does not really depend on the normality of $y_i$, but only on the first two moments. That is, if the mean function (1) and the variance function (2) are correct, but the distribution of $y_i$ is not normal, the estimate of $\beta$ obtained by maximizing the normal loglikelihood from
$$y_i \sim N(\mu_i(\beta), V_i(\beta))$$
is still asymptotically unbiased and efficient.

Quasi-scoring. If we maximize the normality-based loglikelihood without assuming that the response is normally distributed, the resulting estimate of $\beta$ is called a quasilikelihood estimate. The iterative procedure for computing the quasilikelihood estimate, called quasi-scoring, proceeds as follows.

First, let's collect the responses and their means into vectors of length $N$,
$$y = (y_1, y_2, \ldots, y_N)^T, \qquad \mu = (\mu_1, \mu_2, \ldots, \mu_N)^T.$$
Also, let $V$ be the $N \times N$ matrix with $V_1, \ldots, V_N$ on the diagonal and zeros elsewhere. Finally, let
$$D = \frac{\partial \mu}{\partial \beta}$$
be the $N \times p$ matrix containing the partial derivatives of $\mu$ with respect to $\beta$; that is, the $(i,j)$th element of $D$ is $\partial\mu_i/\partial\beta_j$. In a simple linear model with an identity link, $\mu_i = x_i^T\beta$, so $D$ is simply the $N \times p$ matrix of covariates
$$X = (x_1, x_2, \ldots, x_N)^T.$$

The iterative quasi-scoring procedure is
$$\beta_{\text{new}} - \beta_{\text{old}} = \left(D^T V^{-1} D\right)^{-1} D^T V^{-1} (y - \mu), \tag{3}$$
where on the right-hand side of (3) the quantities $D$, $V$, and $\mu$ are evaluated at $\beta = \beta_{\text{old}}$.

It is easy to see that in the case of simple linear regression with homoscedastic response ($\mu_i = x_i^T\beta$ and $V_i = \sigma^2$), the quasi-scoring procedure converges to the OLS estimate
$$\hat\beta = (X^T X)^{-1} X^T y$$
in a single step, regardless of $\beta_{\text{old}}$. In other cases (e.g. loglinear regression, where $\mu_i = \exp(x_i^T\beta)$ and $V_i = \sigma^2\mu_i$), it produces the same estimate for $\beta$ that we obtained earlier by fitting the generalized linear model. Quasilikelihood estimation is really the same thing as generalized linear modeling, except that we no longer have a full parametric model for $y_i$.

If we let $U$ be the $p \times 1$ vector
$$U = D^T V^{-1} (y - \mu),$$
the final estimate from the quasi-scoring procedure satisfies the condition $U = 0$. $U$ can be regarded as the first derivative of the quasi-loglikelihood function. The first derivative of a loglikelihood is called a score, so $U$ is called a quasi-score. $U = 0$ is often called the set of estimating equations, and the final estimate for $\beta$ is the solution to the estimating equations.
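To make the quasi-scoring update (3) concrete, here is a minimal NumPy sketch. The function name `quasi_score_fit`, its interface, and the simulated data are illustrative assumptions, not part of the lecture notes; the sketch simply iterates (3) and checks the two claims above (one-step convergence to OLS under the identity link, and agreement with the Poisson GLM under the log link).

```python
import numpy as np

def quasi_score_fit(y, X, mu_fn, dmu_fn, var_fn, beta0, tol=1e-10, max_iter=50):
    """Iterate the quasi-scoring update (3):
    beta_new = beta_old + (D' V^-1 D)^-1 D' V^-1 (y - mu),
    with D, V, and mu evaluated at beta_old."""
    beta = beta0.astype(float)
    for _ in range(max_iter):
        mu = mu_fn(X, beta)            # N-vector of means mu_i(beta)
        D = dmu_fn(X, beta)            # N x p matrix, (i,j) entry d mu_i / d beta_j
        vinv = 1.0 / var_fn(mu)        # diagonal of V^-1
        U = D.T @ (vinv * (y - mu))    # quasi-score U = D' V^-1 (y - mu)
        J = D.T @ (vinv[:, None] * D)  # D' V^-1 D
        step = np.linalg.solve(J, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, np.linalg.inv(J)      # estimate and final (D' V^-1 D)^-1

rng = np.random.default_rng(0)
N, p = 200, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, 0.3])

# Identity link, constant variance: D = X, and one update from any
# starting value lands exactly on the OLS estimate (X'X)^-1 X'y.
y = X @ beta_true + rng.normal(size=N)
bhat, _ = quasi_score_fit(y, X,
                          mu_fn=lambda X, b: X @ b,
                          dmu_fn=lambda X, b: X,
                          var_fn=lambda mu: np.ones_like(mu),
                          beta0=np.zeros(p), max_iter=1)
print(np.allclose(bhat, np.linalg.solve(X.T @ X, X.T @ y)))  # True

# Log link with V_i = sigma^2 mu_i: the scale sigma^2 cancels out of (3),
# so var_fn = mu suffices, and the fit matches an ordinary Poisson GLM.
y2 = rng.poisson(np.exp(X @ beta_true))
bhat2, _ = quasi_score_fit(y2, X,
                           mu_fn=lambda X, b: np.exp(X @ b),
                           dmu_fn=lambda X, b: np.exp(X @ b)[:, None] * X,
                           var_fn=lambda mu: mu,
                           beta0=np.zeros(p))
```

Note that only the first two moments enter the code: `mu_fn`, `dmu_fn`, and `var_fn` are all that is needed, which is exactly the sense in which no full parametric model for $y_i$ is required.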
Estimating equations with a working variance function. We'll suppose that the mean regression function $\mu_i(\beta)$ has been correctly specified but the variance function has not. That is, the data analyst incorrectly supposes that the variance function for $y_i$ is $\tilde V_i$ rather than $V_i$, where $\tilde V_i$ is another function of $\beta$. The analyst then estimates $\beta$ by solving the quasi-score equations
$$U = D^T \tilde V^{-1} (y - \mu) = 0,$$
where $\tilde V$ is the diagonal matrix of the $\tilde V_i$'s. Obviously it would be better to perform the scoring procedure using the true variance function $V = \mathrm{Diag}(V_1, \ldots, V_N)$ rather than $\tilde V$. What are the properties of this procedure when $\tilde V$ has been used instead of $V$?

1. The estimate $\hat\beta$ is a consistent and asymptotically unbiased estimate of $\beta$, even if $\tilde V \neq V$. It's also asymptotically normal.

2. If $\tilde V \neq V$, then $\hat\beta$ is not efficient; that is, the asymptotic variance of $\hat\beta$ is lowest when $\tilde V = V$.

3. If $\tilde V = V$, then the final value of the matrix $(D^T V^{-1} D)^{-1}$ from the scoring procedure (3) (i.e. the value of this matrix with $\hat\beta$ substituted for $\beta$) is a consistent estimate of $\mathrm{Var}(\hat\beta)$.

4. If $\tilde V \neq V$, then the final value of the matrix $(D^T \tilde V^{-1} D)^{-1}$ from (3) is not a consistent estimate of $\mathrm{Var}(\hat\beta)$.

Because of these properties, $\hat\beta$ may still be a reasonable estimate of $\beta$ if $\tilde V \neq V$, but the final value of $(D^T \tilde V^{-1} D)^{-1}$ (often called the "model-based" or "naive" estimator) will not give accurate standard errors for the elements of $\hat\beta$. However, this problem can be corrected by using the "robust" or "sandwich" estimator, defined as
$$\left(D^T \tilde V^{-1} D\right)^{-1} \left(D^T \tilde V^{-1} E \,\tilde V^{-1} D\right) \left(D^T \tilde V^{-1} D\right)^{-1}, \tag{4}$$
where
$$E = \mathrm{Diag}\bigl((y_1 - \mu_1)^2, \ldots, (y_N - \mu_N)^2\bigr), \tag{5}$$
and all quantities in (4) and (5) are evaluated with $\hat\beta$ substituted for $\beta$. The sandwich estimator is a consistent estimate of $\mathrm{Var}(\hat\beta)$ even when $\tilde V \neq V$. (A numerical sketch of this estimator follows the example below.)

The sandwich estimator was first proposed by Huber (1967) and White (1980), but was popularized in the late 1980s when Liang and Zeger extended it to multivariate or longitudinal responses.

Longitudinal example. Data analyzed by Hedeker and Gibbons (1997): a randomized trial for schizophrenia.

• 312 patients received drug therapy; 101 received placebo.
• Measurements were taken at weeks 0, 1, 3, and 6, but some subjects have missing data due to dropout.
• Outcome: severity of illness (1 = normal, ..., 7 = extremely ill).

[Figures: "spaghetti plot" of response curves for all subjects; responses for drug patients; responses for placebo patients; average for each group at each time point; the same averages plotted versus the square root of week.]

As shown by the second of the two group-average plots, the average trajectories for the placebo and drug groups appear to be approximately linear when plotted against the square root of week. At baseline (week 0), the two groups have very similar averages. This makes sense: in a randomized trial, the groups are initially comparable.
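As promised above, here is a minimal NumPy sketch of the sandwich estimator (4)-(5). It reuses the hypothetical `quasi_score_fit` from the earlier sketch, and the simulated data and function names are again illustrative assumptions: Poisson responses (true $V_i = \mu_i$) are fit with a deliberately misspecified constant working variance $\tilde V_i = 1$, and the naive and robust standard errors are compared.

```python
import numpy as np

def sandwich_cov(y, X, beta_hat, mu_fn, dmu_fn, workvar_fn):
    """Sandwich estimator (4): B^-1 M B^-1, where B = D' Vt^-1 D ("bread"),
    M = D' Vt^-1 E Vt^-1 D ("meat"), and E = diag((y_i - mu_i)^2) as in (5),
    all evaluated at beta_hat."""
    mu = mu_fn(X, beta_hat)
    D = dmu_fn(X, beta_hat)
    w = 1.0 / workvar_fn(mu)                    # diagonal of the working Vt^-1
    B = D.T @ (w[:, None] * D)                  # bread: D' Vt^-1 D
    r = w * (y - mu)                            # r_i^2 = w_i^2 (y_i - mu_i)^2
    M = (r[:, None] * D).T @ (r[:, None] * D)   # meat: D' Vt^-1 E Vt^-1 D
    Binv = np.linalg.inv(B)
    return Binv @ M @ Binv

rng = np.random.default_rng(1)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = rng.poisson(np.exp(X @ np.array([0.2, 0.4])))  # true variance is mu_i

mu_fn = lambda X, b: np.exp(X @ b)
dmu_fn = lambda X, b: np.exp(X @ b)[:, None] * X
workvar = lambda mu: np.ones_like(mu)              # wrong working variance

# quasi_score_fit (earlier sketch) returns the estimate and the naive
# covariance (D' Vt^-1 D)^-1 from the final scoring step.
beta_hat, naive_cov = quasi_score_fit(y, X, mu_fn, dmu_fn, workvar, np.zeros(2))
robust_cov = sandwich_cov(y, X, beta_hat, mu_fn, dmu_fn, workvar)

print("naive SEs   :", np.sqrt(np.diag(naive_cov)))
print("sandwich SEs:", np.sqrt(np.diag(robust_cov)))
```

Consistent with properties 1-4 above, the coefficient estimates remain sensible despite the wrong working variance, but the naive and sandwich standard errors generally disagree, and only the sandwich values should be trusted.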

