Stat 544, Lecture 22: Quasilikelihood and GEE

Quasilikelihood. Suppose that we observe responses $y_1, y_2, \ldots, y_N$ which we want to relate to covariates. The ordinary linear regression model is
$$y_i \sim N(x_i^T\beta, \sigma^2),$$
where $x_i$ is a vector of covariates, $\beta$ is a vector of coefficients to be estimated, and $\sigma^2$ is the error variance. Now let's generalize this model in two ways:

• Introduce a link function for the mean,
$$E(y_i) = \mu_i, \qquad g(\mu_i) = x_i^T\beta.$$
We could also write this as
$$E(y_i) = \mu_i(\beta), \tag{1}$$
where the covariates and the link become part of the function $\mu_i$. For example, in a loglinear model we would have $\mu_i = \exp(x_i^T\beta)$.

• Allow for heteroscedasticity, so that
$$\mathrm{Var}(y_i) = V_i. \tag{2}$$
In many cases $V_i$ will depend on the mean $\mu_i$, and therefore on $\beta$, and possibly on additional unknown parameters (e.g. a scale factor). For example, in a traditional overdispersed loglinear model, we would have $V_i = \sigma^2\mu_i = \sigma^2\exp(x_i^T\beta)$.

A maximum-likelihood estimate for $\beta$ under this model could be computed by a Fisher scoring procedure. ML estimates have two nice theoretical properties: they are approximately unbiased and highly efficient. Interestingly, the asymptotic theory underlying these properties does not really depend on the normality of $y_i$, but only on the first two moments. That is, if the mean function (1) and the variance function (2) are correct, but the distribution of $y_i$ is not normal, the estimate of $\beta$ obtained by maximizing the normal loglikelihood from
$$y_i \sim N(\mu_i(\beta), V_i(\beta))$$
is still asymptotically unbiased and efficient.

Quasi-scoring. If we maximize the normality-based loglikelihood without assuming that the response is normally distributed, the resulting estimate of $\beta$ is called a quasilikelihood estimate. The iterative procedure for computing the quasilikelihood estimate, called quasi-scoring, proceeds as follows.

First, let's collect the responses and their means into vectors of length $N$,
$$y = (y_1, y_2, \ldots, y_N)^T, \qquad \mu = (\mu_1, \mu_2, \ldots, \mu_N)^T.$$
Also, let $V$ be the $N \times N$ matrix with $V_1, \ldots, V_N$ on the diagonal and zeros elsewhere. Finally, let
$$D = \frac{\partial \mu}{\partial \beta}$$
be the $N \times p$ matrix containing the partial derivatives of $\mu$ with respect to $\beta$; that is, the $(i,j)$th element of $D$ is $\partial\mu_i/\partial\beta_j$. In a simple linear model with an identity link, $\mu_i = x_i^T\beta$, so $D$ is simply the $N \times p$ matrix of covariates
$$X = (x_1, x_2, \ldots, x_N)^T.$$

The iterative quasi-scoring procedure is
$$\beta_{\text{new}} - \beta_{\text{old}} = \left(D^T V^{-1} D\right)^{-1} D^T V^{-1} (y - \mu), \tag{3}$$
where on the right-hand side of (3) the quantities $D$, $V$, and $\mu$ are evaluated at $\beta = \beta_{\text{old}}$.

It is easy to see that in the case of simple linear regression with homoscedastic response ($\mu_i = x_i^T\beta$ and $V_i = \sigma^2$), the quasi-scoring procedure converges to the OLS estimate
$$\hat\beta = (X^T X)^{-1} X^T y$$
in a single step, regardless of $\beta_{\text{old}}$. In other cases (e.g. loglinear regression, where $\mu_i = \exp(x_i^T\beta)$ and $V_i = \sigma^2\mu_i$), it produces the same estimate for $\beta$ that we obtained earlier by fitting the generalized linear model. Quasilikelihood estimation is really the same thing as generalized linear modeling, except that we no longer have a full parametric model for $y_i$.

If we let $U$ be the $p \times 1$ vector
$$U = D^T V^{-1} (y - \mu),$$
the final estimate from the quasi-scoring procedure satisfies the condition $U = 0$. $U$ can be regarded as the first derivative of the quasi-loglikelihood function. The first derivative of a loglikelihood is called a score, so $U$ is called a quasi-score. $U = 0$ is often called the set of estimating equations, and the final estimate for $\beta$ is the solution to the estimating equations.
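To make the quasi-scoring update (3) concrete, here is a minimal NumPy sketch. The function name `quasi_score_fit`, its interface, and the simulated data are illustrative assumptions, not part of the lecture notes; the sketch simply iterates (3) and checks the two claims above (one-step convergence to OLS under the identity link, and agreement with the Poisson GLM under the log link).

```python
import numpy as np

def quasi_score_fit(y, X, mu_fn, dmu_fn, var_fn, beta0, tol=1e-10, max_iter=50):
    """Iterate the quasi-scoring update (3):
    beta_new = beta_old + (D' V^-1 D)^-1 D' V^-1 (y - mu),
    with D, V, and mu evaluated at beta_old."""
    beta = beta0.astype(float)
    for _ in range(max_iter):
        mu = mu_fn(X, beta)            # N-vector of means mu_i(beta)
        D = dmu_fn(X, beta)            # N x p matrix, (i,j) entry d mu_i / d beta_j
        vinv = 1.0 / var_fn(mu)        # diagonal of V^-1
        U = D.T @ (vinv * (y - mu))    # quasi-score U = D' V^-1 (y - mu)
        J = D.T @ (vinv[:, None] * D)  # D' V^-1 D
        step = np.linalg.solve(J, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, np.linalg.inv(J)      # estimate and final (D' V^-1 D)^-1

rng = np.random.default_rng(0)
N, p = 200, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, 0.3])

# Identity link, constant variance: D = X, and one update from any
# starting value lands exactly on the OLS estimate (X'X)^-1 X'y.
y = X @ beta_true + rng.normal(size=N)
bhat, _ = quasi_score_fit(y, X,
                          mu_fn=lambda X, b: X @ b,
                          dmu_fn=lambda X, b: X,
                          var_fn=lambda mu: np.ones_like(mu),
                          beta0=np.zeros(p), max_iter=1)
print(np.allclose(bhat, np.linalg.solve(X.T @ X, X.T @ y)))  # True

# Log link with V_i = sigma^2 mu_i: the scale sigma^2 cancels out of (3),
# so var_fn = mu suffices, and the fit matches an ordinary Poisson GLM.
y2 = rng.poisson(np.exp(X @ beta_true))
bhat2, _ = quasi_score_fit(y2, X,
                           mu_fn=lambda X, b: np.exp(X @ b),
                           dmu_fn=lambda X, b: np.exp(X @ b)[:, None] * X,
                           var_fn=lambda mu: mu,
                           beta0=np.zeros(p))
```

Note that only the first two moments enter the code: `mu_fn`, `dmu_fn`, and `var_fn` are all that is needed, which is exactly the sense in which no full parametric model for $y_i$ is required.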
Estimating equations with a working variance function. We'll suppose that the mean regression function $\mu_i(\beta)$ has been correctly specified but the variance function has not. That is, the data analyst incorrectly supposes that the variance function for $y_i$ is $\tilde V_i$ rather than $V_i$, where $\tilde V_i$ is another function of $\beta$. The analyst then estimates $\beta$ by solving the quasi-score equations
$$U = D^T \tilde V^{-1} (y - \mu) = 0,$$
where $\tilde V$ is the diagonal matrix of the $\tilde V_i$'s. Obviously it would be better to perform the scoring procedure using the true variance function $V = \mathrm{Diag}(V_1, \ldots, V_N)$ rather than $\tilde V$. What are the properties of this procedure when $\tilde V$ has been used instead of $V$?

1. The estimate $\hat\beta$ is a consistent and asymptotically unbiased estimate of $\beta$, even if $\tilde V \neq V$. It's also asymptotically normal.

2. If $\tilde V \neq V$, then $\hat\beta$ is not efficient; that is, the asymptotic variance of $\hat\beta$ is lowest when $\tilde V = V$.

3. If $\tilde V = V$, then the final value of the matrix $(D^T V^{-1} D)^{-1}$ from the scoring procedure (3) (i.e. the value of this matrix with $\hat\beta$ substituted for $\beta$) is a consistent estimate of $\mathrm{Var}(\hat\beta)$.

4. If $\tilde V \neq V$, then the final value of the matrix $(D^T \tilde V^{-1} D)^{-1}$ from (3) is not a consistent estimate of $\mathrm{Var}(\hat\beta)$.

Because of these properties, $\hat\beta$ may still be a reasonable estimate of $\beta$ if $\tilde V \neq V$, but the final value of $(D^T \tilde V^{-1} D)^{-1}$ (often called the "model-based" or "naive" estimator) will not give accurate standard errors for the elements of $\hat\beta$. However, this problem can be corrected by using the "robust" or "sandwich" estimator, defined as
$$\left(D^T \tilde V^{-1} D\right)^{-1} \left(D^T \tilde V^{-1} E \,\tilde V^{-1} D\right) \left(D^T \tilde V^{-1} D\right)^{-1}, \tag{4}$$
where
$$E = \mathrm{Diag}\bigl((y_1 - \mu_1)^2, \ldots, (y_N - \mu_N)^2\bigr), \tag{5}$$
and all quantities in (4) and (5) are evaluated with $\hat\beta$ substituted for $\beta$. The sandwich estimator is a consistent estimate of $\mathrm{Var}(\hat\beta)$ even when $\tilde V \neq V$. (A numerical sketch of this estimator follows the example below.)

The sandwich estimator was first proposed by Huber (1967) and White (1980), but was popularized in the late 1980s when Liang and Zeger extended it to multivariate or longitudinal responses.

Longitudinal example. Data analyzed by Hedeker and Gibbons (1997): a randomized trial for schizophrenia.

• 312 patients received drug therapy; 101 received placebo.
• Measurements were taken at weeks 0, 1, 3, and 6, but some subjects have missing data due to dropout.
• Outcome: severity of illness (1 = normal, ..., 7 = extremely ill).

[Figures: "spaghetti plot" of response curves for all subjects; responses for drug patients; responses for placebo patients; average for each group at each time point; the same averages plotted versus the square root of week.]

As shown by the second of the two group-average plots, the average trajectories for the placebo and drug groups appear to be approximately linear when plotted against the square root of week. At baseline (week 0), the two groups have very similar averages. This makes sense: in a randomized trial, the groups are initially comparable.
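As promised above, here is a minimal NumPy sketch of the sandwich estimator (4)-(5). It reuses the hypothetical `quasi_score_fit` from the earlier sketch, and the simulated data and function names are again illustrative assumptions: Poisson responses (true $V_i = \mu_i$) are fit with a deliberately misspecified constant working variance $\tilde V_i = 1$, and the naive and robust standard errors are compared.

```python
import numpy as np

def sandwich_cov(y, X, beta_hat, mu_fn, dmu_fn, workvar_fn):
    """Sandwich estimator (4): B^-1 M B^-1, where B = D' Vt^-1 D ("bread"),
    M = D' Vt^-1 E Vt^-1 D ("meat"), and E = diag((y_i - mu_i)^2) as in (5),
    all evaluated at beta_hat."""
    mu = mu_fn(X, beta_hat)
    D = dmu_fn(X, beta_hat)
    w = 1.0 / workvar_fn(mu)                    # diagonal of the working Vt^-1
    B = D.T @ (w[:, None] * D)                  # bread: D' Vt^-1 D
    r = w * (y - mu)                            # r_i^2 = w_i^2 (y_i - mu_i)^2
    M = (r[:, None] * D).T @ (r[:, None] * D)   # meat: D' Vt^-1 E Vt^-1 D
    Binv = np.linalg.inv(B)
    return Binv @ M @ Binv

rng = np.random.default_rng(1)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = rng.poisson(np.exp(X @ np.array([0.2, 0.4])))  # true variance is mu_i

mu_fn = lambda X, b: np.exp(X @ b)
dmu_fn = lambda X, b: np.exp(X @ b)[:, None] * X
workvar = lambda mu: np.ones_like(mu)              # wrong working variance

# quasi_score_fit (earlier sketch) returns the estimate and the naive
# covariance (D' Vt^-1 D)^-1 from the final scoring step.
beta_hat, naive_cov = quasi_score_fit(y, X, mu_fn, dmu_fn, workvar, np.zeros(2))
robust_cov = sandwich_cov(y, X, beta_hat, mu_fn, dmu_fn, workvar)

print("naive SEs   :", np.sqrt(np.diag(naive_cov)))
print("sandwich SEs:", np.sqrt(np.diag(robust_cov)))
```

Consistent with properties 1-4 above, the coefficient estimates remain sensible despite the wrong working variance, but the naive and sandwich standard errors generally disagree, and only the sandwich values should be trusted.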

