22S:138 Bayesian Statistics
Intro to Hierarchical Normal Linear Models
Lecture 16
Oct. 19, 2009
Kate Cowles
374 SH, [email protected]

Assumptions of linear regression
• Homoscedasticity
• Linearity
• Independence
• Normality
• Existence

Hierarchical normal linear models
• combine
  – hierarchical models
  – linear regression

Example: AIDS study ACTG 116B/117
• randomized, controlled, double-blind clinical trial
• patients with at least 16 weeks of prior treatment with zidovudine (ZDV)
• each patient was randomized to one of 3 treatments
  – continued ZDV
  – 2 different dose levels of ddI (another antiretroviral drug)
• primary endpoint: progression to a new AIDS-defining event or death
• primary results published in Kahn et al. (1992)
• CD4 counts measured on all patients
  – at study entry (week 0) and at weeks 2, 8, 12, 16, 24, 32, 40, 48, 56, and 64
  – your homework dataset consists of CD4 counts taken up to week 24 from a subset of the patients in 1 treatment group

Research question and statistical models
• three parameters of interest are the average change in CD4 count per week in patients on each of the three treatments
  – $\beta_{1,A}$, $\beta_{1,B}$, and $\beta_{1,C}$
• one approach: simple linear regression applied separately to patients in each treatment group
  $$y_{ij} \mid \beta_{0,g}, \beta_{1,g}, \sigma^2 \sim N(\beta_{0,g} + \beta_{1,g} t_{ij}, \sigma^2)$$
  where g = A, B, or C
• which assumption of linear regression is violated?
• what are the likely consequences of the violation of this assumption?

Another possibility: separate linear regressions for each patient
• would result in poor estimation of individual slopes and intercepts since there are few data values for each patient
• then the question arises of how to compute the overall slope for the treatment group
  – average?
  – weighted average?

Hierarchical normal linear model
• a compromise between
  – pooling all the data into one simple linear regression model
    ∗ would violate independence assumption
  – separate linear regressions for each patient
    ∗ would result in poor estimation of individual slopes and intercepts since there are few data values for each patient
• notation:
  – $y_{ij}$: (transformed) CD4 count measured on patient i at week $t_{ij}$
• stage 1: likelihood
  for each patient i, i = 1, . . . , N
  $$y_{ij} \mid \alpha_{0i}, \alpha_{1i}, \tau^2_y \sim N(\alpha_{0i} + \alpha_{1i} t_{ij}, \tau^2_y)$$
  where $\tau^2_y$ is the precision of the points around the patient-specific regression line
• stage 2, formulation 1
  $$\alpha_{0i} \mid \beta_0, \tau^2_{\alpha_0} \sim N(\beta_0, \tau^2_{\alpha_0})$$
  $$\alpha_{1i} \mid \beta_1, \tau^2_{\alpha_1} \sim N(\beta_1, \tau^2_{\alpha_1})$$
• stage 2, multivariate formulation
  $$\begin{pmatrix} \alpha_{0i} \\ \alpha_{1i} \end{pmatrix} \Bigm| \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \Sigma_\alpha \sim N_2\left( \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \Sigma_\alpha^{-1} \right)$$
  – here $\Sigma_\alpha$ is the covariance matrix of $(\alpha_{0i}, \alpha_{1i})$, so $\Sigma_\alpha^{-1}$ is the precision matrix
• third stage, first formulation
  $$\beta_0 \sim N(\mu_0, \tau^2_0)$$
  $$\beta_1 \sim N(\mu_1, \tau^2_1)$$
  $$\tau^2_y \sim G(a_y, b_y)$$
  $$\tau^2_{\alpha_0} \sim G(a_{\alpha_0}, b_{\alpha_0})$$
  $$\tau^2_{\alpha_1} \sim G(a_{\alpha_1}, b_{\alpha_1})$$
• third stage, multivariate formulation
  $$\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} \Bigm| \begin{pmatrix} \mu_0 \\ \mu_1 \end{pmatrix}, \Sigma_0 \sim N_2\left( \begin{pmatrix} \mu_0 \\ \mu_1 \end{pmatrix}, \Sigma_0^{-1} \right)$$
  $$\tau^2_y \sim G(a_y, b_y)$$
  $$\Sigma_\alpha^{-1} \sim \text{Wishart}(R[2,2], \rho)$$
  – where ρ is the degrees of freedom (scalar)
    ∗ equivalent prior sample size
    ∗ must be greater than or equal to the dimension of the matrix in order for the Wishart to be proper
  – R is a prior guess at the order of magnitude of the covariance matrix $\Sigma_\alpha$
    ∗ Wishart is the multivariate generalization of the gamma
    ∗ ρ is the "degrees of freedom"
      · determines the degree of certainty you have about the mean
      · for a Wishart distribution to be proper, ρ must be ≥ dimension of the matrix
      · ρ is equivalent to prior sample size
    ∗ Wishart distribution is parameterized several different ways
    ∗ WinBUGS does not use the same parameterization as the table in GCSR (Gelman, Carlin, Stern, and Rubin)
    ∗ in the WinBUGS parameterization
      $$\Sigma_\alpha^{-1} \sim \text{Wishart}(R[2,2], \rho)$$
      implies that $E(\Sigma_\alpha^{-1}) = \rho R^{-1}$

Priors on precision matrices
• WinBUGS requires parameterizing models that include an unknown variance/covariance matrix of a multivariate normal distribution in terms of the precision matrix (the inverse of the variance/covariance matrix).
• The Wishart distribution is the conjugate prior for the precision matrix of a multivariate normal distribution with known mean.
• It is the standard choice of prior for precision matrices in realistic multivariate-normal-based models with means (and possibly many other parameters) unknown, because it leads to a Wishart full conditional distribution for the precision matrix, which simplifies MCMC-based model fitting.
• The two parameters of the Wishart distribution are a matrix parameter (a scale or inverse-scale matrix, depending on the parameterization) and a scalar parameter called the degrees of freedom.

Multiple parameterizations
Confusingly, several different parameterizations of the Wishart density appear in the literature. If X denotes a p×p symmetric, positive definite random matrix, R is a fixed p×p symmetric, positive definite matrix, ν is a strictly positive scalar, and the p.d.f. of X is
$$p(X \mid R, \nu) \propto |R|^{\nu/2} \, |X|^{(\nu - p - 1)/2} \exp\left( -\tfrac{1}{2} \operatorname{tr}(RX) \right) \qquad (1)$$
then the references below define the two parameters as follows:

Reference                                      Parameterization
Spiegelhalter et al. (1995)                    X ∼ dwish(R, ν)
Anderson (1984); Carlin and Louis (2000)       X ∼ dwish(R^{-1}, ν
Gelman et al. (1995); Box and Tiao (1973)      X ∼ dwish(R^{-1}, ν

In what follows, we use the WinBUGS parameterization. The Wishart distribution is proper if ν ≥ p. If X ∼ dwish(R, ν), then the moments are as follows:
$$E(X_{ij}) = \nu (R^{-1})_{ij}$$
$$\operatorname{Var}(X_{ij}) = \nu \left[ (R^{-1})_{ij}^2 + (R^{-1})_{ii} (R^{-1})_{jj} \right]$$
$$\operatorname{Cov}(X_{ij}, X_{kl}) = \nu \left[ (R^{-1})_{ik} (R^{-1})_{jl} + (R^{-1})_{il} (R^{-1})_{jk} \right]$$

Note that the gamma distribution is a special (one-dimensional) case of the Wishart. If X and R are scalars, and the p.d.f. of X is proportional to
$$x^{\nu/2 - 1} \exp\left( -\frac{Rx}{2} \right)$$
then
$$W(R, \nu) = G\left( \frac{\nu}{2}, \frac{R}{2} \right)$$
WinBUGS does not allow the use of its Wishart distribution with one-dimensional matrices, however.

If X ∼ dwish(R, ν), then $X^{-1}$ has an inverse Wishart distribution: $X^{-1} \sim IW(R, \nu)$, where
$$E\left[ (X^{-1})_{ij} \right] = \frac{R_{ij}}{\nu - p - 1}$$
The inverse Wishart distribution is always proper; however, it has a degenerate form if ν < p, and obviously the first moment is negative or infinite unless ν > p + 1.

Since statisticians and subject-matter experts tend to be better able to think in terms of variances and correlations than in terms of elements of precision matrices, the following way of specifying a prior on a covariance matrix, say Σ, in WinBUGS is attractive:
1. Let R equal the prior guess for
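
The gamma special case stated earlier, W(R, ν) = G(ν/2, R/2), can be checked numerically. The sketch below is illustrative and not part of the lecture: it assumes an integer ν so that a one-dimensional Wishart draw can be built as a sum of ν squared N(0, 1/R) variates, and the function names and the values R = 2, ν = 5 are hypothetical choices. Both samples should have mean ν/R and variance 2ν/R², matching the moment formulas for the WinBUGS parameterization with p = 1.

```python
import random
import statistics


def wishart_1d_draw(R, nu, rng):
    """One draw from a 1-D Wishart with WinBUGS-style parameters (R, nu).

    Assumes integer nu: the draw is a sum of nu squared N(0, 1/R) variates,
    since R plays the role of an inverse scale in this parameterization.
    """
    sd = (1.0 / R) ** 0.5
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(nu))


def gamma_draw(R, nu, rng):
    """One draw from G(nu/2, R/2), read as shape nu/2 and rate R/2."""
    # random.gammavariate takes (shape, scale), and scale = 1 / rate.
    return rng.gammavariate(nu / 2.0, 2.0 / R)


def compare(R=2.0, nu=5, n=100_000, seed=1):
    """Return theoretical and simulated moments (R, nu are hypothetical values)."""
    rng = random.Random(seed)
    w = [wishart_1d_draw(R, nu, rng) for _ in range(n)]
    g = [gamma_draw(R, nu, rng) for _ in range(n)]
    return {
        "theory_mean": nu / R,            # E(X) = nu * R^{-1}
        "theory_var": 2.0 * nu / R ** 2,  # Var(X) = nu * [R^{-2} + R^{-2}]
        "wishart_mean": statistics.fmean(w),
        "wishart_var": statistics.pvariance(w),
        "gamma_mean": statistics.fmean(g),
        "gamma_var": statistics.pvariance(g),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.3f}")
```

Agreement of the simulated moments is consistent with the identity but does not prove it; the proof is the match between the two density kernels.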
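
To make stage 1 and stage 2 (formulation 1) of the hierarchical model concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not the course's code: all numeric values (β0, β1, and the precisions) are hypothetical, the visit weeks are those listed for the homework subset, and every τ is treated as a precision, so the standard deviation is τ^(-1/2). It also fits a separate least-squares slope to each simulated patient, to show why separate regressions estimate the individual slopes poorly when each patient has few observations.

```python
import random
import statistics

# Visit schedule up to week 24, as described for the homework data.
WEEKS = [0, 2, 8, 12, 16, 24]


def precision_to_sd(tau):
    """Convert a precision (1 / variance) to a standard deviation."""
    return tau ** -0.5


def simulate_patients(n_patients, beta0, beta1, tau_a0, tau_a1, tau_y, seed=0):
    """Simulate (alpha0, alpha1, y) for each patient from stages 2 and 1."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n_patients):
        # Stage 2: patient-specific intercept and slope around beta0, beta1.
        a0 = rng.gauss(beta0, precision_to_sd(tau_a0))
        a1 = rng.gauss(beta1, precision_to_sd(tau_a1))
        # Stage 1: observations scattered around the patient's own line.
        y = [rng.gauss(a0 + a1 * t, precision_to_sd(tau_y)) for t in WEEKS]
        patients.append((a0, a1, y))
    return patients


def ols_slope(t, y):
    """Least-squares slope of y on t for a single patient."""
    tbar, ybar = statistics.fmean(t), statistics.fmean(y)
    sxy = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
    sxx = sum((ti - tbar) ** 2 for ti in t)
    return sxy / sxx


if __name__ == "__main__":
    # All parameter values below are hypothetical, chosen only for illustration.
    pts = simulate_patients(500, beta0=10.0, beta1=-0.1,
                            tau_a0=0.25, tau_a1=100.0, tau_y=1.0)
    true_slopes = [a1 for _, a1, _ in pts]
    fitted_slopes = [ols_slope(WEEKS, y) for _, _, y in pts]
    print("mean fitted slope:", statistics.fmean(fitted_slopes))
    print("variance of true slopes:", statistics.pvariance(true_slopes))
    print("variance of fitted slopes:", statistics.pvariance(fitted_slopes))
```

The per-patient fitted slopes vary more than the true slopes because each one carries sampling noise from only six observations. The hierarchical model addresses this by shrinking the individual slopes toward the group slope β1.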