22S:138 Bayesian Statistics
Intro to Hierarchical Normal Linear Models
Lecture 16
Oct. 24, 2005
Kate Cowles
374 SH, [email protected]

Assumptions of linear regression
• Homoscedasticity
• Linearity
• Independence
• Normality
• Existence

Hierarchical normal linear models
• combine
  – hierarchical models
  – linear regression

Example: AIDS study ACTG 116B/117
• randomized, controlled, double-blind clinical trial
• patients with at least 16 weeks of prior treatment with zidovudine (ZDV)
• each patient was randomized to one of 3 treatments
  – continued ZDV
  – 2 different dose levels of ddI (another antiretroviral drug)
• primary endpoint: progression to a new AIDS-defining event or death
• primary results published in Kahn et al. (1992)
• CD4 counts measured on all patients
  – at study entry (week 0) and at weeks 2, 8, 12, 16, 24, 32, 40, 48, 56, and 64
  – your homework dataset consists of CD4 counts taken up to week 24 from a subset of the patients in 1 treatment group

Research question and statistical models
• the parameters of interest are the average change in CD4 count per week in patients on each of the three treatments
  – $\beta_{1,A}$, $\beta_{1,B}$, and $\beta_{1,C}$
• one approach: simple linear regression applied separately to patients in each treatment group

  $$y_{ij} \mid \beta_{0,g}, \beta_{1,g}, \sigma^2 \sim N(\beta_{0,g} + \beta_{1,g} t_{ij}, \sigma^2)$$

  where g = A, B, or C
• which assumption of linear regression is violated?
• what are the likely consequences of the violation of this assumption?

Another possibility: separate linear regressions for each patient
• would result in poor estimation of individual slopes and intercepts since there are few data values for each patient
• then the question arises of how to compute the overall slope for a treatment group
  – average?
  – weighted average?

Hierarchical normal linear model
• a compromise between
  – pooling all the data into one simple linear regression model
    ∗ would violate independence assumption
  – separate linear regressions for each patient
    ∗ would result in poor estimation of individual slopes and intercepts since there are few data values for each
      patient

• notation:
  – $y_{ij}$: (transformed) CD4 count measured on patient i at week $t_{ij}$
• stage 1: likelihood
  for each patient i, i = 1, ..., N

  $$y_{ij} \mid \alpha_{0i}, \alpha_{1i}, \tau^2_y \sim N(\alpha_{0i} + \alpha_{1i} t_{ij}, \tau^2_y)$$

  where $\tau^2_y$ is the precision of the points around the patient-specific regression line
• stage 2, formulation 1

  $$\alpha_{0i} \mid \beta_0, \tau^2_{\alpha_0} \sim N(\beta_0, \tau^2_{\alpha_0})$$
  $$\alpha_{1i} \mid \beta_1, \tau^2_{\alpha_1} \sim N(\beta_1, \tau^2_{\alpha_1})$$

• stage 2, multivariate formulation

  $$\begin{pmatrix}\alpha_{0i}\\ \alpha_{1i}\end{pmatrix} \Bigm| \begin{pmatrix}\beta_0\\ \beta_1\end{pmatrix}, \Sigma_\alpha \sim N_2\left(\begin{pmatrix}\beta_0\\ \beta_1\end{pmatrix}, \Sigma_\alpha^{-1}\right)$$

• third stage, first formulation

  $$\beta_0 \sim N(\mu_0, \tau^2_0)$$
  $$\beta_1 \sim N(\mu_1, \tau^2_1)$$
  $$\tau^2_y \sim G(a_y, b_y)$$
  $$\tau^2_{\alpha_0} \sim G(a_{\alpha_0}, b_{\alpha_0})$$
  $$\tau^2_{\alpha_1} \sim G(a_{\alpha_1}, b_{\alpha_1})$$

• third stage, multivariate formulation

  $$\begin{pmatrix}\beta_0\\ \beta_1\end{pmatrix} \Bigm| \begin{pmatrix}\mu_0\\ \mu_1\end{pmatrix}, \Sigma_0 \sim N_2\left(\begin{pmatrix}\mu_0\\ \mu_1\end{pmatrix}, \Sigma_0^{-1}\right)$$
  $$\tau^2_y \sim G(a_y, b_y)$$
  $$\Sigma_\alpha^{-1} \sim \mathrm{Wishart}(R[2,2], \rho)$$

  – where $\rho$ is the degrees of freedom (scalar)
    ∗ equivalent prior sample size
    ∗ must be greater than or equal to the dimension of the matrix in order for the Wishart to be proper
  – R is the prior guess at the order of magnitude of the covariance matrix $\Sigma_\alpha$
    ∗ Wishart is the multivariate generalization of the gamma
    ∗ $\rho$ is the “degrees of freedom”
      · determines the degree of certainty you have about the mean
      · for a Wishart distribution to be proper, $\rho$ must be ≥ the dimension of the matrix
      · $\rho$ is equivalent to prior sample size
    ∗ Wishart distribution is parameterized several different ways
    ∗ WinBUGS does not use the same parameterization as the GCSR table
    ∗ in the WinBUGS parameterization, $\Sigma_\alpha^{-1} \sim \mathrm{Wishart}(R[2,2], \rho)$ implies that the prior mean of the precision matrix is $E(\Sigma_\alpha^{-1}) = \rho R^{-1}$

Priors on precision matrices
• WinBUGS requires parameterizing models that include an unknown variance/covariance matrix of a multivariate normal distribution in terms of the precision matrix (the inverse of the variance/covariance matrix).
• The Wishart distribution is the conjugate prior for the precision matrix of a multivariate normal distribution with known mean.
• It is the standard choice of prior for precision matrices in realistic multivariate-normal-based models with means (and possibly many other parameters) unknown, because it leads to a Wishart full conditional distribution for the precision matrix, which simplifies MCMC-based model fitting.
• The two parameters of the Wishart distribution are a mean matrix
  and a scalar parameter called the degrees of freedom.

Multiple parameterizations
Confusingly, several different parameterizations of the Wishart density appear in the literature. If X denotes a p × p symmetric, positive definite random matrix, R is a fixed p × p symmetric, positive definite matrix, $\nu$ is a strictly positive scalar, and the p.d.f. of X is

$$p(X \mid R, \nu) \propto |R|^{\nu/2}\, |X|^{(\nu - p - 1)/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}(RX)\right\} \qquad (1)$$

then the references below define the two parameters as follows:

Reference    Parameterization
[?]          $X \sim \mathrm{dwish}(R, \nu)$
[?][?]       $X \sim \mathrm{dwish}(R^{-1}, \nu)$
[?][?]       $X \sim \mathrm{dwish}(R^{-1}, \nu - p + 1)$

In what follows, we use the WinBUGS parameterization. The Wishart distribution is proper if $\nu \ge p$. If $X \sim \mathrm{dwish}(R, \nu)$, then the moments are as follows:

$$E(X_{ij}) = \nu\,(R^{-1})_{ij}$$
$$\mathrm{Var}(X_{ij}) = \nu\left[(R^{-1})_{ij}^2 + (R^{-1})_{ii}(R^{-1})_{jj}\right]$$
$$\mathrm{Cov}(X_{ij}, X_{kl}) = \nu\left[(R^{-1})_{ik}(R^{-1})_{jl} + (R^{-1})_{il}(R^{-1})_{jk}\right]$$

Note that the gamma distribution is a special (one-dimensional) case of the Wishart. If X and R are scalars, and the p.d.f. of X is proportional to

$$x^{\nu/2 - 1} \exp\left(-\frac{Rx}{2}\right)$$

then

$$W(R, \nu) = G\left(\frac{\nu}{2}, \frac{R}{2}\right)$$

WinBUGS does not allow the use of its Wishart distribution with one-dimensional matrices, however.

If $X \sim \mathrm{dwish}(R, \nu)$, then $X^{-1}$ has an inverse Wishart distribution: $X^{-1} \sim IW(R, \nu)$, where

$$E(X^{-1}_{ij}) = \frac{R_{ij}}{\nu - p - 1}$$

The inverse Wishart distribution is always proper; however, it has a degenerate form if $\nu < p$, and obviously the first moment is negative or infinite unless $\nu > p + 1$.

Since statisticians and subject-matter experts tend to be better able to think in terms of variances and correlations than in terms of elements of precision matrices, the following way of specifying a prior on a covariance matrix, say $\Sigma$, in WinBUGS is attractive:
1. Let R equal the prior guess for the mean of the p × p variance/covariance matrix $\Sigma$.
2. Choose a degrees-of-freedom parameter $\nu$ (> p + 1) that roughly represents an “equivalent prior sample size” – your belief in R
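
The first-moment formula for the WinBUGS parameterization, E(X_ij) = ν (R^{-1})_ij, can be checked by simulation. The following is only a sketch under stated assumptions: p = 2, integer ν, and each draw built as a sum of ν outer products of N_2(0, R^{-1}) vectors (a standard construction of a Wishart draw, not anything specific to WinBUGS). The helper names and the value of R are illustrative and do not come from the lecture.

```python
import random

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def chol2(m):
    """Lower-triangular Cholesky factor of a 2x2 symmetric positive definite matrix."""
    l11 = m[0][0] ** 0.5
    l21 = m[1][0] / l11
    l22 = (m[1][1] - l21 ** 2) ** 0.5
    return [[l11, 0.0], [l21, l22]]

def draw_wishart(R, nu, rng):
    """One draw X ~ dwish(R, nu) in the WinBUGS convention (integer nu >= 2).

    X is the sum of nu outer products z z^T with z ~ N_2(0, R^{-1}).
    """
    L = chol2(inv2(R))
    X = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(nu):
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z = [L[0][0] * e1, L[1][0] * e1 + L[1][1] * e2]
        for i in range(2):
            for j in range(2):
                X[i][j] += z[i] * z[j]
    return X

def mc_mean(R, nu, n_draws=20000, seed=1):
    """Monte Carlo estimate of E(X) for X ~ dwish(R, nu)."""
    rng = random.Random(seed)
    total = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_draws):
        X = draw_wishart(R, nu, rng)
        for i in range(2):
            for j in range(2):
                total[i][j] += X[i][j]
    return [[t / n_draws for t in row] for row in total]

R = [[2.0, 0.5], [0.5, 1.0]]   # illustrative value, not from the lecture
nu = 4
theory = [[nu * v for v in row] for row in inv2(R)]   # nu * R^{-1}
estimate = mc_mean(R, nu)
```

Comparing `estimate` with `theory` entry by entry should show agreement up to Monte Carlo error, which shrinks as `n_draws` grows.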
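
To make the three-stage structure concrete, data can be simulated from stage 2 (formulation 1) and stage 1 of the hierarchical model. This is a minimal sketch, assuming fixed values for β0, β1 and the three precisions rather than draws from the third-stage priors. All numeric values and the function name `simulate_patients` are made up for illustration; only the visit weeks (0 through 24) follow the study schedule. Because the model is stated with precisions, each one is converted to a standard deviation before sampling.

```python
import random
from statistics import mean

def simulate_patients(n_patients, weeks, beta0, beta1,
                      prec_a0, prec_a1, prec_y, seed=0):
    """Simulate (transformed) CD4 trajectories from stages 1 and 2.

    All prec_* arguments are precisions (1 / variance), matching the
    N(mean, precision) convention of the model statement.
    """
    rng = random.Random(seed)
    sd_a0 = prec_a0 ** -0.5   # precision -> standard deviation
    sd_a1 = prec_a1 ** -0.5
    sd_y = prec_y ** -0.5
    patients = []
    for _ in range(n_patients):
        # stage 2, formulation 1: patient-specific intercept and slope
        a0 = rng.gauss(beta0, sd_a0)
        a1 = rng.gauss(beta1, sd_a1)
        # stage 1: observations scattered around the patient's own line
        y = [rng.gauss(a0 + a1 * t, sd_y) for t in weeks]
        patients.append({"alpha0": a0, "alpha1": a1, "y": y})
    return patients

weeks = [0, 2, 8, 12, 16, 24]   # visit weeks up to week 24
sim = simulate_patients(n_patients=500, weeks=weeks,
                        beta0=10.0, beta1=-0.1,   # hypothetical values
                        prec_a0=0.25, prec_a1=100.0, prec_y=1.0)
avg_slope = mean(p["alpha1"] for p in sim)
```

With many simulated patients, `avg_slope` should sit close to the population slope β1, while any single patient's six observations pin down that patient's own slope only loosely. This is the situation the hierarchical model is meant to handle.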