22S:138 Introduction to Empirical Bayes
Lecture 22, Nov. 16, 2009
Kate Cowles, 374 SH, k-cowles@uiowa.edu

Empirical Bayes

• Bayesian analysis requires specifying fixed values for the parameters of the highest-stage priors
  – these values must come from a source other than the current dataset
• the goal of empirical Bayes (EB) analysis is to fit hierarchical models without introducing information external to the current dataset
• EB approach
  – estimates the final-stage parameters using the current data
  – then proceeds as though the prior were known
  – requires adjustment to posterior standard deviations and credible sets

Compound sampling framework

• observed data conditionally independent given the parameters
    Y_i | θ_i ~ f_i(y_i | θ_i),  i = 1, …, n
• family of prior distributions indexed by a low-dimensional parameter η
    θ_i ~ g(θ_i | η)

Parametric EB (PEB) point estimation

• if η were known (fully Bayes),
    p(θ_i | y_i, η) = f_i(y_i | θ_i) g(θ_i | η) / m_i(y_i | η)
  – i.e. the posterior for θ_i depends on the data only through y_i
  – m_i is the marginal likelihood of y_i
• PEB uses the marginal distribution of all the data to estimate η
  – m(y | η)
  – use maximum likelihood or the method of moments to get an estimate η̂
• plug η̂ into the expression above to get the estimated posterior p(θ_i | y_i, η̂)
  – use the estimated posterior for all inference
  – e.g. point estimate of the posterior mean
  – this point estimate depends on all the data through η̂ = η̂(y)

Example: normal/normal models

• two-stage model
    Y_i | θ_i ~ N(θ_i, σ²),  i = 1, …, n
    θ_i | µ ~ N(µ, τ²),  i = 1, …, n
• first assume both τ² and σ² are known
  – calculations we have seen in GCSR show that marginally the Y_i's are i.i.d. (i.e.
with the θ_i's integrated out):
      Y_i | µ ~ N(µ, σ² + τ²)
  – so the marginal likelihood of all the Y_i's is
      m(y | µ) = [2π(σ² + τ²)]^(−n/2) exp{ −(1 / (2(σ² + τ²))) Σ_{i=1}^n (y_i − µ)² }
  – EB analysis requires estimation of µ
  – marginal MLE of µ:  µ̂ = ȳ
  – estimated posterior distribution of θ_i:
      p(θ_i | y_i, µ̂) = N(B µ̂ + (1 − B) y_i, (1 − B) σ²),  where  B = σ² / (σ² + τ²)
    exactly the same as the fully Bayesian posterior for this case, except that the known prior mean µ is replaced by the sample mean computed from all the data
  – PEB point estimate of θ_i:
      θ̂_i^µ = B ȳ + (1 − B) y_i = ȳ + (1 − B)(y_i − ȳ)
  – inference for a single component borrows information from the data on all components
  – shrinkage estimator

• now suppose τ², as well as µ, is unknown
  – estimates of both µ and τ² are needed
  – marginal MLEs
    ∗ µ̂ = ȳ
    ∗ τ̂² = (s² − σ²)₊ = max{0, s² − σ²},  where  s² = (1/n) Σ_{i=1}^n (y_i − ȳ)²
      · the variation in the data over and above that expected if all the θ_i's were equal
    ∗ MMLE for B:
        B̂ = σ² / (σ² + τ̂²) = σ² / (σ² + (s² − σ²)₊)
    ∗ PEB estimates of θ_i:
        θ̂_i^{µ,τ²} = ȳ + (1 − B̂)(y_i − ȳ)
    ∗ the amount of shrinkage is controlled by the estimated heterogeneity in the data

Example: the dyes data

• recall the two-stage model
    Y_i | θ_i ~ N(θ_i, σ²),  i = 1, …, n
    θ_i | µ ~ N(µ, τ²),  i = 1, …, n
• consider the individual batch means as the y_i's, i = 1, …, 6:
    1505 1528 1564 1498 1600 1470
• suppose
  – σ²_indiv for individual observations was known to be 2500 gm²
  – with 5 observations per batch, the variance of these batch means is known to be
      σ² = σ²_indiv / 5 = 500
• suppose τ² was known to be 1600
• MMLE of µ:  µ̂ = ȳ = 1527.5
• B = σ² / (σ² + τ²) = 500 / (500 + 1600) = 0.238
• then the EB point estimates of the θ_i's are
    θ̂_1^µ = 1527.5 + (1 − 0.238)(1505 − 1527.5) = 1510.4
    θ̂_2^µ = 1527.5 + (1 − 0.238)(1528 − 1527.5) = 1527.9
    θ̂_3^µ = 1527.5 + (1 − 0.238)(1564 − 1527.5) = 1555.3
    θ̂_4^µ = 1527.5 + (1 − 0.238)(1498 − 1527.5) = 1505.0
    θ̂_5^µ = 1527.5 + (1 − 0.238)(1600 − 1527.5) = 1582.7
    θ̂_6^µ = 1527.5 + (1 − 0.238)(1470 − 1527.5) = 1483.7

Example continued

• suppose σ² = 500 is still known
• but τ² is unknown
• s² = (1/6) Σ_{i=1}^6 (y_i − ȳ)² = 1878.6
• then τ̂² = (s² − σ²)₊ = 1878.6 − 500 = 1378.6
• MMLE:  B̂ = 500 / (500 + 1378.6) = 0.266
• then the EB point estimates of the θ_i's are
    θ̂_1^{µ,τ²} = 1527.5 + (1 − 0.266)(1505 − 1527.5) = 1511.0
    θ̂_2^{µ,τ²} = 1527.5 + (1 − 0.266)(1528 − 1527.5) = 1527.9
    θ̂_3^{µ,τ²} = 1527.5 + (1 − 0.266)(1564 − 1527.5) = 1554.3
    θ̂_4^{µ,τ²} = 1527.5 + (1 − 0.266)(1498 − 1527.5) = 1505.8
    θ̂_5^{µ,τ²} = 1527.5 + (1 − 0.266)(1600 − 1527.5) = 1580.7
    θ̂_6^{µ,τ²} = 1527.5 + (1 − 0.266)(1470 − 1527.5) = 1485.3
• note that B̂ = 0.266 > 0.238: the smaller estimated τ² implies less heterogeneity among batches, hence slightly more shrinkage toward ȳ

Comments

• EB estimates: a compromise between
  – pooling all the data (B̂ = 1)
  – using only the data from the ith observation or group to estimate the ith parameter (B̂ = 0)
• one difficulty with the PEB approach: choosing how to estimate the hyperparameters
• contrast with the fully Bayesian approach
  – would add another level to the hierarchy
    ∗ a prior on µ and τ²
  – replaces estimation with integration
  – avoids the problem of selecting an estimation method
  – automatically propagates to the posterior distribution the uncertainty induced by estimating µ and τ²
  – however, requires selection of hyperpriors

"Naive" EB interval estimation

• given the estimated posterior p(θ_i | y_i, η̂), use it like any other posterior distribution to obtain an HPD or equal-tail credible set for θ_i
• from elementary mathematical statistics,
    Var(θ_i | y) = E_{η|y}[Var(θ_i | y_i, η)] + Var_{η|y}[E(θ_i | y_i, η)]
• in the normal/normal case, the 95% naive EBCI
would be
    E(θ_i | y_i, η̂) ± 1.96 √Var(θ_i | y_i, η̂)
• i.e. the naive EBCI ignores the posterior uncertainty about η
• so the naive interval is very likely to be too short, and to have lower coverage probability than claimed
• there is a substantial statistical literature on how to correct this problem

Interval estimation: definitions of "EB coverage"

• t_α(y) is a (1 − α)100% unconditional EB confidence set for θ if and only if, for each η,
    P_{y,θ | η}(θ ∈ t_α(y)) ≈ 1 − α
  – evaluating the performance of the EBCI over variability in both θ and the data
• t_α(y) is a (1 − α)100% conditional EB confidence set for θ given a data summary b(y) if and only if, for each value b(y) = b and each η,
    P_{y,θ | b(y)=b, η}(θ ∈ t_α(y)) ≈ 1 − α
  – example: if b(y) = y, then this is fully Bayesian coverage

Carl Morris's approach to EB interval adjustment

• for the normal/normal model with σ² known
  – base the EBCI on a modified estimated posterior
  – use the "naive" mean
  – inflate the variance to try to capture the second term in the true variance
      p_Morris(θ_i | y_i, η̂) = N(θ̂_i^EB, V*)
    where
      V* = σ² [1 − ((k − 1)/k) B̂] + (2 / (k − 3)) B̂² (Y_i − Ȳ)²
Other approaches to EBCI correction are discussed in Carlin and Louis, Ch. 3.

Simulation study example

• compound sampling
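The normal/normal PEB recipe worked through above (marginal MLE µ̂ = ȳ, moment estimate τ̂² = (s² − σ²)₊, shrinkage factor B̂) is easy to verify numerically. The following is an illustrative sketch, not code from the lecture; the function name `peb_normal_normal` is invented for this example:

```python
import numpy as np

# Dyes-data batch means from the lecture; sigma^2 = 500 is assumed known.
y = np.array([1505.0, 1528.0, 1564.0, 1498.0, 1600.0, 1470.0])

def peb_normal_normal(y, sigma2, tau2=None):
    """PEB point estimates for the normal/normal model.

    If tau2 is None, it is estimated by the moment rule
    tau2_hat = max(0, s^2 - sigma^2), with s^2 = (1/n) sum (y_i - ybar)^2.
    Returns (mu_hat, B_hat, theta_hat).
    """
    mu_hat = np.mean(y)                      # marginal MLE of mu
    if tau2 is None:
        s2 = np.mean((y - mu_hat) ** 2)      # heterogeneity in the data
        tau2 = max(0.0, s2 - sigma2)         # tau^2_hat = (s^2 - sigma^2)_+
    B_hat = sigma2 / (sigma2 + tau2)         # shrinkage factor B
    theta_hat = mu_hat + (1.0 - B_hat) * (y - mu_hat)  # shrink toward ybar
    return mu_hat, B_hat, theta_hat
```

Calling it with `tau2=1600.0` reproduces the known-τ² estimates (e.g. θ̂_1 ≈ 1510.4), while omitting `tau2` gives the estimated-τ² version with B̂ ≈ 0.266.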
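To see concretely why the naive EBCI is too short, one can compare its width with Morris's inflated-variance interval. This is a hedged sketch, assuming the formulas above with σ² known and k > 3 groups; the function name `eb_intervals` is invented:

```python
import numpy as np

def eb_intervals(y, sigma2, z=1.96):
    """Naive and Morris-adjusted 95% EB intervals, normal/normal model.

    Requires k > 3 groups (Morris's variance uses a 2/(k-3) term).
    Returns (naive, morris), each a (k, 2) array of interval endpoints.
    """
    k = len(y)
    ybar = np.mean(y)
    s2 = np.mean((y - ybar) ** 2)
    tau2_hat = max(0.0, s2 - sigma2)
    B_hat = sigma2 / (sigma2 + tau2_hat)
    theta_hat = ybar + (1.0 - B_hat) * (y - ybar)   # naive EB posterior mean

    # Naive posterior variance ignores uncertainty about (mu, tau^2).
    v_naive = (1.0 - B_hat) * sigma2 * np.ones(k)

    # Morris's inflated variance:
    # V* = sigma^2 [1 - ((k-1)/k) B_hat] + (2/(k-3)) B_hat^2 (y_i - ybar)^2
    v_morris = (sigma2 * (1.0 - (k - 1) / k * B_hat)
                + 2.0 / (k - 3) * B_hat ** 2 * (y - ybar) ** 2)

    naive = np.column_stack([theta_hat - z * np.sqrt(v_naive),
                             theta_hat + z * np.sqrt(v_naive)])
    morris = np.column_stack([theta_hat - z * np.sqrt(v_morris),
                              theta_hat + z * np.sqrt(v_morris)])
    return naive, morris
```

On the dyes batch means every Morris interval comes out wider than the corresponding naive one, reflecting the extra term Var_{η|y}[E(θ_i | y_i, η)] that the naive interval drops.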