CS281B/Stat241B: Advanced Topics in Learning & Decision MakingGibbs Sampling (ctd) & Properties of Dirichlet DistributionLecturer: Michael I. Jordan Scribes: Xiaoyue Zhao1 Gibbs sampling example: Ising modelIn Ising model,P (X|θ) =1zexp{Xj<kθjkxjxk+Xiθixi},where θjk= 0, if j 6= k and z is a normalizing constant.P (Xi= 1|xv\i) =P (Xi= 1, xv\i)P (xv\i)=exp{Pj6=iθijxj+ θi+Pj<k,j6=i,k6=iθjkxjxk+Pj6=iθjxj}exp{Pj6=iθijxj+ θi+Pj<k,j6=i,k6=iθjkxjxk+Pj6=iθjxj} + exp{Pj<k,j6=i,k6=iθjkxjxk+Pj6=iθjxj}=exp{Pj6=iθijxj+ θi}exp{Pj6=iθijxj+ θi} + 1=11 + exp{−Pj6=iθijxj− θi}.Here v is the set of all the vertices. The conditional probability is in the form of logistic function.XiXi(a)(b)Figure 1: Markov Blanket.Figure 1(b) gives an example of the Markov blanket for node Xifor the Ising model. The conditionalprobability of a node Xidepends only on its neighbors.(For directed graphical models, the Markov blanket includes all parents, all children and all co-parents, asshown in Figure 1(b)).Gibbs sampling can be viewed as a special case of Metropolis-Hastings. In Gibbs sampling, the proposaldistribution is the posterior probability and the acceptance probability is always 1.12 Gibbs Sampling (ctd) & Properties of Dirichlet Distribution2 Dirichlet Processes2.1 Dirichlet distributionThe Dirichlet distribution is a member of the exponential family. It is a conjugate distribution to themultinomial. In a non-minimal representation, its density can be written:P (p1, p2, · · · , pk|α1, α2, · · · , αk) =Γ(Pαi)Qki=1Γ(αi)pα1−11pα2−12· · · pαk−1k,where αi> 0 andPpi= 1.2.2 Examples1. k = 2. There is only one p.2. k = 3. It corresponds to a simplex embedded to a 3-dimension space, see Figure 2. For 0 < α < 1,multiple modes appear in the corners.Figure 2: Geometric description of Dirichlet distribution when k = 3.Special case of k = 2:The Dirichlet distribution Beta(α1, α2) is conjugate to Xi∼ Bernoulli(p1). Figure 3 gives the correspondinggraphical model representation.aa12p1XinFigure 3: Graphical model representation.The posterior distribution of p1given the data is:P (p1|X1, · · · , Xn) ∝ pα1−11(1 − p1)α2−1nYi=1pXi1(1 − p1)1−XiGibbs Sampling (ctd) & Properties of Dirichlet Distribution 3= pα1+nPi=1δ1(Xi)−11(1 − p1)α2+nPi=1δ2(Xi)−1∼ Beta(α1+nXi=1δ1(xi), α2+nXi=1δ2(xi))p1has prior parameters α1and α2. After having data, add spikesnPi=1δ1(Xi) to α1,nPi=1δ2(Xi) to α2. Byaveraging out p, we haveP (X1, · · · , Xn|α1, α2) =Zdp1P (X1, · · · , Xn, p1|α1, α2)=Zdp1P (X1, · · · , Xn|p1)P (p1|α1, α2)=Zdp1YipXi1(1 − p1)1−XiΓ(α1+ α2)Γ(α1)Γ(α2)pα1−11(1 − p1)α2−1=Γ(α1+ α2)Γ(α1)Γ(α2)Γ(α1+nPi=1δ1(Xi))Γ(α2+nPi=1δ2(Xi))Γ(α1+ α2+ n)=α[Piδ1(Xi)]1α[Piδ2(Xi)]2(α1+ α2)[n],where X[n]= X(X + 1) · · · (X + n − 1). Here we use the recursion formula Γ(x + 1) = xΓ(x). SoP (Xn+1|X1, · · · , Xn, α1, α2) =Γ(α++ n)Γ(α++ n + 1)Γ(α1+n+1Pi=1δ1(Xi))Γ(α1+nPi=1δ1(Xi))Γ(α2+n+1Pi=1δ2(Xi))Γ(α2+nPi=1δ2(Xi))=α1+nPi=1δ1(Xi)α++n, Xn+1= 1α2+nPi=1δ2(Xi)α++n, Xn+1= 2which has the form of the urn model.P (Xn+1= 1|X1, · · · , Xn, α1, α2) can be written as the weighted summation of prior mean of p1and maximumlikelihood estimate (MLE) of p1, i.e.,α1+nPi=1δ1(Xi)α++ n=α1α+(α+α++ n) + (nα++ n)1nXδ1(Xi).General case of DirichletThe mean has the form ofE(Pi) =αiα+, where α+=kXi=1αi.4 Gibbs Sampling (ctd) & Properties of Dirichlet DistributionThe Dirichlet distribution is conjugate to multinomial. The posterior still follows the Dirichlet:P (P1, · · · , Pk|X1, · · · , Xn) ∼ Dir(α1+nXi=1δ1(Xi), · · · , αk+nXi=1δk(Xi)).The Marginal distribution is:P (X1, · · · , Xn|α1, · · · , αk) =α[Piδ1(Xi)]1· · · α[Piδk(Xi)]kα[n]+,andP (Xn+1= j|X1, · · · , Xn, α) =αj+nPi=1δi(Xi)α++ n,which again can be viewed in terms of an urn model.Let B1, B2, · · · , Blbe a partition at {1, 2, · · · , k}, l < k. Let 1 < r1< · · · < rl= k be integers. ThenP (r1Xi=1Pi,r2Xi=r1+1Pi, · · · ,kXi=rl−1+1Pi) ∼ Dir(r1Xi=1αi,r2Xi=r1+1αi, · · · ,kXi=rl−1+1αi).This property is both necessary and sufficient for a distribution to be Dirichlet.2.3 Dirichlet process: DP (G0, α)Let G0be a probability measure on (X, Ω). Then ∃ a unique probability measure P (G0, α) on the space ofmeasures on X, such that(P (B1), P (B2), · · · , P (Bk)) ∼ Dir(αG0(B1), · · · , αG0(Bk)),where B1, · · · , Bkis a partition of
View Full Document