DOC PREVIEW
Berkeley COMPSCI C281B - Gibbs Sampling (ctd) & Properties of Dirichlet Distribution

This preview shows page 1 out of 4 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS281B/Stat241B: Advanced Topics in Learning & Decision MakingGibbs Sampling (ctd) & Properties of Dirichlet DistributionLecturer: Michael I. Jordan Scribes: Xiaoyue Zhao1 Gibbs sampling example: Ising modelIn Ising model,P (X|θ) =1zexp{Xj<kθjkxjxk+Xiθixi},where θjk= 0, if j 6= k and z is a normalizing constant.P (Xi= 1|xv\i) =P (Xi= 1, xv\i)P (xv\i)=exp{Pj6=iθijxj+ θi+Pj<k,j6=i,k6=iθjkxjxk+Pj6=iθjxj}exp{Pj6=iθijxj+ θi+Pj<k,j6=i,k6=iθjkxjxk+Pj6=iθjxj} + exp{Pj<k,j6=i,k6=iθjkxjxk+Pj6=iθjxj}=exp{Pj6=iθijxj+ θi}exp{Pj6=iθijxj+ θi} + 1=11 + exp{−Pj6=iθijxj− θi}.Here v is the set of all the vertices. The conditional probability is in the form of logistic function.XiXi(a)(b)Figure 1: Markov Blanket.Figure 1(b) gives an example of the Markov blanket for node Xifor the Ising model. The conditionalprobability of a node Xidepends only on its neighbors.(For directed graphical models, the Markov blanket includes all parents, all children and all co-parents, asshown in Figure 1(b)).Gibbs sampling can be viewed as a special case of Metropolis-Hastings. In Gibbs sampling, the proposaldistribution is the posterior probability and the acceptance probability is always 1.12 Gibbs Sampling (ctd) & Properties of Dirichlet Distribution2 Dirichlet Processes2.1 Dirichlet distributionThe Dirichlet distribution is a member of the exponential family. It is a conjugate distribution to themultinomial. In a non-minimal representation, its density can be written:P (p1, p2, · · · , pk|α1, α2, · · · , αk) =Γ(Pαi)Qki=1Γ(αi)pα1−11pα2−12· · · pαk−1k,where αi> 0 andPpi= 1.2.2 Examples1. k = 2. There is only one p.2. k = 3. It corresponds to a simplex embedded to a 3-dimension space, see Figure 2. For 0 < α < 1,multiple modes appear in the corners.Figure 2: Geometric description of Dirichlet distribution when k = 3.Special case of k = 2:The Dirichlet distribution Beta(α1, α2) is conjugate to Xi∼ Bernoulli(p1). Figure 3 gives the correspondinggraphical model representation.aa12p1XinFigure 3: Graphical model representation.The posterior distribution of p1given the data is:P (p1|X1, · · · , Xn) ∝ pα1−11(1 − p1)α2−1nYi=1pXi1(1 − p1)1−XiGibbs Sampling (ctd) & Properties of Dirichlet Distribution 3= pα1+nPi=1δ1(Xi)−11(1 − p1)α2+nPi=1δ2(Xi)−1∼ Beta(α1+nXi=1δ1(xi), α2+nXi=1δ2(xi))p1has prior parameters α1and α2. After having data, add spikesnPi=1δ1(Xi) to α1,nPi=1δ2(Xi) to α2. Byaveraging out p, we haveP (X1, · · · , Xn|α1, α2) =Zdp1P (X1, · · · , Xn, p1|α1, α2)=Zdp1P (X1, · · · , Xn|p1)P (p1|α1, α2)=Zdp1YipXi1(1 − p1)1−XiΓ(α1+ α2)Γ(α1)Γ(α2)pα1−11(1 − p1)α2−1=Γ(α1+ α2)Γ(α1)Γ(α2)Γ(α1+nPi=1δ1(Xi))Γ(α2+nPi=1δ2(Xi))Γ(α1+ α2+ n)=α[Piδ1(Xi)]1α[Piδ2(Xi)]2(α1+ α2)[n],where X[n]= X(X + 1) · · · (X + n − 1). Here we use the recursion formula Γ(x + 1) = xΓ(x). SoP (Xn+1|X1, · · · , Xn, α1, α2) =Γ(α++ n)Γ(α++ n + 1)Γ(α1+n+1Pi=1δ1(Xi))Γ(α1+nPi=1δ1(Xi))Γ(α2+n+1Pi=1δ2(Xi))Γ(α2+nPi=1δ2(Xi))=α1+nPi=1δ1(Xi)α++n, Xn+1= 1α2+nPi=1δ2(Xi)α++n, Xn+1= 2which has the form of the urn model.P (Xn+1= 1|X1, · · · , Xn, α1, α2) can be written as the weighted summation of prior mean of p1and maximumlikelihood estimate (MLE) of p1, i.e.,α1+nPi=1δ1(Xi)α++ n=α1α+(α+α++ n) + (nα++ n)1nXδ1(Xi).General case of DirichletThe mean has the form ofE(Pi) =αiα+, where α+=kXi=1αi.4 Gibbs Sampling (ctd) & Properties of Dirichlet DistributionThe Dirichlet distribution is conjugate to multinomial. The posterior still follows the Dirichlet:P (P1, · · · , Pk|X1, · · · , Xn) ∼ Dir(α1+nXi=1δ1(Xi), · · · , αk+nXi=1δk(Xi)).The Marginal distribution is:P (X1, · · · , Xn|α1, · · · , αk) =α[Piδ1(Xi)]1· · · α[Piδk(Xi)]kα[n]+,andP (Xn+1= j|X1, · · · , Xn, α) =αj+nPi=1δi(Xi)α++ n,which again can be viewed in terms of an urn model.Let B1, B2, · · · , Blbe a partition at {1, 2, · · · , k}, l < k. Let 1 < r1< · · · < rl= k be integers. ThenP (r1Xi=1Pi,r2Xi=r1+1Pi, · · · ,kXi=rl−1+1Pi) ∼ Dir(r1Xi=1αi,r2Xi=r1+1αi, · · · ,kXi=rl−1+1αi).This property is both necessary and sufficient for a distribution to be Dirichlet.2.3 Dirichlet process: DP (G0, α)Let G0be a probability measure on (X, Ω). Then ∃ a unique probability measure P (G0, α) on the space ofmeasures on X, such that(P (B1), P (B2), · · · , P (Bk)) ∼ Dir(αG0(B1), · · · , αG0(Bk)),where B1, · · · , Bkis a partition of


View Full Document

Berkeley COMPSCI C281B - Gibbs Sampling (ctd) & Properties of Dirichlet Distribution

Download Gibbs Sampling (ctd) & Properties of Dirichlet Distribution
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Gibbs Sampling (ctd) & Properties of Dirichlet Distribution and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Gibbs Sampling (ctd) & Properties of Dirichlet Distribution 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?