CMU CS 10708 - lecture

School of Computer Science
Approximate Inference: Mean Field Methods
Probabilistic Graphical Models (10-708), Lecture 17, Nov 12, 2007
Eric Xing
Reading: KF, Chap. 12

[Title-slide figure: a cell-signaling graphical model with nodes Receptor A, Receptor B, Kinase C, Kinase D, Kinase E, TF F, Gene G, Gene H, labeled X1 through X8.]

- Questions?
- Kalman filters
- Complex models
- LBP / Bethe minimization

Approximate Inference

- For a distribution p(X | \theta) associated with a complex graph, computing the marginal (or conditional) probability of arbitrary random variable(s) is intractable.
- Variational methods formulate probabilistic inference as an optimization problem:

    f^* = \arg\max_{f \in S} F(f),

  where f is, e.g., a tractable probability distribution, or the solution to certain probabilistic queries.

Exponential Family

- Exponential representation of graphical models:

    p(X \mid \theta) = \exp\Big\{ \sum_\alpha \theta_\alpha \phi_\alpha(X) - A(\theta) \Big\}.

  This includes discrete models, Gaussian, Poisson, exponential, and many others; an undirected model P(X) = \frac{1}{Z} \prod_{c \in C} \psi_c(X_c) can be put in this form by taking the \phi_\alpha to be log potentials.
- E(X) = -\sum_\alpha \theta_\alpha \phi_\alpha(X) is referred to as the energy of the state X, so that

    p(X \mid \theta) = \exp\{ -E(X) - A(\theta) \},

  and, splitting X into hidden variables X_H and evidence x_E,

    p(X \mid \theta) = \exp\{ -E(x_H, x_E) - A(\theta) \}.

Example: the Boltzmann distribution on an atomic lattice

    p(X) = \frac{1}{Z} \exp\Big\{ \sum_{i<j} \theta_{ij} X_i X_j + \sum_i \theta_{i0} X_i \Big\}.

Lower bounds of exponential functions

- First order: \exp(x) \ge \exp(\mu)\,\big( (x - \mu) + 1 \big).
- Third order: \exp(x) \ge \exp(\mu)\,\big( \tfrac{1}{6}(x-\mu)^3 + \tfrac{1}{2}(x-\mu)^2 + (x-\mu) + 1 \big).

[Figure: exp(x) and these lower bounds, plotted around x = \mu on the interval [-2, 2].]

Lower bounding the likelihood

- Lemma: every distribution q(X_H) defines a lower bound of the likelihood. Representing q(X_H) by \exp\{ -E'(X_H) \} and applying the first-order bound:

    p(x_E) \ge \int dx_H \, \exp\{ -E'(x_H) - A(\theta) \} \big( 1 - E(x_H, x_E) + E'(x_H) \big),

  where x_E denotes the observed variables (evidence). This is upgradeable to higher-order bounds [Leisink and Kappen, 2000].
- Lemma: every marginal distribution q(X_H) defines a lower bound of the log likelihood:

    \log p(x_E) \ge C - \int dx_H \, q(x_H) E(x_H, x_E) - \int dx_H \, q(x_H) \log q(x_H) = C - \langle E \rangle_q + H_q,

  where \langle E \rangle_q is the expected energy, H_q the entropy, and \langle E \rangle_q - H_q the Gibbs free energy.

KL and variational (Gibbs) free energy

- Kullback-Leibler divergence:

    KL(q \| p) \equiv \sum_z q(z) \ln \frac{q(z)}{p(z)}.

- "Boltzmann's law" (the definition of energy): p(z) = \frac{1}{C} \exp\{ -E(z) \}, hence

    KL(q \| p) \equiv \sum_z q(z) E(z) + \sum_z q(z) \ln q(z) + \ln C = G(q) + \ln C,

  where G(q) is the Gibbs free energy; it is minimized when q(Z) = p(Z).

KL and log likelihood

- Jensen's inequality:

    \ell(\theta; x) = \log p(x \mid \theta)
                    = \log \sum_z p(x, z \mid \theta)
                    = \log \sum_z q(z \mid x) \frac{p(x, z \mid \theta)}{q(z \mid x)}
                    \ge \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)},

  so \ell(\theta; x) \ge \langle \ell_c(\theta; x, z) \rangle_q + H_q = L(q).
- KL and the lower bound of the likelihood:

    \ell(\theta; x) = \log p(x \mid \theta)
                    = \sum_z q(z) \log \frac{p(x, z \mid \theta)}{q(z)} + \sum_z q(z) \log \frac{q(z)}{p(z \mid x, \theta)},

  so \ell(\theta; x) = L(q) + KL(q \| p). Setting q(\cdot) = p(z \mid x) closes the gap (cf. EM); this identity is checked numerically in the sketch below.

[Figure: the log evidence \ln p(D) decomposed into L(q) and KL(q \| p).]

A variational representation of probability distributions

    q^* = \arg\max_{q \in Q} \{ -\langle E \rangle_q + H_q \} = \arg\min_{q \in Q} \{ \langle E \rangle_q - H_q \},

where Q is the set of realizable distributions, e.g., all valid parameterizations of exponential family distributions, marginal polytopes [Wainwright et al. 2003].
- Difficulty: H_q is intractable for general q.
- "Solution": approximate H_q, and/or relax or tighten Q.
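The decomposition \ell(\theta; x) = L(q) + KL(q \| p(z \mid x)) is easy to verify numerically. Below is a minimal sketch (mine, not from the lecture): the joint table p_xz, the observed value x_obs, and the variational distribution q are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up joint p(x, z) over x in {0,1,2} and z in {0,1,2,3} (illustration only)
p_xz = rng.random((3, 4))
p_xz /= p_xz.sum()

x_obs = 1                               # the "evidence" x
p_x = p_xz[x_obs].sum()                 # evidence probability p(x_obs)
post = p_xz[x_obs] / p_x                # exact posterior p(z | x_obs)

# An arbitrary variational distribution q(z)
q = rng.random(4)
q /= q.sum()

# L(q) = sum_z q(z) log[ p(x_obs, z) / q(z) ]   (the lower bound)
L = np.sum(q * (np.log(p_xz[x_obs]) - np.log(q)))

# KL(q || p(z | x_obs)) = sum_z q(z) log[ q(z) / p(z | x_obs) ]
kl = np.sum(q * (np.log(q) - np.log(post)))

print(np.log(p_x))       # log-likelihood of the evidence
print(L + kl)            # identical: log p(x) = L(q) + KL(q || p)
print(L <= np.log(p_x))  # True: L(q) is a lower bound
```

Running this prints the same number twice; the bound is tight exactly when q equals the posterior, which is what EM (and the variational methods below) exploit.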
Bethe free energy / LBP

- We do not optimize q(X) explicitly; instead we work with a set of beliefs, e.g.

    b = \{ \tau_{ij} = b_{ij}(x_i, x_j), \ \tau_i = b_i(x_i) \}.

- Relax the optimization problem:
  - approximate objective: H_q \approx F(b) = H_{Bethe}(b_i, b_{ij});
  - relaxed feasible set: M \to M_o with M_o \supseteq M, where

      M_o = \Big\{ \tau \ge 0 \ \Big|\ \sum_{x_i} \tau_i(x_i) = 1, \ \sum_{x_j} \tau_{ij}(x_i, x_j) = \tau_i(x_i) \Big\},

    giving b^* = \arg\min_{b \in M_o} \{ \langle E \rangle_b - F(b) \}.
- The loopy BP algorithm is a fixed-point iteration procedure that tries to solve for b^*.

Mean field methods

- Optimize q(X_H) over a space of tractable families, i.e., subgraphs of G_p over which exact computation of H_q is feasible.
- Tighten the optimization space:
  - exact objective: H_q;
  - tightened feasible set: Q \to T with T \subseteq Q,

    giving q^* = \arg\min_{q \in T} \{ \langle E \rangle_q - H_q \}.

Mean Field Approximation

Cluster-based approximation to the Gibbs free energy (Wiegerinck 2001; Xing et al. 2003, 2004):
- exact (intractable): G[p(X)];
- clusters: G[\{ q_c(X_c) \}].

Mean field approximation to the Gibbs free energy

- Given a disjoint clustering \{ C_1, \dots, C_I \} of all variables, let q(X) = \prod_i q_i(X_{C_i}).
- Mean-field free energy:

    G_{MF} = \langle E \rangle_{\prod_i q_i} + \sum_i \sum_{x_{C_i}} q_i(x_{C_i}) \ln q_i(x_{C_i}),

  e.g., for singleton clusters (naive mean field):

    G_{MF} = \sum_{i<j} \sum_{x_i, x_j} \phi_{ij}(x_i, x_j)\, q_i(x_i) q_j(x_j) + \sum_i \sum_{x_i} \phi_i(x_i)\, q_i(x_i) + \sum_i \sum_{x_i} q_i(x_i) \ln q_i(x_i).

- G_{MF} will never equal the exact Gibbs free energy, no matter what clustering is used, but it always defines a lower bound of the likelihood.
- Optimize each q_i(x_{C_i}) by variational calculus; do inference within each q_i(x_{C_i}) using any tractable algorithm.

The Generalized Mean Field theorem

Theorem: the optimum GMF approximation to a cluster marginal is isomorphic to the cluster posterior of the original distribution, given the cluster's internal evidence and its generalized mean fields:

    q_i^*(X_{H, C_i}) = p\big( X_{H, C_i} \mid x_{E, C_i}, \langle X_{MB_i} \rangle_{q_{j \ne i}} \big).

GMF algorithm: iterate over each q_i.

A generalized mean field algorithm [Xing et al., UAI 2003]

[Two slides of algorithm pseudocode and worked examples; not recoverable from this text preview.]

Convergence theorem

Theorem: the GMF algorithm is guaranteed to converge to a local optimum, and provides a lower bound for the likelihood of the evidence (or the partition function) of the model.

The naive mean field approximation

- Approximate p(X) by a fully factorized q(X) = \prod_i q_i(X_i).
- For the Boltzmann distribution p(X) = \exp\{ \sum_{i<j} \theta_{ij} X_i X_j + \theta_{i0} X_i \} / Z:
  - Gibbs predictive distribution:

      p(X_i \mid X_{N_i}) = \exp\Big\{ \theta_{i0} X_i + \sum_{j \in N_i} \theta_{ij} X_i X_j - A_i \Big\};

  - mean field equation:

      q_i(X_i) = \exp\Big\{ \theta_{i0} X_i + \sum_{j \in N_i} \theta_{ij} X_i \langle X_j \rangle_{q_j} - A_i \Big\} = p\big( X_i \mid \{ \langle X_j \rangle_{q_j} : j \in N_i \} \big);

  - \langle X_j \rangle_{q_j} resembles a "message" sent from node j to i, and \{ \langle X_j \rangle_{q_j} : j \in N_i \} forms the "mean field" applied to X_i from its neighborhood (see the numerical sketch after these notes).

Generalized MF approximation to Ising models

- Cluster marginal of a square block C_k:

    q(X_{C_k}) \propto \exp\Big\{ \sum_{i \in C_k} \theta_{i0} X_i + \sum_{i, j \in C_k} \theta_{ij} X_i X_j + \sum_{i \in C_k,\, j \in MB(C_k)} \theta_{ij} X_i \langle X_j \rangle_{q_{k'}} \Big\},

  which is virtually a reparameterized Ising model of small size.

[Figure: marginal estimates from GMF with 2x2 blocks, GMF with 4x4 blocks, and BP, under attractive (positively weighted) and repulsive (negatively weighted) couplings.]
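As a concrete illustration of the naive mean field equation for the Boltzmann distribution, here is a small sketch of my own (not the lecture's code). Assuming spins X_i in {-1, +1}, the mean field equation reduces to the fixed-point update m_i = tanh( \theta_{i0} + \sum_{j \in N_i} \theta_{ij} m_j ) on the means m_i = \langle X_i \rangle_{q_i}. The 3x3 grid, the coupling value 0.3, the random singleton parameters, and the sequential update order are all arbitrary choices for the demo; the exact marginals are computed by brute force for comparison.

```python
import itertools
import numpy as np

n = 9                                     # 3x3 grid, small enough to enumerate exactly
rng = np.random.default_rng(1)
theta0 = 0.1 * rng.standard_normal(n)     # singleton parameters theta_i0 (arbitrary)

# Pairwise parameters theta_ij on grid edges (weak attractive coupling, arbitrary value)
theta = np.zeros((n, n))
for i, j in itertools.combinations(range(n), 2):
    ri, ci = divmod(i, 3)
    rj, cj = divmod(j, 3)
    if abs(ri - rj) + abs(ci - cj) == 1:   # grid neighbours
        theta[i, j] = theta[j, i] = 0.3

# Naive mean field: sequential coordinate updates on the means m_i = <X_i>_{q_i}
m = np.zeros(n)
for _ in range(200):
    m_old = m.copy()
    for i in range(n):
        m[i] = np.tanh(theta0[i] + theta[i] @ m)   # mean field equation for +/-1 spins
    if np.max(np.abs(m - m_old)) < 1e-8:
        break

# Exact marginal means by brute-force enumeration, for comparison
Z, exact = 0.0, np.zeros(n)
for cfg in itertools.product([-1, 1], repeat=n):
    x = np.array(cfg, dtype=float)
    w = np.exp(0.5 * x @ theta @ x + theta0 @ x)   # 0.5: theta is symmetric with zero diagonal
    Z += w
    exact += w * x
exact /= Z

print("mean field:", np.round(m, 3))
print("exact:     ", np.round(exact, 3))   # close but not equal: MF is an approximation
```

The sequential sweep mirrors "iterate over each q_i" in the GMF algorithm; for {0,1}-valued variables the tanh would be replaced by a logistic sigmoid. Replacing single-site updates with updates over 2x2 or 4x4 blocks (each block solved exactly) would give the GMF variant compared in the last slide.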

