School of Computer Science
Probabilistic Graphical Models (10-708), Lecture 17, Nov 12, 2007
Approximate Inference: Mean Field Methods
Eric Xing
Reading: KF Chap. 12

[Title figure: a signaling-pathway graphical model over X1-X8 (Receptor A, Receptor B, Kinase C, Kinase D, Kinase E, TF F, Gene G, Gene H)]

Recap
- Questions?
- Kalman filters
- Complex models
- LBP as Bethe free-energy minimization

Approximate Inference

Variational Methods
- For a distribution p(X | \theta) associated with a complex graph, computing the marginal (or conditional) probability of arbitrary random variable(s) is intractable.
- Variational methods formulate probabilistic inference as an optimization problem:
    f^* = \arg\max_{f \in S} F(f)
  where f is, e.g., a tractable probability distribution, or the solution to certain probabilistic queries.

The Exponential Family
- Exponential representation of graphical models:
    P(X) = \frac{1}{Z} \prod_{c \in C} \psi_c(X_c)
    \;\Rightarrow\;
    p(X \mid \theta) = \exp\Big\{ \sum_\alpha \theta_\alpha \phi_\alpha(X) - A(\theta) \Big\}
- Includes discrete models, Gaussian, Poisson, exponential, and many others.
- E(X) = -\sum_\alpha \theta_\alpha \phi_\alpha(X) is referred to as the energy of the state X, so that
    p(X \mid \theta) = \exp\{ -E(X) - A(\theta) \},
  and, splitting X into hidden variables x_H and evidence x_E,
    p(x_H, x_E \mid \theta) = \exp\{ -E(x_H, x_E) - A(\theta) \}.

Example: the Boltzmann Distribution on an Atomic Lattice
    p(X) = \frac{1}{Z} \exp\Big\{ \sum_{i<j} \theta_{ij} X_i X_j + \sum_i \theta_{i0} X_i \Big\}

Lower Bounds of Exponential Functions
- First-order bound: \exp(x) \ge \exp(\mu)\,\big(1 + (x-\mu)\big)
- Third-order bound: \exp(x) \ge \exp(\mu)\big(1 + (x-\mu) + \tfrac{1}{2}(x-\mu)^2 + \tfrac{1}{6}(x-\mu)^3\big)
[Plot: exp(x) together with its first- and third-order lower bounds, tangent at x = \mu]

Lower Bounding the Likelihood
Lemma: every marginal distribution q(X_H) defines a lower bound of the likelihood, where x_E denotes the observed variables (evidence). Representing q(X_H) by \exp\{-E'(X_H)\} and applying the first-order bound above:
    p(x_E) = \int \exp\{-E(x_H, x_E)\}\, dx_H \;\ge\; \int \exp\{-E'(x_H)\}\,\big(1 - E(x_H, x_E) + E'(x_H)\big)\, dx_H
The bound is upgradeable to higher order [Leisink and Kappen, 2000].

Taking logs with a normalized q(x_H):
    \log p(x_E) \;\ge\; -\int q(x_H)\, E(x_H, x_E)\, dx_H - \int q(x_H) \ln q(x_H)\, dx_H \;=\; -\langle E \rangle_q + H_q
where \langle E \rangle_q is the expected energy, H_q the entropy, and \langle E \rangle_q - H_q the Gibbs free energy.

KL and Variational (Gibbs) Free Energy
- Kullback-Leibler divergence:
    KL(q \| p) \equiv \sum_z q(z) \ln \frac{q(z)}{p(z)}
- "Boltzmann's law" (definition of "energy"): p(z) = \frac{1}{C} \exp\{-E(z)\}
- Substituting:
    KL(q \| p) = \sum_z q(z) E(z) + \sum_z q(z) \ln q(z) + \ln C
- G(q) = \sum_z q(z) E(z) + \sum_z q(z) \ln q(z) is the Gibbs free energy; it is minimized when q(Z) = p(Z).

KL and Log-Likelihood
- Jensen's inequality:
    \ell(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)
      = \log \sum_z q(z \mid x) \frac{p(x, z \mid \theta)}{q(z \mid x)}
      \ge \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}
    \Rightarrow\; \ell(\theta; x) \ge \langle \ell_c(\theta; x, z) \rangle_q + H_q = L(q)
- KL and the lower bound of the likelihood:
    \ell(\theta; x) = \log p(x \mid \theta)
      = \sum_z q(z) \log \frac{p(x, z \mid \theta)}{p(z \mid x, \theta)}
      = \sum_z q(z) \log \frac{p(x, z \mid \theta)}{q(z)} + \sum_z q(z) \log \frac{q(z)}{p(z \mid x, \theta)}
    \Rightarrow\; \ell(\theta; x) = L(q) + KL(q \| p)
- Setting q(z) = p(z | x) closes the gap (cf. EM).
[Figure: decomposition of \ln p(D) into the lower bound L(q) and the gap KL(q \| p)]

A Variational Representation of Probability Distributions
    q^* = \arg\max_{q \in Q} \{ -\langle E \rangle_q + H_q \} = \arg\min_{q \in Q} \{ \langle E \rangle_q - H_q \}
where Q is the set of realizable distributions, e.g., all valid parameterizations of exponential-family distributions (marginal polytopes) [Wainwright et al. 2003].
- Difficulty: H_q is intractable for general q.
- "Solution": approximate H_q and/or relax or tighten Q.

Bethe Free Energy / LBP
- We do not optimize q(X) explicitly, but focus on a set of beliefs, e.g.,
    b = \{ \tau_i = b_i(x_i),\; \tau_{ij} = b_{ij}(x_i, x_j) \}
- Relax the optimization problem:
    b^* = \arg\min_{b \in M_o} \{ E_b - H_{Bethe}(b) \}
  - approximate objective: H_q \approx H_{Bethe}(b_i, b_{ij})
  - relaxed feasible set: M \to M_o, with M_o \supseteq M, e.g.
      M_o = \{ \tau \ge 0 \mid \textstyle\sum_{x_i} \tau_i(x_i) = 1,\; \sum_{x_j} \tau_{ij}(x_i, x_j) = \tau_i(x_i) \}
- The loopy BP algorithm is a fixed-point iteration procedure that tries to solve for b^*.

Mean Field Methods
- Optimize q(X_H) in a space of tractable families, i.e., subgraphs of G_p over which exact computation of H_q is feasible.
- Tighten the optimization space:
    q^* = \arg\min_{q \in T} \{ \langle E \rangle_q - H_q \}
  - exact objective: H_q
  - tightened feasible set: Q \to T, with T \subseteq Q

Mean Field Approximation
- Exact: G[p(X)] (intractable)
- Clusters: G[\{q_c(X_c)\}], a cluster-based approximation to the Gibbs free energy (Wiegerinck 2001; Xing et al. 2003, 2004)

Mean Field Approximation to the Gibbs Free Energy
- Given a disjoint clustering \{C_1, \ldots, C_I\} of all variables, let
    q(X) = \prod_i q_i(X_{C_i})
- Mean-field free energy:
    G_{MF}(\{q_i\}) = \langle E \rangle_{\prod_i q_i} + \sum_i \sum_{x_{C_i}} q_i(x_{C_i}) \ln q_i(x_{C_i})
  e.g., for a fully factorized q over a pairwise model (naive mean field):
    G_{MF} = \sum_{i<j} \sum_{x_i, x_j} q_i(x_i)\, q_j(x_j)\, \phi_{ij}(x_i, x_j)
           + \sum_i \sum_{x_i} q_i(x_i)\, \phi_i(x_i)
           + \sum_i \sum_{x_i} q_i(x_i) \ln q_i(x_i)
- G_{MF} never equals the exact Gibbs free energy, no matter what clustering is used, but it always defines a lower bound of the likelihood.
- Optimize each q_i(x_{C_i}) by variational calculus; do inference within each q_i using any tractable algorithm.

The Generalized Mean Field Theorem
Theorem: the optimal GMF approximation to a cluster marginal is isomorphic to the cluster posterior of the original distribution given internal evidence and its generalized mean fields:
    q_i^*(X_{H, C_i}) = p\big(X_{H, C_i} \mid X_{E, C_i}, \langle X_{MB_i} \rangle_{q_{j \ne i}}\big)
GMF algorithm: iterate this update over each q_i.

A Generalized Mean Field Algorithm [Xing et al., UAI 2003]
[Two slides of algorithm boxes and figures; not recoverable from the extraction]

Convergence Theorem
Theorem: the GMF algorithm is guaranteed to converge to a local optimum, and provides a lower bound for the likelihood of the evidence (or the partition function) of the model.

The Naive Mean Field Approximation
- Approximate p(X) by a fully factorized q(X) = \prod_i q_i(X_i).
- For the Boltzmann distribution p(X) = \exp\{ \sum_{i<j} \theta_{ij} X_i X_j + \sum_i \theta_{i0} X_i \}/Z:
  - Gibbs predictive distribution:
      p(X_i \mid X_{N_i}) \propto \exp\Big\{ \theta_{i0} X_i + \sum_{j \in N_i} \theta_{ij} X_i X_j \Big\}
  - Mean field equation:
      q_i(X_i) \propto \exp\Big\{ \theta_{i0} X_i + \sum_{j \in N_i} \theta_{ij} X_i \langle X_j \rangle_{q_j} \Big\}
              = p\big(X_i \mid \{ \langle X_j \rangle_{q_j} : j \in N_i \}\big)
- \langle X_j \rangle_{q_j} resembles a "message" sent from node j to node i; the set \{ \langle X_j \rangle_{q_j} : j \in N_i \} forms the "mean field" applied to X_i from its neighborhood.

Generalized MF Approximation to Ising Models
Cluster marginal of a square block C_k:
    q_k(X_{C_k}) \propto \exp\Big\{ \sum_{i \in C_k} \theta_{i0} X_i
      + \sum_{(i,j) \in C_k} \theta_{ij} X_i X_j
      + \sum_{i \in C_k,\, j \in MB(C_k)} \theta_{ij} X_i \langle X_j \rangle_{q_{k'}} \Big\}
This is virtually a reparameterized Ising model of small size.

GMF Approximation to Ising Models: Results
[Figure: marginals from GMF with 2x2 blocks, GMF with 4x4 blocks, and BP, under attractive (positively weighted) and repulsive (negatively weighted) coupling]
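The decomposition \ell(\theta; x) = L(q) + KL(q \| p) can be checked numerically on a tiny discrete model. A minimal sketch, assuming an arbitrary illustrative joint table p(x, z) and an arbitrary variational distribution q (neither is from the lecture):

```python
import numpy as np

# Verify  log p(x) = ELBO(q) + KL(q || p(z|x))  on a toy discrete latent z.
p_xz = np.array([0.2, 0.1, 0.4])         # p(x, z) for z = 0, 1, 2 at the observed x
log_px = np.log(p_xz.sum())              # evidence log p(x)
post = p_xz / p_xz.sum()                 # exact posterior p(z | x)

q = np.array([0.5, 0.2, 0.3])            # an arbitrary variational distribution

elbo = np.sum(q * (np.log(p_xz) - np.log(q)))    # E_q[log p(x,z) - log q(z)] = L(q)
kl = np.sum(q * (np.log(q) - np.log(post)))      # KL(q || p(z|x))

assert elbo <= log_px                    # ELBO lower-bounds the evidence
assert abs(log_px - (elbo + kl)) < 1e-12 # the gap is exactly the KL
```

Setting q equal to `post` drives the KL term to zero and closes the gap, mirroring the E-step of EM.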
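The naive mean field equation for the Boltzmann distribution can be turned into a small fixed-point solver: with spins X_i in {-1, +1}, q_i(x_i) \propto \exp\{(\theta_{i0} + \sum_j \theta_{ij} \langle x_j \rangle) x_i\} gives \langle x_i \rangle = \tanh(\theta_{i0} + \sum_j \theta_{ij} \langle x_j \rangle). A sketch with a hypothetical function name and a toy 3-node chain (not from the lecture):

```python
import numpy as np

def naive_mean_field(theta, theta0, n_iters=500, tol=1e-10, seed=0):
    """Naive mean-field fixed point for a Boltzmann distribution
    p(x) ~ exp( sum_{i<j} theta[i,j] x_i x_j + sum_i theta0[i] x_i )
    with spins x_i in {-1, +1}; theta is symmetric with zero diagonal.
    Returns the approximate means <x_i>_{q_i}."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(-0.1, 0.1, size=len(theta0))   # initialize the means
    for _ in range(n_iters):
        mu_old = mu.copy()
        for i in range(len(mu)):
            # the neighbours' means act as the "mean field" on x_i
            mu[i] = np.tanh(theta0[i] + theta[i] @ mu)
        if np.max(np.abs(mu - mu_old)) < tol:       # converged to a local optimum
            break
    return mu

# Toy 3-node chain with ferromagnetic coupling and a positive field
theta = np.array([[0.0, 0.3, 0.0],
                  [0.3, 0.0, 0.3],
                  [0.0, 0.3, 0.0]])
theta0 = np.array([0.5, 0.5, 0.5])
mu = naive_mean_field(theta, theta0)
```

Each coordinate update conditions node i on its neighbours' current means rather than on their sampled values, which is exactly the "message" reading of \langle X_j \rangle_{q_j} above.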
View Full Document
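The block cluster marginal for the Ising model suggests a direct GMF implementation: enumerate the 2^4 configurations of each 2x2 block exactly, and replace every edge that crosses a block boundary with its mean-field value. A sketch assuming a uniform coupling J and field h; the function and the 4x4 test grid are hypothetical, not the experiments reported on the results slide:

```python
import itertools
import numpy as np

def gmf_ising_2x2(n, J, h, n_sweeps=50, seed=0):
    """Generalized mean field for a uniform n x n Ising grid with coupling J
    and field h, using disjoint 2x2 blocks as clusters.  Inference inside
    each block is exact (16 states); edges crossing a block boundary couple
    to the neighbouring blocks' current means.  Returns the matrix <x_i>."""
    assert n % 2 == 0, "n must be even for a 2x2 tiling"
    rng = np.random.default_rng(seed)
    mu = rng.uniform(-0.1, 0.1, size=(n, n))                  # current means
    states = np.array(list(itertools.product([-1, 1], repeat=4)))
    for _ in range(n_sweeps):
        for bi in range(0, n, 2):
            for bj in range(0, n, 2):
                cells = [(bi, bj), (bi, bj + 1), (bi + 1, bj), (bi + 1, bj + 1)]
                log_q = np.empty(len(states))
                for k, s in enumerate(states):
                    # internal edges of the 2x2 block, treated exactly
                    e = J * (s[0]*s[1] + s[0]*s[2] + s[1]*s[3] + s[2]*s[3])
                    e += h * s.sum()
                    # boundary edges: couple to the mean field <x_j>
                    for li, (r, c) in enumerate(cells):
                        for rr, cc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
                            if 0 <= rr < n and 0 <= cc < n and (rr, cc) not in cells:
                                e += J * s[li] * mu[rr, cc]
                    log_q[k] = e
                q = np.exp(log_q - log_q.max())
                q /= q.sum()                                  # block marginal q_k
                for li, (r, c) in enumerate(cells):
                    mu[r, c] = q @ states[:, li]              # updated block means
    return mu

mu = gmf_ising_2x2(4, J=0.3, h=0.4)
```

Each block update is itself a small reparameterized Ising model, matching the cluster marginal on the slide; larger blocks (e.g. 4x4) tighten the approximation at exponential cost in block size.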