1Probabilistic Graphical Models10-708VariationalVariationalInference Inference Eric Xing Eric Xing Lecture 18, Nov 14, 2005Reading: KF-Chap. 9z For a distribution p(X|θ) associated with a complex graph, computing the marginal (or conditional) probability of arbitraryrandom variable(s) is intractablez Variational methodsz formulating probabilistic inference as an optimization problem:{})( maxarg*fffFS∈=queries ticprobabiliscertain tosolutions or,ondistributiy probabilit )(tractable a:fe.g.Variational Methods248-2 -1 0 1 2))(exp()exp( 1+−≥µµxx)exp(x())()()()exp()exp( 1636123+−+−+−≥µµµµ xxxx)exp(xLower bounds of exponential functionsz Exponential representation of graphical models:z Includes discrete models, Gaussian, Poisson, exponential, and many others⎭⎬⎫⎩⎨⎧−=∑ααααφθ)()(exp)|(θθ ApDXX⇒−=∑xXXααααφθ )()( state of theas toreferred is energyDE{})()(exp)|(θθAEp−−=XX{})(),(expEEHAExxXθ,−−=Exponential Family3⎪⎭⎪⎬⎫⎪⎩⎪⎨⎧+=<∑∑exp)(jiiiijiijXXXZXp01θθExample: the Boltzmanndistribution on atomic latticeLemma: Every marginal distribution q(XH) defines a lower bound of likelihood:where xEdenotes observed variables (evidence).{}()(), )(),()( )(exp)(HEHEHHEEEAEdpxxxxxxx′−−−′−≥∫1Upgradeable to higher order bound Upgradeable to higher order bound [Leisink and Kappen, 2000]Representing q(XH) by exp{-E’(XH)}:Lower bounding likelihood4Lemma: Every marginal distribution q(XH) defines a lower bound of likelihood:where xEdenotes observed variables (evidence).,)(log)(),()()(qqHHHqEHEHECqqdECpH+−=+−≥∫xxxxXxXRepresenting q(XH) by exp{-E’(XH)}:,qqHEC−−=entropy : energy expected :qqHEenergy free Gibbs : qqHE+Lower bounding likelihoodz Kullback-Leibler Distance:z “Boltzmann’s Law” (definition of “energy”):∑≡zzpzqzqpq)()(ln)()||(KLKL and variational (Gibbs) free energy[])(exp)(zECzp−=1∑∑++≡zzCzqzqzEzqpqln)(ln)()()()||(KLGibbs Free Energy ; minimized when)()(ZpZq=)(qG5KL and Log Likelihoodz Jensen’s inequalityz KL and Lower bound of likelihoodz Setting q()=q(z|x) closes the gap (c.f. EM) ∑∑∑≥===zzzxzqzxpxzqxzqzxpxzqzxpxpx)|()|,(log)|()|()|,()|(log)|,(log)|(log);(θθθθθl)(),;();( qHzxxqqcLll=+≥⇒θθ∑∑∑∑+=====zzzzxzpzqzqzqzxpzqxzpzqzqzxpzqxzpzxpzqxzpzxpxpx),|()(log)()()|,(log)(),|()()()|,(log)(),|()|,(log)(),|()|,(log)|(log);(θθθθθθθθθθl)||()();( pqqxKL+=⇒Llθln ( )pDln ( )pDL()qL( )qKL( || )qpKL( || )qp{}{}qqQqqqQqHEHEq+=−−=∈∈ minarg maxargDifficulty: Hqis intractable for general q“solution”: approximate Hqand/or, relax or tighten Qwhere Q is the equivalent sets of realizable distributions, e.g., all valid parameterizations of exponential family distributions, marginal polytopes[winright et al. 2003].A variational representation of probability distributions6z Optimize q(XH) in the space of tractable familiesz i.e., subgraph of Gpover which exact computation of Hqis feasiblez Tightening the optimization spacez exact objective:z tightened feasible set: qHTQ →qqqHEq +=∈ minarg*T)( QT ⊆Mean field methodsz Do not optimize q(XH) explicitly, but focus on the set of beliefsz e.g.,z Relax the optimization problemz approximate objective:z relaxed feasible set:z The loopy BP algorithm: z a fixed point iteration procedure that tries to solve b*{})( minarg*bFEbbbo+=∈M)}( ),,({,iijijixbxxbbττ===)(bFHq≈oMM →)( MM ⊇o) ,(,ijiBethabbHH={}∑∑==≥=iixjjixioxxxx)(),(,)(|ττττ10MBelief Propagation7Loopy Belief Propagation Loopy Belief Propagation )]([XqG)}]([{rrXbGExact:Regions:(intractable)(Kikuchi, 1951)Region-based Approximations to the Gibbs Free Energy8Bethe Approximation to Gibbs Free Energyz Bethe free energyz Equal to the exact Gibbs free energy when the factor graph is a tree because in that case,z But otherwise, it may or may not give a lower bound of the likelihoodz Optimize each b(xa)'s. z For discrete belief, constrained opt. with Lagrangean multiplier z For continuous belief, not yet a general formulaz Not always converge()()()()()∑∑∑∑−+=iaiiiiiiaaaaaBethebbdfbGxxxxxx lnln 1()()idiiiaaaxbbb−∏∏=1xx)(() ( ) ()∑∑∑∑∑∑⎭⎬⎫⎩⎨⎧−+−+=∈axXiiaaaNixiaixiiiiBetheiaiixbXbxxbGL\)( })({λγ10)(=∂∂iixbL⎟⎟⎠⎞⎜⎜⎝⎛−∝∑∈ )()(11exp)(iNaiaiiiixdxbλ0)(=∂∂aaXbL⎟⎟⎠⎞⎜⎜⎝⎛+−∝∑∈ )()()(exp)(aNiiaiaaaaxXEXbλMinimizing the Bethe Free Energy9∏≠∈→=aiNbiibiaixmx)()(ln)(λ∏∈→∝)()()(iNaiiaiixmxb∏∏∈∈→∝)(\)()()()(aNiaiNbiibaaaaxmXfXbiBethe = BPz Identifyz to obtain BP equations:z Belief in a region is the product of:z Local information (factors in region)z Messages from parent regionsz Messages into descendant regions from parents who are not descendants.z Message-update rules obtained by enforcing marginalization constraints.z Kikuchi free energyGeneralized Belief Propagation10582356 4578 568925 455612455123456789Generalized Belief Propagation582356 4578 568925 455612455585654525 →→→→∝mmmmb123456789Generalized Belief Propagation11582356 4578 568925 455612455]][[585652457845124545 →→→→→∝mmmmmfb123456789Generalized Belief Propagation582356 4578 568925 455612455123456789][452514121245ffffb∝][585645782536 →→→→mmmmGeneralized Belief Propagation12Mean Field ApproximationMean Field Approximation)]([XqG)}]([{ccXqGExact:Clusters:(intractable)Cluster-based approx. to the Gibbs free energy(Wiegerinck 2001, Xing et al 03,04)13Mean field approx. to Gibbs free energyz Given a disjoint clustering, {C1, … , CI}, of all variablesz Let z Mean-field free energyz Will never equal to the exact Gibbs free energy no matter what clustering is used, but it does always define a lower bound of the likelihood z Optimize each qi(xc)'s. z Variational calculus …z Do inference in each qi(xc) using any tractable algorithm),()(iiiqqCXX∏=()()()∑∑∑∑∏+=iCiCiiiCiiCiiiCiqqqGxxxxxx ln)(MFE()()()()()∑∑∑∑∑∑++=<ixiiiixijijixxjiiijixqxqxxqxxxqxqGln)()( e.g.,MFφφ(naïve mean field)),|()(,,,,*ijiiiiqMBHCECHCHipq≠= XxXXTheorem: The optimum GMF approximation to the cluster marginal is isomorphic to the cluster posterior of the original distribution given internal evidence and its generalized mean fields:GMF algorithm: Iterate over each qiThe Generalized Mean Field theorem14Theorem: The GMF algorithm is guaranteed to converge to a local optimum, and provides a lower bound for
View Full Document