School of Computer Science
Approximate Inference: Loopy Belief Propagation and Variants
Probabilistic Graphical Models (10-708), Lecture 14, Nov 7, 2007
Hetunandan Kamisetty (slides: Eric Xing)
Reading: KF Chap. 12; Yedidia et al.

[Figure: a signaling-pathway graphical model over variables X1..X8: Receptor A, Receptor B, Kinase C, Kinase D, Kinase E, TF F, Gene G, Gene H.]

An Ising model on a 2-D image
- Nodes encode hidden information (patch identity, e.g. "air or water?").
- They receive local information from the image (brightness, color).
- Information is propagated through the graph over its edges.
- Edges encode 'compatibility' between neighboring nodes.

Why approximate inference?
Why can't we just run the junction tree algorithm on this graph? For an N x N grid, the treewidth is at least N. Even for N on the order of 100, triangulation produces cliques with about 2^100 entries, far too many to store or sum over.

A recap: factor graphs
- Undirected graph (Markov random field):
    P(x) = \frac{1}{Z} \prod_{ij} \psi_{ij}(x_i, x_j) \prod_i \psi_i(x_i)
- Directed graph (Bayesian network):
    P(x) = \prod_i P(x_i \mid x_{\mathrm{parents}(i)})
- Factor graph, with variables x_i and factors f_a \in F:
    P(X) = \frac{1}{Z} \prod_{f_a \in F} f_a(X_a)

Belief propagation
BP message-update rules (pairwise MRF form), where the \psi_{ij} are the compatibilities (interactions) and the \psi_i the external evidence:
    b_i(x_i) \propto \psi_i(x_i) \prod_{k \in N(i)} M_{k \to i}(x_i)
    M_{i \to j}(x_j) \propto \sum_{x_i} \psi_{ij}(x_i, x_j)\, \psi_i(x_i) \prod_{k \in N(i) \setminus j} M_{k \to i}(x_i)
BP on trees always converges to the exact marginals (cf.
the junction tree algorithm).

Beliefs and messages in factor graphs
    b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)    ("beliefs")
    b_a(X_a) \propto f_a(X_a) \prod_{i \in N(a)} m_{i \to a}(x_i)
    m_{i \to a}(x_i) = \prod_{c \in N(i) \setminus a} m_{c \to i}(x_i)    ("messages")
What should the factor-to-variable message m_{a \to i}(x_i) be?

Approximate inference: what to approximate?
Let us call the actual distribution P, with P(X) = \frac{1}{Z} \prod_{f_a \in F} f_a(X_a). We wish to find a distribution Q that is a "good" approximation to P. Recall the properties of KL divergence: KL(Q_1 || Q_2) \ge 0, with equality iff Q_1 = Q_2. We can therefore use KL as a scoring function to decide on a good Q. But KL is not symmetric, KL(Q_1 || Q_2) \ne KL(Q_2 || Q_1), so which direction should we use?

Which KL?
Computing KL(P || Q) requires inference on P! But KL(Q || P) can be computed without performing inference on P. Using KL(Q || P) = \sum_X Q(X) \log \frac{Q(X)}{P(X)}:
    KL(Q || P) = -H_Q(X) - E_Q\left[ \log \frac{1}{Z} \prod_{f_a \in F} f_a(X_a) \right]
               = -H_Q(X) - \sum_{f_a \in F} E_Q[\log f_a(X_a)] + \log Z

Optimization function
We will call
    F(P, Q) = -H_Q(X) - \sum_{f_a \in F} E_Q[\log f_a(X_a)] = KL(Q || P) - \log Z
the "free energy". Since KL(Q || P) \ge 0, we have F(P, Q) \ge F(P, P) = -\log Z; F(P, P) is the Gibbs free energy.

The free energy
Let us look at the two terms of F(P, Q):
- The energy term \sum_{f_a \in F} E_Q[\log f_a(X_a)] can be computed if we have marginals of Q over each factor f_a.
- The entropy term H_Q(X) = -\sum_X Q(X) \log Q(X) is harder!
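The identities above are easy to verify numerically by brute force on a tiny model. The following sketch (not from the lecture; the factor tables are made up for illustration) enumerates a three-variable chain, computes F(P, Q), and checks that F(P, P) = -log Z and that a worse Q gives a larger free energy:

```python
import itertools
import math

# Toy factor graph over three binary variables with two factors,
# f_a(x1, x2) and f_b(x2, x3); the table values are illustrative.
f_a = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
f_b = {(0, 0): 2.0, (0, 1): 0.7, (1, 0): 0.7, (1, 1): 1.0}

states = list(itertools.product([0, 1], repeat=3))
unnorm = {x: f_a[(x[0], x[1])] * f_b[(x[1], x[2])] for x in states}
Z = sum(unnorm.values())
P = {x: v / Z for x, v in unnorm.items()}

def free_energy(Q):
    """F(P, Q) = -H_Q(X) - sum_a E_Q[log f_a(X_a)]."""
    H = -sum(q * math.log(q) for q in Q.values() if q > 0)
    energy = sum(Q[x] * (math.log(f_a[(x[0], x[1])]) +
                         math.log(f_b[(x[1], x[2])]))
                 for x in states)
    return -H - energy

uniform = {x: 1.0 / len(states) for x in states}
print(free_energy(P) + math.log(Z))            # approximately 0: F(P, P) = -log Z
print(free_energy(uniform) > free_energy(P))   # F(P, Q) >= F(P, P) with Q != P
```

Since F(P, Q) = KL(Q || P) - log Z and the KL term vanishes only at Q = P, the uniform distribution necessarily scores worse than P itself.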
Computing the entropy exactly requires a sum over all possible configurations of X, so computing F(P, Q) is hard in general. Approach 1: approximate F(P, Q) with an easy-to-compute \hat{F}(P, Q).

Easy free energies
Consider a tree-structured distribution. Its probability can be written in terms of factor beliefs b_a and variable beliefs b_i:
    b(x) = \prod_a b_a(x_a) \prod_i b_i(x_i)^{1 - d_i}
where d_i is the number of factors adjacent to variable i. The entropy then decomposes as
    H_{tree} = -\sum_a \sum_{x_a} b_a(x_a) \log b_a(x_a) + \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \log b_i(x_i)
and hence
    F_{tree} = \sum_a \sum_{x_a} b_a(x_a) \log \frac{b_a(x_a)}{f_a(x_a)} - \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \log b_i(x_i)
This involves sums only over factors and vertices, and is therefore easy to compute.

Bethe approximation to the Gibbs free energy
For a general graph, choose
    \hat{F}(P, Q) = F_{Bethe} = \sum_a \sum_{x_a} b_a(x_a) \log \frac{b_a(x_a)}{f_a(x_a)} - \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \log b_i(x_i)
This is called the "Bethe approximation", after the physicist Hans Bethe. It is equal to the exact Gibbs free energy when the factor graph is a tree. Note: on a loopy graph, the entropy-like term is no longer the true entropy of the distribution.

Bethe approximation: pros and cons
Pros:
- Easy to compute, since the entropy term involves only sums over single variables and over factors.
Cons:
- \hat{F}(P, Q) = F_{Bethe} may or may not be well connected to the true F(P, Q); in general it can be greater than, equal to, or less than F(P, Q).
- We optimize over the beliefs b_a, b_i: for discrete beliefs this is a constrained optimization handled with Lagrange multipliers; for continuous beliefs there is not yet a general formula.
- The resulting iterations do not always converge.

Minimizing the Bethe free energy
Enforce normalization and marginalization constraints with Lagrange multipliers,
    L = F_{Bethe} + \sum_i \gamma_i \{ 1 - \sum_{x_i} b_i(x_i) \} + \sum_a \sum_{i \in N(a)} \sum_{x_i} \lambda_{ai}(x_i) \{ b_i(x_i) - \sum_{X_a \setminus x_i} b_a(X_a) \}
and set the derivatives to zero.

Bethe = BP
Setting the derivatives of L to zero gives
    b_i(x_i) \propto \exp\left( \frac{1}{d_i - 1} \sum_{a \in N(i)} \lambda_{ai}(x_i) \right)
    b_a(X_a) \propto \exp\left( \log f_a(X_a) + \sum_{i \in N(a)} \lambda_{ai}(x_i) \right)
Identifying
    \lambda_{ai}(x_i) = \log m_{i \to a}(x_i) = \log \prod_{b \in N(i) \setminus a} m_{b \to i}(x_i)
recovers the BP equations:
    b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)
    b_a(X_a) \propto f_a(X_a) \prod_{i \in N(a)} m_{i \to a}(x_i)

Loopy belief propagation
A fixed-point iteration procedure that tries to minimize F_{Bethe}:
- Start with a random initialization of messages and beliefs.
- While not converged, update
      m^{new}_{i \to a}(x_i) = \prod_{c \in N(i) \setminus a} m_{c \to i}(x_i)
      m^{new}_{a \to i}(x_i) = \sum_{X_a \setminus x_i} f_a(X_a) \prod_{j \in N(a) \setminus i} m_{j \to a}(x_j)
  and recompute the beliefs b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i) and b_a(X_a) \propto f_a(X_a) \prod_{i \in N(a)} m_{i \to a}(x_i).
At convergence, the stationarity properties are guaranteed.
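The loopy BP updates can be sketched in a few dozen lines. The following is an illustrative implementation, not code from the lecture: the graph is a three-variable cycle with made-up pairwise factor tables, messages are updated until (empirically) stable, and the resulting belief is compared against the exact marginal obtained by enumeration:

```python
import itertools
import numpy as np

# factors[a] = (variable indices, table indexed by those variables);
# three pairwise factors over binary variables 0,1,2 forming a loop.
factors = {
    'fab': ((0, 1), np.array([[1.0, 0.5], [0.5, 2.0]])),
    'fbc': ((1, 2), np.array([[2.0, 0.7], [0.7, 1.0]])),
    'fca': ((2, 0), np.array([[1.5, 0.6], [0.6, 1.2]])),  # closes the loop
}

# messages m[variable -> factor] and m[factor -> variable], uniform init
m_vf = {(i, a): np.ones(2) for a, (vs, _) in factors.items() for i in vs}
m_fv = {(a, i): np.ones(2) for a, (vs, _) in factors.items() for i in vs}

for _ in range(100):
    # variable-to-factor: product of messages from the *other* factors
    for (i, a) in m_vf:
        msg = np.ones(2)
        for (b, j), incoming in m_fv.items():
            if j == i and b != a:
                msg = msg * incoming
        m_vf[(i, a)] = msg / msg.sum()
    # factor-to-variable: sum out the other variable of the pairwise factor
    for (a, i) in m_fv:
        vs, table = factors[a]
        j = vs[1 - vs.index(i)]                  # the other variable
        t = table if vs.index(i) == 0 else table.T
        msg = np.einsum('xy,y->x', t, m_vf[(j, a)])
        m_fv[(a, i)] = msg / msg.sum()

def belief(i):
    """b_i(x_i) proportional to the product of incoming factor messages."""
    b = np.ones(2)
    for (a, j), msg in m_fv.items():
        if j == i:
            b = b * msg
    return b / b.sum()

# exact marginal of variable 0 by brute-force enumeration
joint = np.zeros((2, 2, 2))
for x in itertools.product([0, 1], repeat=3):
    p = 1.0
    for vs, table in factors.values():
        p *= table[x[vs[0]], x[vs[1]]]
    joint[x] = p
joint /= joint.sum()
exact0 = joint.sum(axis=(1, 2))
print(belief(0), exact0)   # BP belief vs exact marginal of variable 0
```

On this single loop the messages converge, and the BP belief is close to, but in general not equal to, the exact marginal, which is exactly the behavior described above: the fixed point is a stationary point of F_{Bethe}, not of the true free energy.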
Why are the stationarity properties guaranteed? Because the update rules were derived directly from the zero-gradient conditions of the Bethe Lagrangian: any fixed point of loopy BP is a stationary point of F_{Bethe}. However, the iteration is not guaranteed to converge!

Region graphs
It will be useful to look explicitly at the messages being passed, both the messages from variables to factors and the messages from factors to variables, and to represent groups of variables and factors ("regions") graphically on the grid.

Summary so far
    P(X) = \frac{1}{Z} \prod_{f_a \in F} f_a(X_a)
    F(P, Q) = -H_Q(X) - \sum_{f_a \in F} E_Q[\log f_a(X_a)]
    \hat{F}(P, Q) = F_{Bethe} = \sum_a \sum_{x_a} b_a(x_a) \log \frac{b_a(x_a)}{f_a(x_a)} - \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \log b_i(x_i)
    b_i(x_i) \propto \exp\left( \frac{1}{d_i - 1} \sum_{a \in N(i)} \lambda_{ai}(x_i) \right),  b_a(X_a) \propto \exp\left( \log f_a(X_a) + \sum_{i \in N(a)} \lambda_{ai}(x_i) \right)

Better approximations?
Recall that the Bethe approximation uses only the smallest possible regions: single factors and single variables. We could construct bigger regions instead. Rules:
- If a region includes a factor, it must include that factor's variables as well.
- Each factor and each variable must appear in at least one region.
- Associate a weight (counting number) with each region so that each variable and each factor is counted exactly once overall.
F_{Bethe} itself is such a region sum, with each factor region entering with weight +1 and each variable region with weight 1 - d_i. Other choices of larger, overlapping regions yield different approximations \hat{F}.

Region-based approximations to the Gibbs free energy
Exact: minimize G[q(X)] over the full joint distribution q, which is intractable. Region-based: minimize G[\{b_r(X_r)\}] over a collection of region beliefs (Kikuchi, 1951).

Generalized belief propagation
The belief in a region is the product of:
- the local information (the factors in the region),
- messages from parent regions, and
- messages into descendant regions from parents that are not themselves descendants.
The message-update rules are obtained by enforcing marginalization constraints between overlapping regions.
Example: on a 3 x 3 grid of variables 1..9, take the four overlapping 2 x 2 regions {1,2,4,5}, {2,3,5,6}, {4,5,7,8}, {5,6,8,9}, their intersections {2,5}, {4,5}, {5,6}, {5,8}, and the common intersection {5}; \hat{F} is the weighted sum of these regions' free energies.
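The counting-number rule, that every variable must be counted exactly once across the weighted regions, is easy to check mechanically. This sketch (not from the lecture) uses the 3 x 3 example above and the standard cluster-variation rule that a region's counting number is 1 minus the sum of the counting numbers of all regions strictly containing it:

```python
# Regions for the 3x3 grid example: four 2x2 clusters, their pairwise
# intersections, and the common intersection {5}.
big = [{1, 2, 4, 5}, {2, 3, 5, 6}, {4, 5, 7, 8}, {5, 6, 8, 9}]
mid = [{2, 5}, {4, 5}, {5, 6}, {5, 8}]
small = [{5}]

# counting number c_r = 1 - sum of counting numbers of strict supersets
counts = {}
for r in big:
    counts[frozenset(r)] = 1
for r in mid:
    ancestors = [R for R in big if r < R]
    counts[frozenset(r)] = 1 - sum(counts[frozenset(R)] for R in ancestors)
for r in small:
    ancestors = [R for R in big + mid if r < R]
    counts[frozenset(r)] = 1 - sum(counts[frozenset(R)] for R in ancestors)

for v in range(1, 10):
    total = sum(c for r, c in counts.items() if v in r)
    print(v, total)   # every variable is counted exactly once
```

The intersections get counting number -1 and the center region {5} gets +1, so, for instance, variable 5 is counted 4 - 4 + 1 = 1 time, while a corner variable such as 1 appears in exactly one region.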