Loopy Belief Propagation
Generalized Belief Propagation
Unifying Variational and GBP
Learning Parameters of MNs

Graphical Models – 10708
Carlos Guestrin, Carnegie Mellon University
November 10th, 2006

Readings: K&F 11.3, 11.5; Yedidia et al. paper from the class website; Chapter 9 – Jordan

Slide 2 – More details on Loopy BP
- Numerical problem: messages < 1 get multiplied together as we go around the loops, so numbers can go to zero
- Fix: normalize messages to one; Z_{i→j} doesn't depend on X_j, so normalizing doesn't change the answer
- Computing node "beliefs" (estimates of probabilities):
- [Figure: student network (Difficulty, SAT, Grade, Happy, Job, Coherence, Letter, Intelligence)]

Slide 3 – An example of running loopy BP
- [Figure only]

Slide 4 – (Non-)Convergence of Loopy BP
- Loopy BP can oscillate!!!
  - oscillations can be small
  - oscillations can be really bad!
- Typically:
  - if factors are closer to uniform, loopy does well (converges)
  - if factors are closer to deterministic, loopy doesn't behave well
- One approach to help: damping messages
  - new message is an average of the old message and the new one
  - often better convergence
  - but when damping is required to get convergence, the result is often bad
- [Graphs from Murphy et al. '99]

Slide 5 – Loopy BP in Factor graphs
- What if we don't have pairwise Markov nets?
  1. Transform to a pairwise MN
  2. Use loopy BP on a factor graph
- Message examples: from node to factor; from factor to node
- [Figure: factor graph over A, B, C, D, E with factors ABC, ABD, BDE, CDE]

Slide 6 – Loopy BP in Factor graphs
- From node i to factor j: F(i) = factors whose scope includes X_i
- From factor j to node i: Scope[φ_j] = Y ∪ {X_i}
- [Figure: factor graph over A, B, C, D, E with factors ABC, ABD, BDE, CDE]

Slide 7 – What you need to know about loopy BP
- Application of belief propagation in loopy graphs
- Doesn't always converge
  - damping can help
  - good message schedules can help (see book)
- If it converges, often to incorrect, but useful, results
- Generalizes from pairwise Markov networks by using factor graphs

Slide 8 – Announcements
- Monday's special recitation: Pradeep Ravikumar on exciting new approximate inference algorithms

Slide 9 – Loopy BP v. Clique trees: Two ends of a spectrum
- [Figures: student network; clique tree with cliques CD, DIG, GSI, GJSL, HGJ]

Slide 10 – Generalized cluster graph
- Generalized cluster graph, for a set of factors F:
  - Undirected graph
  - Each node i associated with a cluster C_i
  - Family preserving: for each factor f_j ∈ F, there exists a node i such that Scope[f_j] ⊆ C_i
  - Each edge i–j is associated with a set of variables S_ij ⊆ C_i ∩ C_j

Slide 11 – Running intersection property
- (Generalized) Running intersection property (RIP): a cluster graph satisfies RIP if whenever X ∈ C_i and X ∈ C_j, there exists one and only one path from C_i to C_j such that X ∈ S_uv for every edge (u,v) on the path

Slide 12 – Examples of cluster graphs
- [Figures only]

Slide 13 – Two cluster graphs satisfying RIP with different edge sets
- [Figures only]

Slide 14 – Generalized BP on cluster graphs satisfying RIP
- Initialization:
  - Assign each factor φ to a clique α(φ) with Scope[φ] ⊆ C_{α(φ)}
  - Initialize cliques
  - Initialize messages
- While not converged, send messages
- Belief:

Slide 15 – Cluster graph for Loopy BP
- [Figure: student network]

Slide 16 – What if the cluster graph doesn't satisfy RIP?

Slide 17 – Region graphs to the rescue
- Can address generalized cluster graphs that don't satisfy RIP using region graphs: Yedidia et al. from the class website
- Example in your homework! ☺
- Hint – from Yedidia et al.:
  - Section 7 – defines region graphs
  - Section 9 – message passing on region graphs
  - Section 10 – an example that will help you a lot!!! ☺

Slide 18 – Revisiting Mean-Fields
- Choice of Q:
- Optimization problem:

Slide 19 – Interpretation of energy functional
- Energy functional:
- Exact if P = Q:
- View the problem as an approximation of the entropy term:

Slide 20 – Entropy of a tree distribution
- Entropy term:
- Joint distribution:
- Decomposing the entropy term:
- More generally: d_i = number of neighbors of X_i
- [Figure: student network]

Slide 21 – Loopy BP & Bethe approximation
- Energy functional:
- Bethe approximation of the free energy: use the entropy form for trees, but on loopy graphs
- Theorem: if loopy BP converges, the resulting π_ij and π_i are a stationary point (usually a local maximum) of the Bethe free energy!
- [Figure: student network]

Slide 22 – GBP & Kikuchi approximation
- Exact free energy: junction tree
- Bethe free energy:
- Kikuchi approximation: generalized cluster graph spectrum from Bethe to exact
  - entropy terms weighted by counting numbers
  - see Yedidia et al.
- Theorem: if GBP converges, the resulting π_Ci are a stationary point (usually a local maximum) of the Kikuchi free energy!
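The equations on slides 20–21 did not survive extraction. A hedged reconstruction, consistent with the slide's note that d_i is the number of neighbors of X_i: the entropy of a tree-structured pairwise distribution decomposes exactly over edges and nodes, and the Bethe approximation reuses that form on a loopy graph with the loopy-BP pseudo-marginals.

```latex
% Exact entropy decomposition for a tree-structured pairwise distribution
% (d_i = number of neighbors of X_i):
H(P) \;=\; \sum_{(i,j)\in T} H_P(X_i, X_j) \;-\; \sum_i (d_i - 1)\, H_P(X_i)

% Bethe approximation: apply the same form on a loopy graph, substituting
% the loopy-BP pseudo-marginals \pi_{ij}, \pi_i for the true marginals:
\tilde{H}_{\mathrm{Bethe}} \;=\; \sum_{(i,j)\in E} H(\pi_{ij}) \;-\; \sum_i (d_i - 1)\, H(\pi_i)
```

On a tree the second expression equals the first, so the Bethe free energy coincides with the exact energy functional; on a loopy graph it is only an approximation, whose stationary points loopy BP finds (slide 21's theorem).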
- [Figures: student network; clique tree with cliques CD, DIG, GSI, GJSL, HGJ]

Slide 23 – What you need to know about GBP
- Spectrum between loopy BP & junction trees: more computation, but typically better answers
- If the cluster graph satisfies RIP, the equations are very simple
- General setting: slightly trickier equations, but not hard
- Relates to variational methods: corresponds to local optima of an approximate version of the energy functional

Slide 24 – Learning Parameters of a BN
- Log likelihood decomposes
- Learn each CPT independently
- Use counts
- [Figure: student network (D, S, G, H, J, C, L, I)]

Slide 25 – Log Likelihood for MN
- Log likelihood of the data:
- [Figure: student network]

Slide 26 – Log Likelihood doesn't decompose for MNs
- Log likelihood:
- A convex problem: can find the global optimum!!
- But the log Z term doesn't decompose!!

Slide 27 – Derivative of Log Likelihood for MNs
- [Figure: student network]

Slide 28 – Derivative of Log Likelihood for MNs
- Derivative:
- Setting the derivative to zero:
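The derivative equations on slides 26–28 were lost in extraction. For a log-linear parameterization P(x) ∝ exp(Σ_j θ_j f_j(x)), the gradient of the log likelihood is the empirical feature counts minus M times the model's expected feature counts, which is why setting it to zero gives moment matching. A minimal brute-force sketch on a toy model (the two feature functions below are hypothetical, not from the slides):

```python
import itertools
import math

# Toy log-linear Markov net over three binary variables, with two
# hypothetical agreement features (illustration only):
#   f0(x) = 1 if x0 == x1,   f1(x) = 1 if x1 == x2
features = [
    lambda x: 1.0 if x[0] == x[1] else 0.0,
    lambda x: 1.0 if x[1] == x[2] else 0.0,
]

def log_likelihood_grad(theta, data):
    """Gradient of the MN log likelihood w.r.t. theta:
    empirical counts minus M * E_P[f_j] (moment matching).
    Z is computed by brute-force enumeration, so this only
    works for tiny models."""
    states = list(itertools.product([0, 1], repeat=3))
    scores = [math.exp(sum(t * f(x) for t, f in zip(theta, features)))
              for x in states]
    Z = sum(scores)  # the partition function that doesn't decompose
    expected = [sum(s * f(x) for s, x in zip(scores, states)) / Z
                for f in features]
    empirical = [sum(f(x) for x in data) for f in features]
    M = len(data)
    return [emp - M * exp for emp, exp in zip(empirical, expected)]

data = [(0, 0, 0), (1, 1, 1), (0, 0, 1)]
grad = log_likelihood_grad([0.0, 0.0], data)  # gradient at theta = 0
```

Because the objective is convex (slide 26), gradient ascent with this gradient converges to the global optimum; the expensive part is the expectation under the model, which in general requires inference.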
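The damping trick from slide 4 ("new message is an average of the old message and the new one") can be sketched as a single update rule; the function name and damping weight below are illustrative, not the slides' notation:

```python
def damped_update(old_msg, candidate_msg, damping=0.5):
    """Damped loopy-BP message update: a convex combination of the
    previous message and the freshly computed candidate, renormalized.
    damping=0 recovers plain loopy BP; larger values slow the updates,
    which often (but not always) stabilizes oscillating schedules."""
    mixed = [(1 - damping) * c + damping * o
             for c, o in zip(candidate_msg, old_msg)]
    Z = sum(mixed)  # per slide 2: the normalizer doesn't depend on the
    return [v / Z for v in mixed]  # target variable, so beliefs are unchanged

# One damped step: the message moves only halfway toward the candidate.
msg = damped_update([0.9, 0.1], [0.2, 0.8], damping=0.5)
```

As slide 4 warns, damping can force convergence on problems where undamped loopy BP oscillates, but when heavy damping is needed, the fixed point it reaches is often a poor approximation.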