Mean Field and Variational Methods (finishing off)

Graphical Models – 10708
Carlos Guestrin
Carnegie Mellon University
November 5th, 2008
Readings: K&F: 10.1, 10.5

What you need to know so far
- Goal: find an efficient distribution Q that is close to the posterior.
- Distance: measured in terms of KL divergence.
- KL is asymmetric: D(p||q) \neq D(q||p).
- Computing the "right" KL, D(p||q), is intractable, so we use the reverse KL, D(q||p).

Reverse KL & The Partition Function: Back to the general case
- Consider again the definition of D(q||p), where p is the Markov net P_\Phi, i.e., p(x) = \frac{1}{Z} \prod_j \phi_j(x).
- Theorem: D(q||p) = \log Z - F[\tilde{P}_\Phi, Q], where the energy functional is
  F[\tilde{P}_\Phi, Q] = \sum_j E_Q[\log \phi_j] + H_Q(X).
[Running example: the student network over Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy.]

Understanding Reverse KL, Energy Functional & The Partition Function
- Maximizing the energy functional <=> minimizing the reverse KL.
- Theorem: the energy functional is a lower bound on the log partition function: \log Z \geq F[\tilde{P}_\Phi, Q], since D(q||p) \geq 0.
- Maximizing the energy functional therefore corresponds to searching for a tight lower bound on the partition function.

Structured Variational Approximate Inference
- Pick a family of distributions \mathcal{Q} that allows exact inference, e.g., fully factorized (mean field): Q(X) = \prod_i Q_i(X_i).
- Find the Q \in \mathcal{Q} that maximizes F[\tilde{P}_\Phi, Q].

Optimization for mean field
- Constrained optimization (each Q_i must normalize), solved via Lagrange multipliers: there exist \lambda_i such that the problem is equivalent to maximizing
  F[\tilde{P}_\Phi, Q] + \sum_i \lambda_i \left( \sum_{x_i} Q_i(x_i) - 1 \right).
- Take the derivative, set it to zero.
- Theorem: Q is a stationary point of the mean field approximation iff for each i:
  Q_i(x_i) = \frac{1}{Z_i} \exp\{ E_Q[\log \tilde{P}_\Phi(X_{-i}, x_i)] \}.

Understanding the fixed point equation
- Theorem: the fixed point above is equivalent to
  Q_i(x_i) = \frac{1}{Z_i} \exp\{ \sum_{j : X_i \in Scope[\phi_j]} E_{U_j \sim Q}[\log \phi_j(U_j, x_i)] \},
  where Scope[\phi_j] = U_j \cup \{X_i\}.
- In other words, Q_i only needs to consider the factors that intersect X_i.

There are many stationary points!
- The fixed point equations are coupled and the objective is not concave, so different initializations can converge to different stationary points.

Very simple approach for finding one stationary point (a code sketch follows this slide)
- Initialize Q (e.g., randomly or smartly).
- Set all variables to unprocessed.
- Pick an unprocessed variable X_i and update Q_i with the fixed point equation.
- Set variable i as processed.
- If Q_i changed, set the neighbors of X_i to unprocessed.
- Guaranteed to converge.
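To make the coordinate-update loop concrete, here is a minimal Python/NumPy sketch. It assumes a pairwise Markov net over binary variables; the data layout (log_phi_node, log_phi_edge, neighbors) and all names are illustrative assumptions, not from the lecture.

```python
import numpy as np

def mean_field(log_phi_node, log_phi_edge, neighbors, tol=1e-6, max_updates=100000):
    """Coordinate-ascent mean field for a binary pairwise Markov net (sketch)."""
    n = len(log_phi_node)
    Q = [np.full(2, 0.5) for _ in range(n)]      # initialize each Q_i uniformly
    unprocessed = set(range(n))                   # all variables start unprocessed
    for _ in range(max_updates):
        if not unprocessed:                       # no Q_i changed since its last update
            break
        i = unprocessed.pop()
        # Fixed point update: Q_i(x_i) ∝ exp{ sum over factors intersecting X_i
        # of E_Q[log phi | x_i] } -- only the node and edge factors at i matter.
        log_q = log_phi_node[i].astype(float).copy()
        for j in neighbors[i]:
            table = log_phi_edge.get((i, j))
            if table is None:                     # edge stored in the other order
                table = log_phi_edge[(j, i)].T
            log_q += table @ Q[j]                 # E_{X_j ~ Q_j}[log phi_ij(x_i, X_j)]
        new_Qi = np.exp(log_q - log_q.max())      # subtract max for numerical stability
        new_Qi /= new_Qi.sum()                    # local normalization 1/Z_i
        if np.abs(new_Qi - Q[i]).max() > tol:     # Q_i changed:
            Q[i] = new_Qi
            unprocessed.update(neighbors[i])      # re-mark neighbors unprocessed
    return Q
```

Each accepted update is exactly the fixed point equation restricted to the factors touching X_i; every such update can only increase the energy functional, which is bounded above by \log Z, so the loop terminates.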
More general structured approximations
- Mean field is a very naïve approximation.
- Consider a more general form for Q; the assumption is that exact inference is doable over Q.
- Theorem: the stationary points of the energy functional satisfy a fixed point condition analogous to the mean field one, giving a very similar update rule.

What you need to know about variational methods
- Structured variational methods: select a form for the approximate distribution, then minimize the reverse KL.
- Equivalent to maximizing the energy functional, i.e., searching for a tight lower bound on the partition function.
- Many possible models for Q: independent (mean field), structured as a Markov net, cluster variational.
- Several subtleties are outlined in the book.

Loopy Belief Propagation

Graphical Models – 10708
Carlos Guestrin
Carnegie Mellon University
November 5th, 2008
Readings: K&F: 10.2, 10.3

Recall message passing over junction trees
- Exact inference: generate a junction tree, then pass messages between neighboring cliques.
- Inference is exponential in the size of the largest clique.
[Figure: junction tree for the student network; cliques include CD, DIG, GSI, GJSL, HGJ.]

Belief Propagation on Tree Pairwise Markov Nets
- A tree pairwise Markov net is already a tree — no need to create a junction tree!
- Message passing, in its general form:
  m_{i \to j}(x_j) \propto \sum_{x_i} \phi_{ij}(x_i, x_j)\, \phi_i(x_i) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i),
  where N(i) denotes the neighbors of i in the pairwise Markov net.
- Theorem: on a tree, this converges to the true probabilities:
  P(x_i) \propto \phi_i(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i).

Loopy Belief Propagation on Pairwise Markov Nets
- What if we apply BP in a graph with loops? Send messages between pairs of nodes in the graph, and hope for the best.
- What happens? Evidence goes around the loops multiple times; the algorithm may not converge; and if it does converge, it is usually overconfident about probability values.
- But it often gives reasonable, or at least useful, answers — especially if you only care about the MPE rather than the actual probabilities.

More details on Loopy BP
- Numerical problem: messages < 1 get multiplied together as we go around the loops, so the numbers can underflow to zero.
- Fix: normalize each message to sum to one:
  m_{i \to j}(x_j) = \frac{1}{Z_{i \to j}} \sum_{x_i} \phi_{ij}(x_i, x_j)\, \phi_i(x_i) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i).
  Z_{i \to j} doesn't depend on x_j, so normalizing doesn't change the answer.
- Computing node "beliefs" (estimates of the marginals): b_i(x_i) \propto \phi_i(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i).

An example of running loopy BP
[Figure: worked example of loopy BP message updates, not reproduced here.]

Convergence
- If you have tried to send all messages and the beliefs haven't changed (by much) → converged.

(Non-)Convergence of Loopy BP
- Loopy BP can oscillate! The oscillations can be small, or they can be really bad.
- Typically: if the factors are close to uniform, loopy BP does well (converges); if the factors are close to deterministic, loopy BP doesn't behave well.
- One approach that helps: damping the messages — the new message is an average of the old message and the newly computed one:
  m_{i \to j}^{(t+1)} = \frac{1}{2}\, m_{i \to j}^{(t)} + \frac{1}{2}\, \hat{m}_{i \to j}.
  This often gives better convergence; but when damping is required to get convergence, the final answers are often poor. (A sketch with normalization and damping follows this slide.)
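Here is a minimal sketch of the loopy BP update with message normalization and damping, under the same assumptions and hypothetical data layout as the mean field sketch above (binary pairwise Markov net; phi_node, phi_edge, neighbors are illustrative names).

```python
import numpy as np

def loopy_bp(phi_node, phi_edge, neighbors, max_iters=200, damping=0.5, tol=1e-6):
    """Loopy BP with normalized, damped messages on a binary pairwise MN (sketch)."""
    # msgs[(i, j)] is the message from i to j, a distribution over x_j
    msgs = {(i, j): np.full(2, 0.5) for i in neighbors for j in neighbors[i]}
    for _ in range(max_iters):
        max_delta = 0.0
        for (i, j), old in list(msgs.items()):
            # m_{i->j}(x_j) ∝ sum_{x_i} phi_ij(x_i, x_j) phi_i(x_i)
            #                            prod_{k in N(i)\{j}} m_{k->i}(x_i)
            pre = phi_node[i].astype(float).copy()
            for k in neighbors[i]:
                if k != j:
                    pre *= msgs[(k, i)]
            table = phi_edge.get((i, j))
            if table is None:
                table = phi_edge[(j, i)].T        # edge stored in the other order
            new = pre @ table                     # vector over x_j
            new /= new.sum()                      # normalize by Z_{i->j}
            new = damping * old + (1.0 - damping) * new   # damped update
            max_delta = max(max_delta, np.abs(new - old).max())
            msgs[(i, j)] = new
        if max_delta < tol:                       # all messages (nearly) unchanged
            break
    # node beliefs: b_i(x_i) ∝ phi_i(x_i) prod_{k in N(i)} m_{k->i}(x_i)
    beliefs = {}
    for i in neighbors:
        b = phi_node[i].astype(float).copy()
        for k in neighbors[i]:
            b *= msgs[(k, i)]
        beliefs[i] = b / b.sum()
    return beliefs
```

With damping=0.5 this implements the "average of old and new" rule from the slide; damping=0.0 recovers undamped loopy BP. On a tree, with enough iterations, the messages reach the exact fixed point and the beliefs equal the true marginals.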