Mean Field and Variational Methods (finishing off)

Graphical Models – 10708
Carlos Guestrin
Carnegie Mellon University
November 5th, 2008
Readings: K&F: 10.1, 10.5

What you need to know so far
- Goal: find an efficient distribution Q that is close to the posterior.
- Distance: measured in terms of KL divergence.
- KL is asymmetric: D(p||q) \neq D(q||p).
- Computing the "right" KL, D(p||q), is intractable, so we use the reverse KL, D(q||p).

Reverse KL & The Partition Function: Back to the general case
- Consider again the definition of D(q||p), where p is the Markov net P_\Phi, i.e., p(x) = \frac{1}{Z} \prod_j \phi_j(x).
- Theorem: D(q||p) = \log Z - F[\tilde{P}_\Phi, Q], where the energy functional is
  F[\tilde{P}_\Phi, Q] = \sum_j E_Q[\log \phi_j] + H_Q(X).
[Running example: the student network over Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy.]

Understanding Reverse KL, Energy Functional & The Partition Function
- Maximizing the energy functional <=> minimizing the reverse KL.
- Theorem: the energy functional is a lower bound on the log partition function: \log Z \geq F[\tilde{P}_\Phi, Q], since D(q||p) \geq 0.
- Maximizing the energy functional therefore corresponds to searching for a tight lower bound on the partition function.

Structured Variational Approximate Inference
- Pick a family of distributions \mathcal{Q} that allows exact inference, e.g., fully factorized (mean field): Q(X) = \prod_i Q_i(X_i).
- Find the Q \in \mathcal{Q} that maximizes F[\tilde{P}_\Phi, Q].

Optimization for mean field
- Constrained optimization (each Q_i must normalize), solved via Lagrange multipliers: there exist \lambda_i such that the problem is equivalent to maximizing
  F[\tilde{P}_\Phi, Q] + \sum_i \lambda_i \left( \sum_{x_i} Q_i(x_i) - 1 \right).
- Take the derivative, set it to zero.
- Theorem: Q is a stationary point of the mean field approximation iff for each i:
  Q_i(x_i) = \frac{1}{Z_i} \exp\{ E_Q[\log \tilde{P}_\Phi(X_{-i}, x_i)] \}.

Understanding the fixed point equation
- Theorem: the fixed point above is equivalent to
  Q_i(x_i) = \frac{1}{Z_i} \exp\{ \sum_{j : X_i \in Scope[\phi_j]} E_{U_j \sim Q}[\log \phi_j(U_j, x_i)] \},
  where Scope[\phi_j] = U_j \cup \{X_i\}.
- In other words, Q_i only needs to consider the factors that intersect X_i.

There are many stationary points!
- The fixed point equations are coupled and the objective is not concave, so different initializations can converge to different stationary points.

Very simple approach for finding one stationary point (a code sketch follows this slide)
- Initialize Q (e.g., randomly or smartly).
- Set all variables to unprocessed.
- Pick an unprocessed variable X_i and update Q_i with the fixed point equation.
- Set variable i as processed.
- If Q_i changed, set the neighbors of X_i to unprocessed.
- Guaranteed to converge.
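To make the coordinate-update loop concrete, here is a minimal Python/NumPy sketch. It assumes a pairwise Markov net over binary variables; the data layout (log_phi_node, log_phi_edge, neighbors) and all names are illustrative assumptions, not from the lecture.

```python
import numpy as np

def mean_field(log_phi_node, log_phi_edge, neighbors, tol=1e-6, max_updates=100000):
    """Coordinate-ascent mean field for a binary pairwise Markov net (sketch)."""
    n = len(log_phi_node)
    Q = [np.full(2, 0.5) for _ in range(n)]      # initialize each Q_i uniformly
    unprocessed = set(range(n))                   # all variables start unprocessed
    for _ in range(max_updates):
        if not unprocessed:                       # no Q_i changed since its last update
            break
        i = unprocessed.pop()
        # Fixed point update: Q_i(x_i) ∝ exp{ sum over factors intersecting X_i
        # of E_Q[log phi | x_i] } -- only the node and edge factors at i matter.
        log_q = log_phi_node[i].astype(float).copy()
        for j in neighbors[i]:
            table = log_phi_edge.get((i, j))
            if table is None:                     # edge stored in the other order
                table = log_phi_edge[(j, i)].T
            log_q += table @ Q[j]                 # E_{X_j ~ Q_j}[log phi_ij(x_i, X_j)]
        new_Qi = np.exp(log_q - log_q.max())      # subtract max for numerical stability
        new_Qi /= new_Qi.sum()                    # local normalization 1/Z_i
        if np.abs(new_Qi - Q[i]).max() > tol:     # Q_i changed:
            Q[i] = new_Qi
            unprocessed.update(neighbors[i])      # re-mark neighbors unprocessed
    return Q
```

Each accepted update is exactly the fixed point equation restricted to the factors touching X_i; every such update can only increase the energy functional, which is bounded above by \log Z, so the loop terminates.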
More general structured approximations
- Mean field is a very naïve approximation.
- Consider a more general form for Q; the assumption is that exact inference is doable over Q.
- Theorem: the stationary points of the energy functional satisfy a fixed point condition analogous to the mean field one, giving a very similar update rule.

What you need to know about variational methods
- Structured variational methods: select a form for the approximate distribution, then minimize the reverse KL.
- Equivalent to maximizing the energy functional, i.e., searching for a tight lower bound on the partition function.
- Many possible models for Q: independent (mean field), structured as a Markov net, cluster variational.
- Several subtleties are outlined in the book.

Loopy Belief Propagation

Graphical Models – 10708
Carlos Guestrin
Carnegie Mellon University
November 5th, 2008
Readings: K&F: 10.2, 10.3

Recall message passing over junction trees
- Exact inference: generate a junction tree, then pass messages between neighboring cliques.
- Inference is exponential in the size of the largest clique.
[Figure: junction tree for the student network; cliques include CD, DIG, GSI, GJSL, HGJ.]

Belief Propagation on Tree Pairwise Markov Nets
- A tree pairwise Markov net is already a tree — no need to create a junction tree!
- Message passing, in its general form:
  m_{i \to j}(x_j) \propto \sum_{x_i} \phi_{ij}(x_i, x_j)\, \phi_i(x_i) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i),
  where N(i) denotes the neighbors of i in the pairwise Markov net.
- Theorem: on a tree, this converges to the true probabilities:
  P(x_i) \propto \phi_i(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i).

Loopy Belief Propagation on Pairwise Markov Nets
- What if we apply BP in a graph with loops? Send messages between pairs of nodes in the graph, and hope for the best.
- What happens? Evidence goes around the loops multiple times; the algorithm may not converge; and if it does converge, it is usually overconfident about probability values.
- But it often gives reasonable, or at least useful, answers — especially if you only care about the MPE rather than the actual probabilities.

More details on Loopy BP
- Numerical problem: messages < 1 get multiplied together as we go around the loops, so the numbers can underflow to zero.
- Fix: normalize each message to sum to one:
  m_{i \to j}(x_j) = \frac{1}{Z_{i \to j}} \sum_{x_i} \phi_{ij}(x_i, x_j)\, \phi_i(x_i) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i).
  Z_{i \to j} doesn't depend on x_j, so normalizing doesn't change the answer.
- Computing node "beliefs" (estimates of the marginals): b_i(x_i) \propto \phi_i(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i).

An example of running loopy BP
[Figure: worked example of loopy BP message updates, not reproduced here.]

Convergence
- If you have tried to send all messages and the beliefs haven't changed (by much) → converged.

(Non-)Convergence of Loopy BP
- Loopy BP can oscillate! The oscillations can be small, or they can be really bad.
- Typically: if the factors are close to uniform, loopy BP does well (converges); if the factors are close to deterministic, loopy BP doesn't behave well.
- One approach that helps: damping the messages — the new message is an average of the old message and the newly computed one:
  m_{i \to j}^{(t+1)} = \frac{1}{2}\, m_{i \to j}^{(t)} + \frac{1}{2}\, \hat{m}_{i \to j}.
  This often gives better convergence; but when damping is required to get convergence, the final answers are often poor. (A sketch with normalization and damping follows this slide.)
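Here is a minimal sketch of the loopy BP update with message normalization and damping, under the same assumptions and hypothetical data layout as the mean field sketch above (binary pairwise Markov net; phi_node, phi_edge, neighbors are illustrative names).

```python
import numpy as np

def loopy_bp(phi_node, phi_edge, neighbors, max_iters=200, damping=0.5, tol=1e-6):
    """Loopy BP with normalized, damped messages on a binary pairwise MN (sketch)."""
    # msgs[(i, j)] is the message from i to j, a distribution over x_j
    msgs = {(i, j): np.full(2, 0.5) for i in neighbors for j in neighbors[i]}
    for _ in range(max_iters):
        max_delta = 0.0
        for (i, j), old in list(msgs.items()):
            # m_{i->j}(x_j) ∝ sum_{x_i} phi_ij(x_i, x_j) phi_i(x_i)
            #                            prod_{k in N(i)\{j}} m_{k->i}(x_i)
            pre = phi_node[i].astype(float).copy()
            for k in neighbors[i]:
                if k != j:
                    pre *= msgs[(k, i)]
            table = phi_edge.get((i, j))
            if table is None:
                table = phi_edge[(j, i)].T        # edge stored in the other order
            new = pre @ table                     # vector over x_j
            new /= new.sum()                      # normalize by Z_{i->j}
            new = damping * old + (1.0 - damping) * new   # damped update
            max_delta = max(max_delta, np.abs(new - old).max())
            msgs[(i, j)] = new
        if max_delta < tol:                       # all messages (nearly) unchanged
            break
    # node beliefs: b_i(x_i) ∝ phi_i(x_i) prod_{k in N(i)} m_{k->i}(x_i)
    beliefs = {}
    for i in neighbors:
        b = phi_node[i].astype(float).copy()
        for k in neighbors[i]:
            b *= msgs[(k, i)]
        beliefs[i] = b / b.sum()
    return beliefs
```

With damping=0.5 this implements the "average of old and new" rule from the slide; damping=0.0 recovers undamped loopy BP. On a tree, with enough iterations, the messages reach the exact fixed point and the beliefs equal the true marginals.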