Mean Field and Variational Methods
First approximate inference
Graphical Models – 10-708
Carlos Guestrin
Carnegie Mellon University
November 1st, 2006
Readings: K&F: 11.1, 11.5
10-708 – © Carlos Guestrin 2006

Approximate inference overview

- So far: VE & junction trees
  - exact inference
  - exponential in tree-width
- There are many, many approximate inference algorithms for PGMs
- We will focus on three representative ones:
  - sampling
  - variational inference
  - loopy belief propagation and generalized belief propagation
- There will be a special recitation by Pradeep Ravikumar on more advanced methods

Approximating the posterior v. approximating the prior

- The prior model represents the entire world
  - the world is complicated
  - thus the prior model can be very complicated
- Posterior: after making observations
  - sometimes we become much more sure about the way things are
  - sometimes the posterior can be approximated by a simple model
- First approach to approximate inference: find a simple model that is "close" to the posterior
- Fundamental problems:
  - what does "close" mean?
  - the posterior is the intractable result of inference, so how can we approximate what we don't have?

[Figure: example network over Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy]

KL divergence: distance between distributions

- Given two distributions p and q, the KL divergence is
      D(p||q) = Σ_x p(x) log [ p(x) / q(x) ]
- D(p||q) = 0 iff p = q
- Not symmetric – p determines where the difference is important:
  - where p(x) = 0 and q(x) ≠ 0, the term contributes nothing
  - where p(x) ≠ 0 and q(x) = 0, the divergence becomes infinite

Find a simple approximate distribution

- Suppose p is an intractable posterior
- We want to find a simple q that approximates p
- The KL divergence is not symmetric:
  - D(p||q): the true distribution p defines the support of the difference; the "correct" direction, but it will be intractable to compute
  - D(q||p): the approximate distribution defines the support; tends to give overconfident results, but it will be tractable

Back to graphical models

- Inference in a graphical model:
      P(x) = (1/Z) ∏_j φ_j(D_j)
  and we want to compute P(X_i | e); this posterior is our p
- What is the simplest q?
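Before answering, the KL asymmetry described above is easy to check numerically. A minimal sketch, where p and q are arbitrary example distributions (not from the lecture), using the convention 0·log(0/q) = 0:

```python
import numpy as np

def kl(p, q):
    """D(p||q) = sum_x p(x) log(p(x)/q(x)), with 0*log(0/q) = 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.4, 0.1])
q = np.array([0.4, 0.4, 0.2])

print(kl(p, q))  # the "right" direction
print(kl(q, p))  # the reverse direction -- a different number in general
```

Both values are nonnegative, both are zero only when the distributions coincide, and the two directions generally disagree, which is why the choice of direction matters below.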
- The simplest q: every variable is independent, the mean field approximation
      Q(x) = ∏_i Q_i(x_i)
  under which we can compute any probability very efficiently

D(p||q) for mean field – KL the right way

- p: P(X | e)
- q: Q(x) = ∏_i Q_i(x_i)
- D(p||q) = Σ_x P(x|e) log P(x|e) − Σ_i Σ_{x_i} P(x_i|e) log Q_i(x_i)
- Minimizing over each Q_i gives Q_i(x_i) = P(x_i|e), but computing these marginals of p is exactly the intractable inference problem we started with

D(q||p) for mean field – KL the reverse direction

- p: P(X | e)
- q: Q(x) = ∏_i Q_i(x_i)
- D(q||p) = Σ_i Σ_{x_i} Q_i(x_i) log Q_i(x_i) − E_Q[ log P(x | e) ]
- The expectations are now taken under the factorized Q, which is what makes this direction tractable

What you need to know so far

- Goal: find an efficient distribution that is close to the posterior
- Distance: measured in terms of KL divergence
- Asymmetry of KL: D(p||q) ≠ D(q||p)
- Computing the "right" KL is intractable, so we use the reverse KL

Announcements

- Tomorrow's recitation: Khalid on variational methods
- Monday's special recitation: Khalid on Dirichlet processes
  - an exciting way to deal with model selection using graphical models, e.g., selecting (or averaging over) the number of clusters in unsupervised learning
  - you even get to see a BN with infinitely many variables

Reverse KL & the partition function: back to the general case

- Consider again the definition
  of D(q||p), where p is a Markov net with partition function Z:
      P(x) = (1/Z) ∏_j φ_j(D_j)
- Theorem:
      ln Z = F[P̃, Q] + D(Q||P)
  where F is the energy functional
      F[P̃, Q] = Σ_j E_Q[ ln φ_j ] + H_Q(X)

Understanding reverse KL, the energy functional & the partition function

- Maximizing the energy functional ⇔ minimizing the reverse KL
- Theorem: the energy functional is a lower bound on the log-partition function,
      F[P̃, Q] ≤ ln Z,
  since D(Q||P) ≥ 0
- Maximizing the energy functional corresponds to searching for a tight lower bound on the partition function

Structured variational approximate inference

- Pick a family of distributions Q that allows exact inference
  - e.g., fully factorized (mean field)
- Find the Q ∈ Q that maximizes F[P̃, Q]
- For mean field:
      F[P̃, Q] = Σ_j E_Q[ ln φ_j ] + Σ_i H(Q_i)

Optimization for mean field

- Constrained optimization (each Q_i must sum to 1), solved via Lagrange multipliers
- ∃ λ_i such that the optimization is equivalent to maximizing
      F[P̃, Q] + Σ_i λ_i ( 1 − Σ_{x_i} Q_i(x_i) )
- Take the derivative, set it to zero
- Theorem: Q is a stationary point of the mean field approximation iff for each i:
      Q_i(x_i) = (1/Z_i) exp{ E_Q[ ln P̃(X) | x_i ] }

Understanding the fixed point equation

Simplifying the fixed point equation

- Theorem: the fixed point
      Q_i(x_i) = (1/Z_i) exp{ E_Q[ ln P̃(X) | x_i ] }
  is equivalent to
      Q_i(x_i) = (1/Z_i) exp{ Σ_{j : X_i ∈ Scope[φ_j]} E_{Q(U_j)}[ ln φ_j(U_j, x_i) ] }
  where Scope[φ_j] = U_j ∪ {X_i}
- Q_i only needs to consider the factors that intersect X_i

There are many stationary points!

A very simple approach for finding one stationary point

- Initialize Q (e.g., randomly or smartly)
- Set all variables to unprocessed
- Pick an unprocessed variable X_i:
  - update Q_i
  - mark variable i as processed
  - if Q_i changed, set the neighbors of X_i to unprocessed
- Guaranteed to converge

More general structured approximations

- Mean field is a very naïve approximation
- Consider a more general form for Q
  - assumption: exact inference is doable over Q
- Theorem: Q is a stationary point of the energy
  functional iff each factor ψ_j of Q satisfies an analogous fixed point equation, with the expectations taken under Q

Computing the update rule for the general case

- Consider one factor φ of P̃: expanding E_Q[ ln φ ] requires the marginals of Q over Scope[φ]

Structured variational updates require inference

- Compute marginals w.r.t. Q of the cliques in the original graph and of the cliques in the new graph, for all cliques
- What is a good way of computing all these marginals?
- Potential update schedules:
  - sequential: compute marginals, update one ψ_j, recompute marginals
  - parallel: compute marginals, update all ψ's, recompute marginals

What you need to know about variational methods

- Structured variational method:
  - select a form for the approximate distribution
  - minimize the reverse KL
- Equivalent to maximizing the energy functional, i.e., searching for a tight lower bound on the partition function
- Many possible models for Q:
  - independent (mean field)
  - structured as a Markov net
  - cluster variational methods
- Several subtleties are outlined in the readings
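The pieces above, the coordinate-ascent updates and the lower-bound theorem, can be sketched end to end on a toy model. The binary chain, its random log-potentials, and all names below are invented purely for illustration:

```python
import numpy as np
from itertools import product

# Toy pairwise Markov net: a binary chain X1 - X2 - X3 with random
# log-potentials theta_j(x_j, x_{j+1}).  Values are arbitrary examples.
rng = np.random.default_rng(0)
n = 3
theta = [rng.normal(size=(2, 2)) for _ in range(n - 1)]

def mean_field(iters=100):
    """Coordinate ascent: Q_i(x_i) propto exp(E_Q[sum of log-factors touching X_i])."""
    Q = [np.full(2, 0.5) for _ in range(n)]   # initialize uniformly
    for _ in range(iters):
        for i in range(n):
            logq = np.zeros(2)
            if i > 0:                          # factor theta[i-1] touches X_i
                logq += Q[i - 1] @ theta[i - 1]
            if i < n - 1:                      # factor theta[i] touches X_i
                logq += theta[i] @ Q[i + 1]
            logq -= logq.max()                 # shift for numerical stability
            Q[i] = np.exp(logq) / np.exp(logq).sum()
    return Q

def energy_functional(Q):
    """F[P~,Q] = E_Q[log P~] + H(Q); both terms factorize for a mean field Q."""
    e = sum(Q[j] @ theta[j] @ Q[j + 1] for j in range(n - 1))
    h = -sum((q * np.log(q)).sum() for q in Q)
    return e + h

Q = mean_field()
# Exact log-partition function by brute-force enumeration (2^3 assignments).
lnZ = np.log(sum(np.exp(sum(t[x[j], x[j + 1]] for j, t in enumerate(theta)))
                 for x in product(range(2), repeat=n)))
print(energy_functional(Q) <= lnZ)  # True: F is a lower bound on ln Z
```

Note that the lower bound F ≤ ln Z holds for any valid Q, not just at convergence; coordinate ascent simply tightens it, one Q_i at a time.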