CMU CS 10708 - variational

Mean Field and Variational Methods
First approximate inference

Contents: Approximate inference overview; Approximating the posterior vs. approximating the prior; KL divergence: distance between distributions; Find a simple approximate distribution; Back to graphical models; D(p||q) for mean field – KL the right way; D(q||p) for mean field – KL the reverse direction; What you need to know so far; Announcements; Reverse KL & the partition function – back to the general case; Understanding reverse KL, the energy functional & the partition function; Structured variational approximate inference; Optimization for mean field; Understanding the fixed point equation; Simplifying the fixed point equation; Q_i only needs to consider factors that intersect X_i; There are many stationary points!; A very simple approach for finding one stationary point; More general structured approximations; Computing the update rule for the general case; Structured variational updates require inference; What you need to know about variational methods.

Graphical Models – 10708
Carlos Guestrin, Carnegie Mellon University
November 1st, 2006
Readings: K&F 11.1, 11.5

Approximate inference overview
- So far: variable elimination and junction trees give exact inference, but are exponential in tree-width.
- There are many, many approximate inference algorithms for PGMs. We will focus on three representative ones:
  - sampling
  - variational inference
  - loopy belief propagation and generalized belief propagation
- There will be a special recitation by Pradeep Ravikumar on more advanced methods.

Approximating the posterior vs. approximating the prior
- The prior model represents the entire world; the world is complicated, so the prior model can be very complicated.
- The posterior, after making observations, can sometimes become much more certain about how things are, and can sometimes be approximated by a simple model.
- First approach to approximate inference: find a simple model that is "close" to the posterior.
- Fundamental problems: what does "close" mean? And the posterior is the intractable result of inference – how can we approximate what we don't have?
- [Running example: the student network with nodes Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy.]

KL divergence: distance between distributions
- Given two distributions p and q, the KL divergence is D(p||q) = Σ_x p(x) log [p(x) / q(x)].
- D(p||q) = 0 iff p = q.
- Not symmetric – p determines where the difference is important:
  - where p(x) = 0 and q(x) ≠ 0, the term contributes nothing;
  - where p(x) ≠ 0 and q(x) = 0, the divergence blows up.

Find a simple approximate distribution
- Suppose p is an intractable posterior and we want a simple q that approximates it. Because KL divergence is not symmetric, there are two options:
- D(p||q): the true distribution p defines where differences matter – the "correct" direction – but it will be intractable to compute.
- D(q||p): the approximate distribution defines the support; it tends to give overconfident results, but it will be tractable.

Back to graphical models
- Inference in a graphical model: P(X) factorizes as a product of factors (CPDs in a Bayes net, potentials in a Markov net), and we want to compute P(X_i | e); so our p is the posterior P(X | e).
- What is the simplest q? One in which every variable is independent: Q(X) = ∏_i Q_i(X_i), the mean field approximation, under which any probability can be computed very efficiently.
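To make the asymmetry concrete, here is a minimal numerical sketch (Python/NumPy). The two-variable joint, its numbers, and the helper names such as `factorized` are illustrative assumptions, not anything from the lecture: it compares the fully factorized fit obtained by matching marginals (the best factorized fit under the forward KL) against the factorized q found by directly minimizing the reverse KL, which locks onto one mode.

```python
import numpy as np
from itertools import product

# Toy joint over two binary variables with two strongly correlated modes.
# (Illustrative numbers, not from the lecture.)
p = np.array([[0.49, 0.01],
              [0.01, 0.49]])

def kl(a, b):
    """D(a||b) = sum_x a(x) log a(x)/b(x), with 0 log 0 = 0."""
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

def factorized(q1, q2):
    """Fully factorized (mean field) joint Q(x1, x2) = Q1(x1) Q2(x2)."""
    return np.outer(q1, q2)

# Matching the marginals of p gives the best factorized fit under the
# forward KL D(p||q): here both marginals are (0.5, 0.5), so q is uniform.
q_forward = factorized(p.sum(axis=1), p.sum(axis=0))

# Crude grid search for the factorized q that minimizes the reverse KL D(q||p).
grid = np.linspace(0.01, 0.99, 99)
best = min((kl(factorized([a, 1 - a], [b, 1 - b]), p), a, b)
           for a, b in product(grid, grid))
_, a, b = best
q_reverse = factorized([a, 1 - a], [b, 1 - b])

print("forward-KL fit (marginal matching):\n", q_forward)
print("D(p||q) =", kl(p, q_forward), " D(q||p) =", kl(q_forward, p))
print("reverse-KL fit (mode seeking, overconfident):\n", q_reverse.round(3))
print("D(p||q) =", kl(p, q_reverse), " D(q||p) =", kl(q_reverse, p))
```

The marginal-matching fit spreads its mass over all four states, while the reverse-KL fit concentrates nearly all of its mass on a single mode of p: exactly the "overconfident but tractable" behavior described above.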
D(p||q) for mean field – KL the right way
- p is the true posterior P(X | e); q is the fully factorized Q(X) = ∏_i Q_i(X_i).
- D(p||q) = Σ_x P(x | e) log [P(x | e) / ∏_i Q_i(x_i)]: the expectation is taken under p, so evaluating it requires exactly the intractable inference we are trying to avoid.

D(q||p) for mean field – KL the reverse direction
- Same p and q, but now D(q||p) = Σ_x ∏_i Q_i(x_i) log [∏_i Q_i(x_i) / P(x | e)]: the expectation is taken under the simple factorized Q, which is what makes it tractable.

What you need to know so far
- Goal: find an efficient distribution that is close to the posterior.
- Distance: measured in terms of KL divergence.
- Asymmetry of KL: D(p||q) ≠ D(q||p).
- Computing the right KL is intractable, so we use the reverse KL.

Announcements
- Tomorrow's recitation: Khalid on variational methods.
- Monday's special recitation: Khalid on Dirichlet processes – an exciting way to deal with model selection using graphical models, e.g., selecting (or averaging over) the number of clusters in unsupervised learning; you even get to see a BN with infinitely many variables.

Reverse KL & the partition function – back to the general case
- Consider again the definition of D(q||p), where p is a Markov net P_F with unnormalized measure P̃_F(X) = ∏_j φ_j(D_j) and partition function Z.
- Theorem: ln Z = F[P̃_F, Q] + D(Q || P_F), where the energy functional is F[P̃_F, Q] = E_Q[ln P̃_F(X)] + H_Q(X) = Σ_j E_Q[ln φ_j(D_j)] + H_Q(X).
- This follows because D(Q || P_F) = E_Q[ln Q(X)] − E_Q[ln P̃_F(X)] + ln Z = ln Z − F[P̃_F, Q].

Understanding reverse KL, the energy functional & the partition function
- Maximizing the energy functional is equivalent to minimizing the reverse KL.
- Theorem: the energy functional is a lower bound on the log partition function, F[P̃_F, Q] ≤ ln Z, since D(Q || P_F) ≥ 0.
- Maximizing the energy functional therefore corresponds to searching for a tight lower bound on the partition function.

Structured variational approximate inference
- Pick a family of distributions that allows exact inference, e.g., fully factorized (mean field).
- Find the member Q of that family that maximizes F[P̃_F, Q].
- For mean field: F[P̃_F, Q] = Σ_j E_Q[ln φ_j(D_j)] + Σ_i H(Q_i).

Optimization for mean field
- This is a constrained optimization (each Q_i must sum to 1), solved with Lagrange multipliers: there exists a multiplier λ_i per constraint such that the problem is equivalent to maximizing the Lagrangian; take the derivative and set it to zero.
- Theorem: Q is a stationary point of the mean field approximation iff, for each i,
  Q_i(x_i) = (1/Z_i) exp{ E_Q[ ln P̃_F(X) | X_i = x_i ] }.

Understanding the fixed point equation
- (Worked through on the student network.)

Simplifying the fixed point equation
- (Worked through on the student network.)

Q_i only needs to consider factors that intersect X_i
- Theorem: the fixed point above is equivalent to
  Q_i(x_i) = (1/Z_i) exp{ Σ_{j : X_i ∈ Scope[φ_j]} E_Q[ ln φ_j(U_j, x_i) ] },
  where Scope[φ_j] = U_j ∪ {X_i}.

There are many stationary points!
- The energy functional is not concave, so coordinate updates can converge to different local optima depending on the initialization.

A very simple approach for finding one stationary point
- Initialize Q (e.g., randomly or smartly) and set all variables to unprocessed.
- Pick an unprocessed variable X_i, update Q_i with the fixed point equation, and mark i as processed; if Q_i changed, set the neighbors of X_i back to unprocessed.
- Guaranteed to converge. (A minimal code sketch is given at the end of these notes.)

More general structured approximations
- Mean field is a very naïve approximation; consider a more general form for Q, under the assumption that exact inference is doable over Q.
- Theorem: at a stationary point of the energy functional, each factor of Q satisfies an analogous fixed point equation, with the expectations taken under the structured Q.

Computing the update rule for the general case
- Consider one factor of Q at a time: its update requires expectations of the log potentials under Q.

Structured variational updates require inference
- [The text preview cuts off here; per the outline, the remaining slides cover this point and "What you need to know about variational methods".]
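The fixed point update, the "reprocess the neighbors" schedule, and the lower-bound theorem can all be exercised on a toy model. Below is a minimal sketch in Python/NumPy under the assumption of a made-up 4-node binary pairwise Markov net on a cycle; the node count, edges, and random potentials are illustrative, not the lecture's example network. It runs coordinate-wise mean field updates until no marginal changes, then checks that the energy functional it maximized is a lower bound on the exact log partition function, computed here by brute-force enumeration.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# A tiny pairwise binary Markov net on a 4-node cycle. The edges and the random
# log-potentials are illustrative assumptions, not the lecture's network.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
log_phi = {e: rng.normal(size=(2, 2)) for e in edges}  # log phi_e[x_a, x_b]

def neighboring_edges(i):
    return [e for e in edges if i in e]

def log_p_tilde(x):
    """Log of the unnormalized measure P~(x): the sum of the log potentials."""
    return sum(log_phi[(a, b)][x[a], x[b]] for (a, b) in edges)

# Exact log partition function by brute force (only 2^4 = 16 joint states).
log_Z = np.log(sum(np.exp(log_p_tilde(x)) for x in product([0, 1], repeat=n)))

# Mean field approximation Q(X) = prod_i Q_i(X_i), initialized randomly.
Q = [rng.dirichlet([1.0, 1.0]) for _ in range(n)]

def fixed_point_update(i):
    """Q_i(x_i) proportional to exp{ sum over factors touching X_i of E_Q[log phi | x_i] }."""
    logits = np.zeros(2)
    for (a, b) in neighboring_edges(i):
        j = b if a == i else a
        table = log_phi[(a, b)] if a == i else log_phi[(a, b)].T  # rows index x_i
        logits += table @ Q[j]  # E_{Q_j}[log phi(x_i, X_j)] for each value x_i
    q = np.exp(logits - logits.max())
    return q / q.sum()

def energy_functional():
    """F[P~, Q] = sum_j E_Q[log phi_j] + sum_i H(Q_i): a lower bound on log Z."""
    F = sum(Q[a] @ log_phi[(a, b)] @ Q[b] for (a, b) in edges)
    return F + sum(-(q * np.log(q)).sum() for q in Q)

# The very simple coordinate scheme from the slides: keep a set of unprocessed
# variables, and whenever Q_i changes, mark the neighbors of X_i unprocessed.
unprocessed = set(range(n))
while unprocessed:
    i = unprocessed.pop()
    new_qi = fixed_point_update(i)
    if not np.allclose(new_qi, Q[i], atol=1e-8):
        Q[i] = new_qi
        unprocessed |= {v for e in neighboring_edges(i) for v in e if v != i}

print("energy functional F =", energy_functional(), "<= exact log Z =", log_Z)
```

Because the energy functional has many stationary points, different random initializations of Q can converge to different values of F; the guarantees are only convergence of the coordinate scheme and the bound F ≤ ln Z.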

