CMU CS 10708 - Unifying Variational and GBP; Learning Parameters of MNs; EM for BNs

Text preview of the lecture slides:

Slide 1: Title and readings
  Unifying Variational and GBP; Learning Parameters of MNs; EM for BNs.
  Graphical Models 10-708, Carlos Guestrin, Carnegie Mellon University, November 15th, 2006.
  Readings: K&F 11.3, 11.5; the Yedidia et al. paper from the class website; Jordan, Chapters 9 and 20; K&F 16.2.

Slide 2: Revisiting mean field
  - Choice of Q: a fully factored approximation, Q(X) = Π_i Q_i(X_i).
  - Optimization problem: maximize the energy functional over this family (equivalently, minimize KL(Q || P)).

Slide 3: Interpretation of the energy functional
  - Energy functional: F[P, Q] = Σ_c E_Q[log φ_c] + H_Q(X), an "energy" term plus an entropy term.
  - Exact if Q = P: then F[P, Q] = log Z.
  - So the approximation can be viewed as an approximation of the entropy term H_Q(X).

Slide 4: Entropy of a tree distribution
  - Joint distribution of a tree-structured MN in terms of its marginals:
      P(x) = Π_{(i,j) ∈ E} π_ij(x_i, x_j) / Π_i π_i(x_i)^(d_i - 1),
    where d_i is the number of neighbors of X_i.
  - Decomposing the entropy term:
      H(P) = Σ_{(i,j) ∈ E} H(π_ij) - Σ_i (d_i - 1) H(π_i).
  (Figure: the student-network example, with nodes Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, and Happy; this figure recurs on several later slides.)

Slide 5: Loopy BP and the Bethe approximation
  - Energy functional as above, written in terms of edge and node beliefs π_ij and π_i.
  - Bethe approximation of the free energy: use the tree-entropy decomposition even though the graph is loopy:
      H_Bethe = Σ_{(i,j)} H(π_ij) - Σ_i (d_i - 1) H(π_i).
  - Theorem: if loopy BP converges, the resulting π_ij and π_i are a stationary point (usually a local maximum) of the Bethe free energy.

Slide 6: GBP and the Kikuchi approximation
  - Exact free energy: corresponds to a junction tree.
  - Bethe free energy: the pairwise approximation above.
  - Kikuchi approximation: a generalized cluster graph gives a spectrum from Bethe to the exact entropy, with the cluster entropy terms weighted by counting numbers (see Yedidia et al.).
  - Theorem: if GBP converges, the resulting cluster beliefs π_Ci are a stationary point (usually a local maximum) of the Kikuchi free energy.
  (Figure: a cluster graph over the student network, with clusters such as {D,I,G}, {G,J,S,L}, {H,G,J}, {C,D}, {G,S,I}.)

Slide 7: What you need to know about GBP
  - A spectrum between loopy BP and junction trees: more computation, but typically better answers.
  - If the cluster graph satisfies RIP, the update equations are very simple; in the general setting the equations are slightly trickier, but not hard.
  - Relation to variational methods: GBP fixed points correspond to local optima of an approximate version of the energy functional.

Slide 8: Announcements
  - Tomorrow's recitation: Khalid on learning Markov networks.

Slide 9: Learning parameters of a BN
  - The log likelihood decomposes: log P(D | θ) = Σ_j Σ_i log P(x_i[j] | pa_i[j], θ_i).
  - So learn each CPT independently, using counts.

Slide 10: Log likelihood for MNs
  - Log likelihood of the data for a Markov network with potentials φ_c:
      ℓ(θ : D) = Σ_j Σ_c log φ_c(x_j[c]) - m log Z(θ),
    where m is the number of data points.

Slide 11: The log likelihood doesn't decompose for MNs
  - The log likelihood is concave, so we can find the global optimum.
  - But the log Z(θ) term couples all the potentials: it doesn't decompose.

Slides 12-13: Derivative of the log likelihood for MNs
  - Derivative (for a tabular potential, in log space):
      ∂ℓ / ∂ log φ_c(u) = Count_D(u) - m P_θ(u).
  - Setting the derivative to zero requires the model's clique marginals to match the empirical clique marginals.
  - We can optimize with gradient ascent, but let's look at a simpler solution first.

Slide 14: Iterative proportional fitting (IPF)
  - Setting the derivative to zero gives a fixed-point equation:
      φ_c^(t+1)(u) = φ_c^(t)(u) · P̂(u) / P^(t)(u),
    where P̂ is the empirical clique marginal and P^(t) is the current model's clique marginal.
  - Iterate over the cliques; the updates converge to the optimal parameters.
  - Each iteration must compute P^(t)(u), i.e., run inference in the current model.
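The IPF loop is easy to exercise end to end on a model small enough that the clique marginals it needs can be computed by brute force. Below is a minimal Python sketch, not from the slides, assuming a made-up binary chain A - B - C with tabular potentials over {A,B} and {B,C} and a random "empirical" distribution standing in for data counts; enumeration plays the role of the inference step that every IPF iteration requires.

```python
import numpy as np

# Minimal IPF sketch (not from the slides): a made-up binary chain MN
#   A - B - C   with tabular clique potentials phi_AB and phi_BC.
# The fixed-point update from slide 14 is applied clique by clique:
#   phi_c(u)  <-  phi_c(u) * P_hat(u) / P_model(u)
# where P_hat is the empirical clique marginal and P_model the current
# model marginal (computed here by brute-force enumeration).

rng = np.random.default_rng(0)

# "Empirical" joint over (A, B, C); in practice this would come from data counts.
p_hat = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)

phi_ab = np.ones((2, 2))    # potential over clique {A, B}
phi_bc = np.ones((2, 2))    # potential over clique {B, C}

def model_joint(phi_ab, phi_bc):
    """P(a, b, c) = phi_AB(a, b) * phi_BC(b, c) / Z, by explicit enumeration."""
    p = np.einsum('ab,bc->abc', phi_ab, phi_bc)
    return p / p.sum()

for _ in range(20):
    p = model_joint(phi_ab, phi_bc)
    phi_ab *= p_hat.sum(axis=2) / p.sum(axis=2)   # match empirical P(A, B)
    p = model_joint(phi_ab, phi_bc)
    phi_bc *= p_hat.sum(axis=0) / p.sum(axis=0)   # match empirical P(B, C)

p = model_joint(phi_ab, phi_bc)
print("empirical P(A,B):\n", p_hat.sum(axis=2), "\nmodel P(A,B):\n", p.sum(axis=2))
print("empirical P(B,C):\n", p_hat.sum(axis=0), "\nmodel P(B,C):\n", p.sum(axis=0))
```

Because the chain is decomposable, the updates settle after the first sweep or two and the printed model marginals match the empirical ones, which is the stationarity condition derived on slides 12-13. On a loopy tabular MN each sweep still needs the current clique marginals, which is where approximate inference such as loopy BP or GBP from the first half of the lecture comes back in.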
∑iφi(x) = 1 Update rule: Must compute: If conditions violated, equations are not so simple… c.f., Improved Iterative Scaling [Berger ’97]910-708 –Carlos Guestrin 200617Learning params for log linear models 2 –Gradient Ascent  Log-likelihood of data: Compute derivative & optimize usually with conjugate gradient ascent You will do an example in your homework! ☺10-708 –Carlos Guestrin 200618What you need to know about learning MN parameters? BN parameter learning easy MN parameter learning doesn’t decompose! Learning requires inference! Apply gradient ascent or IPF iterations to obtain optimal parameters applies to both tabular representations and log-linear models1010-708 –Carlos Guestrin 200619Thus far, fully supervised learning We have assumed fully supervised learning: Many real problems have missing data:10-708 –Carlos Guestrin 200620The general learning problem with missing data Marginal likelihood – x is observed, z is missing:1110-708 –Carlos Guestrin 200621E-step x is observed, z is missing Compute probability of missing data given current choice of θ Q(z|xj) for each xj e.g., probability computed during classification step corresponds to “classification step” in K-means10-708 –Carlos Guestrin 200622Jensen’s inequality  Theorem: log ∑zP(z) f(z) ≥ ∑zP(z) log f(z)1210-708 –Carlos Guestrin 200623Applying Jensen’s inequality Use: log ∑zP(z) f(z) ≥ ∑zP(z) log f(z) 10-708 –Carlos Guestrin 200624The M-step maximizes lower bound on weighted data Lower bound from Jensen’s: Corresponds to weighted dataset: <x1,z=1> with weight Q(t+1)(z=1|x1) <x1,z=2> with weight Q(t+1)(z=2|x1) <x1,z=3> with weight Q(t+1)(z=3|x1) <x2,z=1> with weight Q(t+1)(z=1|x2) <x2,z=2> with weight Q(t+1)(z=2|x2) <x2,z=3> with weight Q(t+1)(z=3|x2) …1310-708 –Carlos Guestrin 200625The M-step Maximization step: Use expected counts instead of counts: If learning requires Count(x,z) Use EQ(t+1)[Count(x,z)]10-708 –Carlos Guestrin 200626Convergence of EM Define potential function F(θ,Q): EM corresponds to coordinate ascent on F Thus, maximizes lower bound on marginal log likelihood As seen in Machine Learning class last semester1410-708 –Carlos Guestrin 200627Data likelihood for BNs Given structure, log likelihood of fully observed data:FluAllergySinusHeadacheNose10-708 –Carlos Guestrin 200628Marginal likelihood What if S is hidden?FluAllergySinusHeadacheNose1510-708 –Carlos Guestrin 200629Log likelihood for BNswith hidden data Marginal likelihood – O is observed, H is

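Returning to the parameter-learning slides: slide 17's derivative is the empirical feature counts minus the expected feature counts under the current model, which is why learning a log-linear MN requires inference. The sketch below is a hypothetical illustration, not from the slides: invented indicator features on three binary variables, plain gradient ascent instead of conjugate gradient, and brute-force enumeration in place of inference.

```python
import itertools
import numpy as np

# Gradient ascent for a log-linear MN (slides 15-17), on a hypothetical
# 3-variable binary model with invented indicator features:
#   f0 = 1[A = B],   f1 = 1[B = C],   f2 = 1[A = 1]
# P_w(x) = exp(sum_i w_i f_i(x)) / Z(w), and the log-likelihood gradient is
#   dl/dw_i = sum_j f_i(x_j) - m * E_w[f_i]
# (empirical minus expected feature counts).

def features(x):
    a, b, c = x
    return np.array([float(a == b), float(b == c), float(a == 1)])

configs = list(itertools.product([0, 1], repeat=3))
F = np.array([features(x) for x in configs])               # 8 x 3 feature matrix

data = [(0, 0, 0), (1, 1, 1), (1, 0, 0), (0, 1, 1), (1, 1, 0)]
emp = np.array([features(x) for x in data]).mean(axis=0)   # empirical feature averages

w = np.zeros(3)
for _ in range(2000):
    logits = F @ w
    p = np.exp(logits - logits.max())
    p /= p.sum()                      # P_w(x) for all 8 configurations
    grad = emp - p @ F                # (1/m) * dl/dw
    w += 0.5 * grad                   # gradient-ascent step

print("learned weights:", w)
print("model E_w[f]:   ", p @ F)      # should match the empirical averages
print("empirical avg:  ", emp)
```

At the optimum the model's expected features match the empirical averages; for larger networks the expectations would come from (possibly approximate) inference rather than enumeration.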
