CMU CS 10708 - Learning Completely Observed Undirected Graphical Models

Probabilistic Graphical Models, 10-708
Learning Completely Observed Undirected Graphical Models
Eric Xing
Lecture 12, Oct 19, 2005
Reading: MJ Chap. 9, 19, 20

Recap: MLE for BNs

- If we assume the parameters for each CPD are globally independent, and all nodes are fully observed, then the log-likelihood function decomposes into a sum of local terms, one per node:

      \ell(\theta; D) = \log p(D \mid \theta)
                      = \log \prod_n \prod_i p(x_{n,i} \mid x_{n,\pi_i}, \theta_i)
                      = \sum_i \Big( \sum_n \log p(x_{n,i} \mid x_{n,\pi_i}, \theta_i) \Big)

  and the MLE for each tabular CPD is given by the local counts:

      \theta^{ML}_{ijk} = \frac{n_{ijk}}{\sum_{k'} n_{ijk'}}

MLE for undirected graphical models

- For directed graphical models, the log-likelihood decomposes into a sum of terms, one per family (node plus parents).
- For undirected graphical models, the log-likelihood does not decompose, because the normalization constant Z is a function of all the parameters:

      P(x_1, \ldots, x_n) = \frac{1}{Z} \prod_{c \in C} \psi_c(\mathbf{x}_c), \qquad
      Z = \sum_{x_1, \ldots, x_n} \prod_{c \in C} \psi_c(\mathbf{x}_c)

- In general, we will need to do inference (i.e., marginalization) to learn parameters for undirected models, even in the fully observed case.

Log likelihood for UGMs with tabular clique potentials

- Sufficient statistics: for a UGM (V, E), the number of times that a configuration x (i.e., X_V = x) is observed in a dataset D = {x_1, ..., x_N} can be written as

      m(\mathbf{x}) \;\stackrel{\mathrm{def}}{=}\; \sum_n \delta(\mathbf{x}, \mathbf{x}_n) \quad \text{(total count)}, \qquad
      m(\mathbf{x}_c) \;\stackrel{\mathrm{def}}{=}\; \sum_{\mathbf{x}_{V \setminus c}} m(\mathbf{x}) \quad \text{(clique count)}

- In terms of the counts, the log likelihood is

      p(D \mid \theta) = \prod_n p(\mathbf{x}_n \mid \theta)
                       = \prod_n \prod_{\mathbf{x}} p(\mathbf{x} \mid \theta)^{\delta(\mathbf{x}, \mathbf{x}_n)}

      \ell = \log p(D \mid \theta)
           = \sum_{\mathbf{x}} \sum_n \delta(\mathbf{x}, \mathbf{x}_n) \log p(\mathbf{x} \mid \theta)
           = \sum_{\mathbf{x}} m(\mathbf{x}) \log \Big( \frac{1}{Z} \prod_c \psi_c(\mathbf{x}_c) \Big)
           = \sum_c \sum_{\mathbf{x}_c} m(\mathbf{x}_c) \log \psi_c(\mathbf{x}_c) \;-\; N \log Z

- There is a nasty log Z term in the likelihood.

Derivative of the log likelihood

- Log-likelihood:

      \ell = \sum_c \sum_{\mathbf{x}_c} m(\mathbf{x}_c) \log \psi_c(\mathbf{x}_c) - N \log Z

- First term:

      \frac{\partial \ell_1}{\partial \psi_c(\mathbf{x}_c)} = \frac{m(\mathbf{x}_c)}{\psi_c(\mathbf{x}_c)}

- Second term (the sum runs over full configurations \tilde{\mathbf{x}}; the delta keeps only those whose clique-c values are set to \mathbf{x}_c):

      \frac{\partial \log Z}{\partial \psi_c(\mathbf{x}_c)}
        = \frac{1}{Z} \sum_{\tilde{\mathbf{x}}} \frac{\partial}{\partial \psi_c(\mathbf{x}_c)} \prod_d \psi_d(\tilde{\mathbf{x}}_d)
        = \frac{1}{Z} \sum_{\tilde{\mathbf{x}}} \delta(\tilde{\mathbf{x}}_c, \mathbf{x}_c)\, \frac{\prod_d \psi_d(\tilde{\mathbf{x}}_d)}{\psi_c(\tilde{\mathbf{x}}_c)}
        = \sum_{\tilde{\mathbf{x}}} \delta(\tilde{\mathbf{x}}_c, \mathbf{x}_c)\, \frac{p(\tilde{\mathbf{x}})}{\psi_c(\tilde{\mathbf{x}}_c)}
        = \frac{p(\mathbf{x}_c)}{\psi_c(\mathbf{x}_c)}

Conditions on clique marginals

- Derivative of the log-likelihood:

      \frac{\partial \ell}{\partial \psi_c(\mathbf{x}_c)} = \frac{m(\mathbf{x}_c)}{\psi_c(\mathbf{x}_c)} - N\, \frac{p(\mathbf{x}_c)}{\psi_c(\mathbf{x}_c)}

- Hence, at the maximum likelihood parameters, we know that

      p^*_{MLE}(\mathbf{x}_c) = \frac{m(\mathbf{x}_c)}{N} \;\stackrel{\mathrm{def}}{=}\; \tilde{p}(\mathbf{x}_c)

- In other words, at the maximum likelihood setting of the parameters, for each clique the model marginals must equal the observed marginals (empirical counts).
- This doesn't tell us how to get the ML parameters; it just gives us a condition that must be satisfied when we have them.

MLE for undirected graphical models

- Is the graph decomposable (triangulated)?
- Are all the clique potentials defined on maximal cliques (not sub-cliques), e.g., ψ_123, ψ_234 rather than ψ_12, ψ_23, ...?
- Are the clique potentials full tables (or Gaussians), or parameterized more compactly, e.g., ψ_c(x_c) = exp( Σ_k θ_k f_k(x_c) )?

      Method     Decomposable?   Max clique?   Tabular?
      Direct          √               √            √
      IPF             -               -            √
      GIS             -               -            -
      Gradient        -               -            -

MLE for decomposable undirected models

- Decomposable models: G is decomposable ⇔ G is triangulated ⇔ G has a junction tree.
- Potential-based representation:

      p(\mathbf{x}) = \frac{\prod_c \psi_c(\mathbf{x}_c)}{\prod_s \varphi_s(\mathbf{x}_s)}

- Consider a chain X1 - X2 - X3. The cliques are (X1, X2) and (X2, X3); the separator is X2.
- The empirical marginals must equal the model marginals.
- Let us guess that

      p_{MLE}(x_1, x_2, x_3) = \frac{\tilde{p}(x_1, x_2)\, \tilde{p}(x_2, x_3)}{\tilde{p}(x_2)}

- We can verify that such a guess satisfies the conditions:

      p_{MLE}(x_1, x_2) = \sum_{x_3} p_{MLE}(x_1, x_2, x_3)
                        = \sum_{x_3} \tilde{p}(x_3 \mid x_2)\, \tilde{p}(x_1, x_2)
                        = \tilde{p}(x_1, x_2)

  and similarly p_MLE(x_2, x_3) = p̃(x_2, x_3).

MLE for decomposable undirected models (cont.)

- With the same guess

      p_{MLE}(x_1, x_2, x_3) = \frac{\tilde{p}(x_1, x_2)\, \tilde{p}(x_2, x_3)}{\tilde{p}(x_2)}

  to obtain the clique potentials, just equate them to the empirical marginals (or conditionals); i.e., the separator must be divided into one of its neighbors. Then Z = 1:

      \psi^{MLE}_{12}(x_1, x_2) = \tilde{p}(x_1, x_2), \qquad
      \psi^{MLE}_{23}(x_2, x_3) = \frac{\tilde{p}(x_2, x_3)}{\tilde{p}(x_2)} = \tilde{p}(x_3 \mid x_2)

- One more example: the four-node graph with maximal cliques {X1, X2, X3} and {X2, X3, X4} and separator {X2, X3}:

      p_{MLE}(x_1, x_2, x_3, x_4) = \frac{\tilde{p}(x_1, x_2, x_3)\, \tilde{p}(x_2, x_3, x_4)}{\tilde{p}(x_2, x_3)}

      \psi^{MLE}_{123}(x_1, x_2, x_3) = \frac{\tilde{p}(x_1, x_2, x_3)}{\tilde{p}(x_2, x_3)} = \tilde{p}(x_1 \mid x_2, x_3), \qquad
      \psi^{MLE}_{234}(x_2, x_3, x_4) = \tilde{p}(x_2, x_3, x_4)
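To make the decomposable-MLE recipe concrete, here is a minimal NumPy sketch for the chain example (not from the lecture; the synthetic data and names such as psi12 are my own): it builds ψ_12 = p̃(x1, x2) and ψ_23 = p̃(x3 | x2) from empirical counts, then checks that Z = 1 and that the model's clique marginals reproduce the empirical ones.

```python
# Numerical check of the decomposable-MLE recipe for the chain X1 - X2 - X3.
# NumPy only; synthetic data and variable names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N = 50_000

# Arbitrary correlated binary data; the check below holds for any empirical
# distribution in which every value of X2 is observed.
x1 = rng.integers(0, 2, N)
x2 = np.where(rng.random(N) < 0.7, x1, rng.integers(0, 2, N))
x3 = np.where(rng.random(N) < 0.6, x2, rng.integers(0, 2, N))

# Empirical clique marginals p~(x1, x2), p~(x2, x3) and separator marginal p~(x2).
p12 = np.histogram2d(x1, x2, bins=[2, 2])[0] / N   # indexed [x1, x2]
p23 = np.histogram2d(x2, x3, bins=[2, 2])[0] / N   # indexed [x2, x3]
p2 = p12.sum(axis=0)                               # p~(x2)

# Clique potentials: the separator is divided into one of its neighbors.
psi12 = p12                  # psi_12(x1, x2) = p~(x1, x2)
psi23 = p23 / p2[:, None]    # psi_23(x2, x3) = p~(x3 | x2)

# Unnormalized joint p_MLE(x1, x2, x3) = psi_12(x1, x2) * psi_23(x2, x3).
joint = psi12[:, :, None] * psi23[None, :, :]      # indexed [x1, x2, x3]
print("Z =", joint.sum())                          # = 1, no normalization needed

# MLE condition: model clique marginals equal the empirical marginals.
print(np.allclose(joint.sum(axis=2), p12))         # p_MLE(x1, x2) == p~(x1, x2)
print(np.allclose(joint.sum(axis=0), p23))         # p_MLE(x2, x3) == p~(x2, x3)
```

The same pattern extends to any junction tree: divide each separator's empirical marginal into exactly one neighboring clique, and the product of potentials is already normalized.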
Non-decomposable and/or with non-maximal clique potentials

- If the graph is non-decomposable, and/or the potentials are defined on non-maximal cliques (e.g., ψ_12, ψ_34), we cannot simply equate empirical marginals (or conditionals) to the MLE of the clique potentials. For example, for a four-node model with only pairwise potentials,

      p(x_1, x_2, x_3, x_4) = \prod_{\{i,j\}} \psi_{ij}(x_i, x_j),

  there exist pairs (i, j) such that

      \psi^{MLE}_{ij}(x_i, x_j) \;\neq\; \tilde{p}(x_i, x_j),\ \ \tilde{p}(x_i, x_j)/\tilde{p}(x_i),\ \ \tilde{p}(x_i, x_j)/\tilde{p}(x_j).

  (Homework!)

Iterative Proportional Fitting (IPF)

- From the derivative of the likelihood,

      \frac{\partial \ell}{\partial \psi_c(\mathbf{x}_c)} = \frac{m(\mathbf{x}_c)}{\psi_c(\mathbf{x}_c)} - N\, \frac{p(\mathbf{x}_c)}{\psi_c(\mathbf{x}_c)},

  we can derive another relationship:

      \frac{\tilde{p}(\mathbf{x}_c)}{\psi_c(\mathbf{x}_c)} = \frac{p(\mathbf{x}_c)}{\psi_c(\mathbf{x}_c)},

  in which ψ_c appears implicitly in the model marginal p(x_c).
- This is therefore a fixed-point equation for ψ_c.
- Solving for ψ_c in closed form is hard, because it appears on both sides of this implicit nonlinear equation.
- The idea of IPF is to hold ψ_c fixed on the right-hand side (both in the numerator and in the denominator) and solve for it on the left-hand side. We cycle through all cliques, then iterate:

      \psi^{(t+1)}_c(\mathbf{x}_c) = \psi^{(t)}_c(\mathbf{x}_c)\, \frac{\tilde{p}(\mathbf{x}_c)}{p^{(t)}(\mathbf{x}_c)}

  Note that computing the model marginal p^{(t)}(x_c) requires inference.

Properties of IPF updates

- IPF iterates a set of fixed-point equations.
- However, we can prove it is also a coordinate ascent algorithm (the coordinates are the parameters of the clique potentials).
- Hence at each step it increases the log-likelihood, and it converges to a global maximum.
- I-projection view: IPF finds the distribution with the correct marginals that has maximal entropy.

KL divergence view

- IPF can be seen as coordinate ascent in the likelihood, using the expression of the likelihood in terms of KL divergences.
- Recall that maximizing the log likelihood is equivalent to minimizing the KL divergence (cross entropy) from the observed distribution to the model distribution:

      \max_\theta \ell \;\Longleftrightarrow\; \min_\theta\, KL\big(\tilde{p}(\mathbf{x}) \,\|\, p(\mathbf{x} \mid \theta)\big)
        = \sum_{\mathbf{x}} \tilde{p}(\mathbf{x}) \log \frac{\tilde{p}(\mathbf{x})}{p(\mathbf{x} \mid \theta)}

- Using a property of the KL divergence based on the conditional chain rule p(x) = p(x_a) p(x_b | x_a):

      KL\big(q(x_a, x_b) \,\|\, p(x_a, x_b)\big)
        = \sum_{x_a, x_b} q(x_a, x_b) \log \frac{q(x_a, x_b)}{p(x_a, x_b)}
        = \sum_{x_a, x_b} q(x_a)\, q(x_b \mid x_a) \log \frac{q(x_a)\, q(x_b \mid x_a)}{p(x_a)\, p(x_b \mid x_a)}
        = \sum_{x_a} q(x_a) \log \frac{q(x_a)}{p(x_a)} \;+\; \sum_{x_a} q(x_a) \sum_{x_b} q(x_b \mid x_a) \log \frac{q(x_b \mid x_a)}{p(x_b \mid x_a)}
        = KL\big(q(x_a) \,\|\, p(x_a)\big) \;+\; \sum_{x_a} q(x_a)\, KL\big(q(x_b \mid x_a) \,\|\, p(x_b \mid x_a)\big)

IPF minimizes KL divergence

- Putting things together (taking x_a = x_c and x_b = the remaining variables x_{V\c}), we have

      KL\big(\tilde{p}(\mathbf{x}) \,\|\, p(\mathbf{x})\big)
        = KL\big(\tilde{p}(\mathbf{x}_c) \,\|\, p(\mathbf{x}_c)\big)
          \;+\; \sum_{\mathbf{x}_c} \tilde{p}(\mathbf{x}_c)\, KL\big(\tilde{p}(\mathbf{x}_{V \setminus c} \mid \mathbf{x}_c) \,\|\, p(\mathbf{x}_{V \setminus c} \mid \mathbf{x}_c)\big)

- It can be shown that changing the clique potential ψ_c has no effect on the model conditional p(x_{V\c} | x_c), so the second term is unaffected.
- To minimize the first term, we set the model marginal equal to the observed marginal, which is exactly the IPF update.
- We can therefore interpret IPF updates as retaining the old conditional p(x_{V\c} | x_c) while replacing the old marginal p(x_c) with the observed marginal p̃(x_c).
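As a complement to the derivation above, the following sketch (my own, not code from the lecture) runs the IPF update ψ_c ← ψ_c · p̃(x_c)/p(x_c) on the binary four-node cycle with pairwise potentials, i.e., the non-decomposable case; the model marginals p^{(t)}(x_c) are computed by brute-force enumeration of the 2^4 joint table, standing in for proper inference. Function names such as joint_table and edge_marginal are illustrative assumptions.

```python
# Minimal IPF sketch for a binary four-node cycle X1 - X2 - X3 - X4 - X1 with
# pairwise potentials. Model marginals come from enumerating the 2^4 joint table.
import itertools
import numpy as np

rng = np.random.default_rng(1)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]    # cliques (edges) of the cycle

# Synthetic data over four binary variables and its empirical edge marginals p~(xi, xj).
N = 50_000
data = rng.integers(0, 2, size=(N, 4))
data[:, 1] = np.where(rng.random(N) < 0.6, data[:, 0], data[:, 1])   # induce dependence
data[:, 3] = np.where(rng.random(N) < 0.6, data[:, 2], data[:, 3])
emp = {(i, j): np.histogram2d(data[:, i], data[:, j], bins=[2, 2])[0] / N for i, j in edges}

psi = {e: np.ones((2, 2)) for e in edges}   # start from uniform potentials

def joint_table(psi):
    """Normalized 2x2x2x2 joint distribution defined by the pairwise potentials."""
    p = np.ones((2, 2, 2, 2))
    for x in itertools.product(range(2), repeat=4):
        for i, j in edges:
            p[x] *= psi[(i, j)][x[i], x[j]]
    return p / p.sum()

def edge_marginal(p, i, j):
    """Model marginal p(xi, xj), indexed [xi, xj]."""
    m = p.sum(axis=tuple(k for k in range(4) if k not in (i, j)))
    return m if i < j else m.T

# IPF: cycle through the cliques, multiplying each potential by p~(xc) / p(t)(xc).
for sweep in range(50):
    for i, j in edges:
        psi[(i, j)] *= emp[(i, j)] / edge_marginal(joint_table(psi), i, j)

# At the fixed point the model's edge marginals equal the empirical ones.
p = joint_table(psi)
err = max(np.abs(edge_marginal(p, i, j) - emp[(i, j)]).max() for i, j in edges)
print("max marginal mismatch:", err)        # close to 0 after the sweeps above
```

After enough sweeps the fitted edge marginals match the empirical ones, illustrating the clique-marginal condition; for a decomposable model whose cliques are visited in junction-tree order, IPF instead terminates after a single sweep.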

