CMU STA 36402-36608 - Lecture


Lecture 21, Graphical Models
36-402, Advanced Data Analysis
7 April 2011

Contents
1 Conditional Independence and Factor Models
2 Directed Acyclic Graph (DAG) Models
  2.1 Conditional Independence and the Markov Property
3 Examples of DAG Models and Their Uses
  3.1 Missing Variables
4 Further Reading
A Non-DAG Graphical Models: Undirected Graphs and Directed Graphs with Cycles
  A.1 Undirected Graphs
  A.2 Directed but Cyclic Graphs
B Rudimentary Graph Theory

We have spent a lot of time looking at ways of figuring out how one variable (or set of variables) depends on another variable (or set of variables); this is the core idea in regression and in conditional density estimation. We have also looked at how to estimate the joint distribution of variables, both with kernel density estimation and with models like factor and mixture models.
The latter two show an example of how to get the joint distribution by combining a conditional distribution (observables given factors; mixture components) with a marginal distribution (Gaussian distribution of factors; the component weights). When dealing with complex sets of dependent variables, it would be nice to have a general way of composing conditional distributions together to get joint distributions, and especially nice if this gave us a way of reasoning about what we could ignore, of seeing which variables are irrelevant to which other variables. This is what graphical models let us do.

Figure 1: Illustration of a typical model with two latent factors ($F_1$ and $F_2$, in circles) and four observables ($X_1$ through $X_4$).

1 Conditional Independence and Factor Models

The easiest way into this may be to start with the diagrams we drew for factor analysis. There, we had observables and we had factors, and each observable depended on, or loaded on, some of the factors. We drew a diagram where we had nodes, standing for the variables, and arrows running from the factors to the observables which depended on them. In the factor model, all the observables were conditionally independent of each other, given all the factors:

$$p(X_1, X_2, \ldots X_p | F_1, F_2, \ldots F_q) = \prod_{i=1}^{p} p(X_i | F_1, \ldots F_q) \tag{1}$$

But in fact observables are also independent of the factors they do not load on, so this is still too complicated. Let's write $\mathrm{loads}(i)$ for the set of factors on which the observable $X_i$ loads. Then

$$p(X_1, X_2, \ldots X_p | F_1, F_2, \ldots F_q) = \prod_{i=1}^{p} p(X_i | F_{\mathrm{loads}(i)}) \tag{2}$$

Consider Figure 1. The conditional distribution of observables given factors is

$$p(X_1, X_2, X_3, X_4 | F_1, F_2) = p(X_1 | F_1, F_2)\, p(X_2 | F_1, F_2)\, p(X_3 | F_1)\, p(X_4 | F_2) \tag{3}$$

$X_1$ loads on $F_1$ and $F_2$, so it is independent of everything else, given those two variables. $X_1$ is unconditionally dependent on $X_2$, because they load on common factors, $F_1$ and $F_2$; and $X_1$ and $X_3$ are also dependent, because they both load on $F_1$.
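The unconditional dependencies just described can be checked numerically. The sketch below simulates the graph of Figure 1 under assumptions that the argument itself does not need: standard Gaussian factors, linear loadings all equal to 1, and independent Gaussian noise. The function names `simulate` and `corr`, the seed, and the sample size are likewise hypothetical choices made for illustration.

```python
import math
import random

def simulate(n, seed=0):
    """Draw n samples of (X1, X2, X3, X4) from a linear-Gaussian version of Figure 1.

    Assumed for illustration only: unit loadings, standard normal factors and noise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = f1 + f2 + rng.gauss(0, 1)   # loads on F1 and F2
        x2 = f1 + f2 + rng.gauss(0, 1)   # loads on F1 and F2
        x3 = f1 + rng.gauss(0, 1)        # loads on F1 only
        x4 = f2 + rng.gauss(0, 1)        # loads on F2 only
        rows.append((x1, x2, x3, x4))
    return rows

def corr(rows, i, j):
    """Sample Pearson correlation between columns i and j (0-based)."""
    n = len(rows)
    mi = sum(r[i] for r in rows) / n
    mj = sum(r[j] for r in rows) / n
    cov = sum((r[i] - mi) * (r[j] - mj) for r in rows) / n
    vi = sum((r[i] - mi) ** 2 for r in rows) / n
    vj = sum((r[j] - mj) ** 2 for r in rows) / n
    return cov / math.sqrt(vi * vj)

rows = simulate(20000)
print("corr(X1, X2):", round(corr(rows, 0, 1), 3))
print("corr(X1, X3):", round(corr(rows, 0, 2), 3))
```

Under these assumptions the population correlations are 2/3 for ($X_1$, $X_2$) and $1/\sqrt{6}$, about 0.41, for ($X_1$, $X_3$), so both sample values should come out clearly nonzero. Correlation only detects linear dependence, which is enough here because everything in the sketch is jointly Gaussian.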
In fact, $X_1$ and $X_2$ are still dependent given $F_1$, because $X_2$ still gives information about $F_2$. But $X_1$ and $X_3$ are independent given $F_1$, because they have no other factors in common. Finally, $X_3$ and $X_4$ are unconditionally independent because they have no factors in common. But they become dependent given $X_1$, which provides information about both the common factors.

None of these assertions rely on the detailed assumptions of the factor model, like Gaussian distributions for the factors, or linear dependence between factors and observables. What they rely on is that $X_i$ is independent of everything else, given the factors it loads on. The idea of graphical models is to generalize this, by focusing on relations of direct dependence, and the conditional independence relations implied by them.

2 Directed Acyclic Graph (DAG) Models

We have a collection of variables, which to be generic I'll write $X_1, X_2, \ldots X_p$. These may be discrete, continuous, or even vectors; it doesn't matter. We represent these visually as nodes in a graph. There are arrows connecting some of these nodes. If an arrow runs from $X_i$ to $X_j$, then $X_i$ is a parent of $X_j$. This is, as the name “parent” suggests, an anti-symmetric relationship, i.e., $X_j$ cannot also be the parent of $X_i$. This is why we use an arrow, and why the graph is directed. We write the set of all parents of $X_j$ as $\mathrm{parents}(j)$; this generalizes the notion of the factors which an observable loads on to. The joint distribution “decomposes according to the graph”:

$$p(X_1, X_2, \ldots X_p) = \prod_{i=1}^{p} p(X_i | X_{\mathrm{parents}(i)}) \tag{4}$$

If $X_i$ has no parents, because it has no incoming arrows, take $p(X_i | X_{\mathrm{parents}(i)})$ just to be the marginal distribution $p(X_i)$. Such variables are called exogenous; the others, with parents, are endogenous. An unfortunate situation could arise where $X_1$ is the parent of $X_2$, which is the parent of $X_3$, which is the parent of $X_1$. Perhaps, under some circumstances, we could make sense of this and actually calculate with Eq. 4, but the general practice is to rule it out by assuming the graph is acyclic, i.e., that it has no cycles, i.e., that we cannot, by following a series of arrows in the graph, go from one node to other nodes and ultimately back to our starting point. Altogether we say that we have a directed acyclic graph, or DAG, which represents the direct dependencies between variables.

What good is this? The primary virtue is that if we are dealing with a DAG model, the graph tells us all the dependencies we need to know; those are the conditional distributions of variables on their parents, appearing in the product on the right hand side of Eq. 4. (This includes the distribution of the exogenous variables.) This fact has two powerful sets of implications, for probabilistic reasoning and for statistical inference.

Let's take inference first, because it's more obvious: all that we have to estimate are the conditional distributions $p(X_i | X_{\mathrm{parents}(i)})$. We do not have to estimate the distribution of $X_i$ given all of the other variables, unless of course they are all parents of $X_i$. Since estimating distributions, or even just regressions, conditional on many variables is hard, it is extremely helpful to be able to read off from the graph which variables we can ignore. Indeed,
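To make Eq. 4 concrete, here is a minimal sketch that builds a joint distribution from per-node conditional distributions on the graph of Figure 1, treating every variable as binary. The binary encoding, the particular probability values, and the names `PARENTS`, `p_one`, `joint` and `event_prob` are all assumptions made for illustration; none of them comes from the lecture.

```python
from itertools import product

# Parents of each node in the graph of Figure 1 (exogenous nodes have none).
PARENTS = {
    "F1": (), "F2": (),
    "X1": ("F1", "F2"), "X2": ("F1", "F2"),
    "X3": ("F1",), "X4": ("F2",),
}
NODES = list(PARENTS)

def p_one(node, parent_values):
    """P(node = 1 | parents): made-up numbers, chosen only for illustration."""
    if not parent_values:
        return 0.5                      # marginal distribution of an exogenous node
    return 0.1 + 0.8 * sum(parent_values) / len(parent_values)

def joint(assign):
    """Eq. 4: product over nodes of p(node | its parents)."""
    result = 1.0
    for node in NODES:
        p1 = p_one(node, tuple(assign[q] for q in PARENTS[node]))
        result *= p1 if assign[node] == 1 else 1.0 - p1
    return result

def event_prob(**fixed):
    """Probability of an event, by summing the joint over all other variables."""
    total = 0.0
    for values in product((0, 1), repeat=len(NODES)):
        assign = dict(zip(NODES, values))
        if all(assign[k] == v for k, v in fixed.items()):
            total += joint(assign)
    return total

total = event_prob()   # sums the joint over all 64 configurations
# X3 and X4 share no parents, so the joint of the pair factorizes marginally ...
gap_marginal = event_prob(X3=1, X4=1) - event_prob(X3=1) * event_prob(X4=1)
# ... but it no longer factorizes once we condition on X1 = 1.
p_x1 = event_prob(X1=1)
gap_given_x1 = (event_prob(X3=1, X4=1, X1=1) / p_x1
                - event_prob(X3=1, X1=1) * event_prob(X4=1, X1=1) / p_x1 ** 2)
print(total, gap_marginal, gap_given_x1)
```

With these numbers the marginal gap is zero up to floating-point error, while the gap given $X_1 = 1$ is not, which matches the claims made about $X_3$ and $X_4$ in Section 1. The only model-specific ingredient is `p_one`, one conditional distribution per node given its parents; everything else follows from the graph.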

