BN Semantics 3 – Now it's personal!
Graphical Models – 10708
Carlos Guestrin
Carnegie Mellon University
September 22nd, 2008
Readings: K&F: 3.3, 3.4
10-708 – ©Carlos Guestrin 2006-2008

Independencies encoded in BN
- We said: all you need is the local Markov assumption (Xi ⊥ NonDescendants_Xi | Pa_Xi)
- But then we talked about other (in)dependencies, e.g., explaining away
- What are the independencies encoded by a BN?
  - The only assumption is local Markov
  - But many others can be derived using the algebra of conditional independencies!!!

Understanding independencies in BNs – BNs with 3 nodes
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents (and only its parents)
- The four three-node patterns:
  - Indirect causal effect: X → Z → Y
  - Indirect evidential effect: X ← Z ← Y
  - Common cause: X ← Z → Y
  - Common effect: X → Z ← Y

Understanding independencies in BNs – Some examples
(Example graph over nodes A–K)

Understanding independencies in BNs – Some more examples
(Example graph over nodes A–K)

An active trail – Example
(Example graph over nodes A–H, with F' and F'' descendants of F)
- When are A and H independent?

Active trails formalized
- A trail X1 – X2 – ··· – Xk is an active trail when variables O ⊆ {X1,…,Xn} are observed if, for each consecutive triplet in the trail, one of the following holds:
  - Xi-1 → Xi → Xi+1, and Xi is not observed (Xi ∉ O)
  - Xi-1 ← Xi ← Xi+1, and Xi is not observed (Xi ∉ O)
  - Xi-1 ← Xi → Xi+1, and Xi is not observed (Xi ∉ O)
  - Xi-1 → Xi ← Xi+1, and Xi is observed (Xi ∈ O), or one of its descendants is observed

Active trails and independence?
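The four triplet rules above can be sketched in code. This is a minimal illustration, not from the lecture: the graph encoding (a dict of parent sets plus a dict of child sets) and the names `descendants` and `is_active_triplet` are my own.

```python
def descendants(children, node):
    """All descendants of `node`, given a child-adjacency dict."""
    result, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in result:
                result.add(c)
                stack.append(c)
    return result

def is_active_triplet(parents, children, a, b, c, observed):
    """Is the consecutive triplet a - b - c active, given the observed set?"""
    if a in parents.get(b, set()) and b in parents.get(c, set()):
        return b not in observed                      # a -> b -> c (causal)
    if b in parents.get(a, set()) and c in parents.get(b, set()):
        return b not in observed                      # a <- b <- c (evidential)
    if a in parents.get(b, set()) and c in parents.get(b, set()):
        # a -> b <- c (v-structure): active iff b or a descendant is observed
        return b in observed or bool(descendants(children, b) & observed)
    if b in parents.get(a, set()) and b in parents.get(c, set()):
        return b not in observed                      # a <- b -> c (common cause)
    raise ValueError(f"{a} - {b} - {c} is not a trail in this graph")

# Example graph: X -> Z <- Y, plus Z -> W
parents = {"Z": {"X", "Y"}, "W": {"Z"}}
children = {"X": {"Z"}, "Y": {"Z"}, "Z": {"W"}}
```

For the v-structure X → Z ← Y, `is_active_triplet(parents, children, "X", "Z", "Y", set())` is inactive, but observing Z, or its descendant W, activates it (explaining away).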
- Theorem: variables Xi and Xj are independent given Z ⊆ {X1,…,Xn} if there is no active trail between Xi and Xj when the variables in Z are observed
(Example graph over nodes A–K)

More generally: Soundness of d-separation
- Given BN structure G, the set of independence assertions obtained by d-separation is I(G) = {(X ⊥ Y | Z) : d-sep_G(X; Y | Z)}
- Theorem (soundness of d-separation): if P factorizes over G, then I(G) ⊆ I(P)
- Interpretation: d-separation only captures true independencies
- Proof discussed when we talk about undirected models

Existence of dependency when not d-separated
- Theorem: if X and Y are not d-separated given Z, then X and Y are dependent given Z under some P that factorizes over G
- Proof sketch:
  - Choose an active trail between X and Y given Z
  - Make this trail dependent
  - Make all else uniform (independent) to avoid "canceling out" the influence
(Example graph over nodes A–K)

More generally: Completeness of d-separation
- Theorem (completeness of d-separation): for "almost all" distributions P that factorize over G, we have I(G) = I(P)
  - "Almost all" distributions: except for a set of measure zero of parameterizations of the CPTs (assuming no finite set of parameterizations has positive measure)
- Means that for all sets X and Y that are not d-separated given Z, ¬(X ⊥ Y | Z)
- Proof sketch for very simple case:

Interpretation of completeness
- Theorem (completeness of d-separation): for "almost all" distributions P that factorize over G, we have I(G) = I(P)
- The BN graph is usually sufficient to capture all independence properties of the distribution!!!!
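The existence-of-dependency claim can be seen numerically in the smallest interesting case, a v-structure X → Z ← Y. The following check is my own toy example (the CPT numbers are arbitrary), not from the lecture: X and Y are marginally independent by construction, but become dependent once Z is observed.

```python
from itertools import product

# P(X), P(Y), and P(Z=1 | X, Y) for binary variables; arbitrary numbers
p_x = {0: 0.6, 1: 0.4}
p_y = {0: 0.7, 1: 0.3}
p_z1 = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 0.95}

def joint(x, y, z):
    """P(x, y, z) under the factorization P(X) P(Y) P(Z | X, Y)."""
    pz = p_z1[(x, y)] if z == 1 else 1 - p_z1[(x, y)]
    return p_x[x] * p_y[y] * pz

# Marginally, X and Y are independent by construction
p_xy = {(x, y): sum(joint(x, y, z) for z in (0, 1))
        for x, y in product((0, 1), repeat=2)}
assert abs(p_xy[(1, 1)] - p_x[1] * p_y[1]) < 1e-12

# Conditioned on Z = 1, P(X, Y | Z=1) != P(X | Z=1) P(Y | Z=1)
pz1 = sum(joint(x, y, 1) for x, y in product((0, 1), repeat=2))
p_xy_given_z = {(x, y): joint(x, y, 1) / pz1
                for x, y in product((0, 1), repeat=2)}
px1_given_z = p_xy_given_z[(1, 0)] + p_xy_given_z[(1, 1)]
py1_given_z = p_xy_given_z[(0, 1)] + p_xy_given_z[(1, 1)]
print(abs(p_xy_given_z[(1, 1)] - px1_given_z * py1_given_z) > 1e-6)  # True
```

This is the "make the trail dependent" step of the proof sketch in miniature: any generic choice of CPT entries produces the dependence, in line with the measure-zero caveat in the completeness theorem.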
- But only for complete independence: (X=x ⊥ Y=y | Z=z), ∀ x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z)
- Often we have context-specific independence (CSI): ∃ x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z): (X=x ⊥ Y=y | Z=z)
  - Many factors may affect your grade
  - But if you are a frequentist, all other factors are irrelevant

Algorithm for d-separation
- How do I check if X and Y are d-separated given Z?
  - There can be exponentially many trails between X and Y
- A two-pass linear-time algorithm finds all d-separations for X:
  1. Upward pass: mark Z and its ancestors (the nodes with an observed descendant)
  2. Breadth-first traversal from X: stop the traversal at a node if the trail is "blocked" (some tricky details apply – see reading)
(Example graph over nodes A–K)

What you need to know
- d-separation and independence
  - Sound procedure for finding independencies
  - Existence of distributions with these independencies
  - (Almost) all independencies can be read directly from the graph, without looking at the CPTs

Announcements
- Homework 1: due next Wednesday – beginning of class! It's hard – start early, ask questions
- Audit policy: no sitting in, official auditors only, see course website

Building BNs from independence properties
- From d-separation we learned:
  - Start from local Markov assumptions, obtain all independence assumptions encoded by the graph
  - For most P's that factorize over G, I(G) = I(P)
  - All of this discussion was for a given G that is an I-map for P
- Now, give me a P – how can I get a G?
  - i.e., give me the independence assumptions entailed by P
  - Many G's are "equivalent"; how do I represent this?
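The two-pass d-separation procedure can be sketched as a reachability traversal, in the spirit of the K&F reading. This is an illustrative version under my own conventions (parent/child adjacency dicts, a `(node, direction)` frontier), not the lecture's code.

```python
from collections import deque

def d_separated(parents, children, x, y, Z):
    """True iff x and y are d-separated given the observed set Z.

    'up' means the trail entered a node from one of its children;
    'down' means it entered from one of its parents."""
    # Pass 1: mark Z and all its ancestors (nodes with an observed descendant)
    anc, stack = set(), list(Z)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents.get(n, ()))
    # Pass 2: breadth-first traversal from x over (node, direction) pairs
    visited, reachable = set(), set()
    frontier = deque([(x, "up")])
    while frontier:
        node, direction = frontier.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in Z:
            reachable.add(node)
        if direction == "up" and node not in Z:
            # came from a child: may continue to both parents and children
            for p in parents.get(node, ()):
                frontier.append((p, "up"))
            for c in children.get(node, ()):
                frontier.append((c, "down"))
        elif direction == "down":
            if node not in Z:
                # head-to-tail: continue downward to children
                for c in children.get(node, ()):
                    frontier.append((c, "down"))
            if node in anc:
                # v-structure activated by an observed descendant: go back up
                for p in parents.get(node, ()):
                    frontier.append((p, "up"))
    return y not in reachable

# Example graph: X -> Z <- Y, plus Z -> W
parents = {"Z": {"X", "Y"}, "W": {"Z"}}
children = {"X": {"Z"}, "Y": {"Z"}, "Z": {"W"}}
```

On this graph the sketch reproduces the lecture's facts: X and Y are d-separated marginally, but not given Z or given Z's descendant W. Both passes touch each (node, direction) pair at most once, which is where the linear running time comes from.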
- Most of this discussion is not about practical algorithms, but about useful concepts that will be used by practical algorithms
- Practical algorithms next time

Minimal I-maps
- One option:
  - G is an I-map for P
  - G is as simple as possible
- G is a minimal I-map for P if deleting any edge from G makes it no longer an I-map

Obtaining a minimal I-map
- Given a set of variables and conditional independence assumptions:
  - Choose an ordering on the variables, e.g., X1, …, Xn
  - For i = 1 to n:
    - Add Xi to the network
    - Define the parents of Xi, Pa_Xi, in the graph as the minimal subset of {X1,…,Xi-1} such that the local Markov assumption holds – Xi is independent of the rest of {X1,…,Xi-1} given its parents Pa_Xi
    - Define/learn the CPT – P(Xi | Pa_Xi)
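The ordering-based construction above can be sketched as follows. This is my own illustration: the conditional-independence "assumptions" are supplied by a brute-force check over a small joint table, and the names `marginal`, `independent`, and `minimal_imap` are hypothetical, not from the lecture.

```python
from itertools import combinations, product

def marginal(joint, names, keep):
    """Marginalize a joint table {assignment tuple: prob} onto `keep`."""
    idx = [names.index(v) for v in keep]
    out = {}
    for assign, p in joint.items():
        key = tuple(assign[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def independent(joint, names, x, ys, zs, tol=1e-6):
    """Brute-force check of (x ⊥ ys | zs) in the joint table."""
    ys, zs = list(ys), list(zs)
    if not ys:
        return True
    pxyz = marginal(joint, names, [x] + ys + zs)
    pxz = marginal(joint, names, [x] + zs)
    pyz = marginal(joint, names, ys + zs)
    pz = marginal(joint, names, zs)
    for key, p in pxyz.items():
        xv, yv, zv = key[0], key[1:1 + len(ys)], key[1 + len(ys):]
        pz_v = pz.get(zv, 0.0)
        if pz_v < tol:
            continue
        # independence requires P(x, ys | zs) == P(x | zs) * P(ys | zs)
        if abs(p / pz_v - (pxz.get((xv,) + zv, 0.0) / pz_v)
                        * (pyz.get(yv + zv, 0.0) / pz_v)) > tol:
            return False
    return True

def minimal_imap(joint, names, order):
    """For each Xi in the ordering, pick a smallest subset of its
    predecessors that makes Xi independent of its other predecessors."""
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        parents[x] = next(
            set(pa)
            for k in range(len(preds) + 1)
            for pa in combinations(preds, k)
            if independent(joint, names, x,
                           [v for v in preds if v not in pa], pa)
        )
    return parents

# Toy joint that factorizes as the chain A -> B -> C (arbitrary CPT numbers)
names = ["A", "B", "C"]
joint = {}
for a, b, c in product((0, 1), repeat=3):
    joint[(a, b, c)] = 0.5 * (0.9 if b == a else 0.1) * (0.8 if c == b else 0.2)
```

On this chain distribution, the ordering A, B, C recovers exactly the chain structure: A has no parents, B gets {A}, and C gets {B}. With a different ordering the construction can return a different (still valid) minimal I-map, which is the "many G's are equivalent" issue raised above.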