1BN Semantics 3 –Now it’s personal!Graphical Models – 10708Carlos GuestrinCarnegie Mellon UniversitySeptember 22nd, 2008Readings:K&F: 3.3, 3.410-708 – Carlos Guestrin 2006-200810-708 –Carlos Guestrin 2006-20082Independencies encoded in BN We said: All you need is the local Markov assumption (Xi⊥ NonDescendantsXi| PaXi) But then we talked about other (in)dependencies e.g., explaining away What are the independencies encoded by a BN? Only assumption is local Markov But many others can be derived using the algebra of conditional independencies!!!10-708 –Carlos Guestrin 2006-20083Understanding independencies in BNs– BNs with 3 nodesZYXLocal Markov Assumption:A variable X is independentof its non-descendants given its parents and only its parents Z YXZ YXZYXIndirect causal effect:Indirect evidential effect:Common cause:Common effect:10-708 –Carlos Guestrin 2006-20084Understanding independencies in BNs– Some examplesAHCEGDBFKJI10-708 –Carlos Guestrin 2006-20085Understanding independencies in BNs– Some more examplesAHCEGDBFKJI10-708 –Carlos Guestrin 2006-20086An active trail – ExampleA HCEGDBFF’’F’When are A and H independent?10-708 –Carlos Guestrin 2006-20087Active trails formalized A trail X1 –X2 –···–Xkis an active trail when variables O⊆{X1,…,Xn} are observed if for each consecutive triplet in the trail: Xi-1→Xi→Xi+1, and Xiis not observed (Xi∉O) Xi-1←Xi←Xi+1, and Xiis not observed (Xi∉O) Xi-1←Xi→Xi+1, and Xiis not observed (Xi∉O) Xi-1→Xi←Xi+1, and Xiis observed (Xi∈O), or one of its descendents 10-708 –Carlos Guestrin 2006-20088Active trails and independence? Theorem: Variables Xiand Xjare independent given Z⊆{X1,…,Xn} if the is no active trail between Xiand Xjwhen variables Z⊆{X1,…,Xn} are observedAHCEGDBFKJI10-708 –Carlos Guestrin 2006-20089More generally: Soundness of d-separation Given BN structure G Set of independence assertions obtained by d-separation: I(G) = {(X⊥Y|Z) : d-sepG(X;Y|Z)} Theorem: Soundness of d-separation If P factorizes over G then I(G) ⊆ I(P) Interpretation: d-separation only captures true independencies Proof discussed when we talk about undirected models10-708 –Carlos Guestrin 2006-200810Existence of dependency when not d-separated Theorem: If X and Y are not d-separated given Z, then X and Y are dependent given Z under some P that factorizes over G Proof sketch: Choose an active trail between X and Y given Z Make this trail dependent Make all else uniform (independent) to avoid “canceling” out influenceAHCEGDBFKJI10-708 –Carlos Guestrin 2006-200811More generally: Completeness of d-separation Theorem: Completeness of d-separation For “almost all” distributions where P factorizes over to G, we have that I(G) = I(P) “almost all” distributions: except for a set of measure zero of parameterizations of the CPTs (assuming no finite set of parameterizations has positive measure) Means that if all sets X & Y that are not d-separated given Z, then ¬(X⊥Y|Z) Proof sketch for very simple case:10-708 –Carlos Guestrin 2006-200812Interpretation of completeness Theorem: Completeness of d-separation For “almost all” distributions that P factorize over to G, we have that I(G) = I(P) BN graph is usually sufficient to capture all independence properties of the distribution!!!! But only for complete independence: P (X=x⊥Y=y | Z=z), ∀ x∈Val(X), y∈Val(Y), z∈Val(Z) Often we have context-specific independence (CSI) ∃ x∈Val(X), y∈Val(Y), z∈Val(Z): P (X=x⊥Y=y | Z=z) Many factors may affect your grade But if you are a frequentist, all other factors are irrelevant ☺10-708 –Carlos Guestrin 2006-200813Algorithm for d-separation How do I check if X and Y are d-separated given Z There can be exponentially-many trails between X and Y Two-pass linear time algorithm finds all d-separations for X 1. Upward pass Mark descendants of Z 2. Breadth-first traversal from X Stop traversal at a node if trail is “blocked” (Some tricky details apply – see reading)AHCEGDBFKJI10-708 –Carlos Guestrin 2006-200814What you need to know d-separation and independence sound procedure for finding independencies existence of distributions with these independencies (almost) all independencies can be read directly from graph without looking at CPTsAnnouncements Homework 1: Due next Wednesday – beginning of class! It’s hard – start early, ask questions Audit policy No sitting in, official auditors only, see course website10-708 –Carlos Guestrin 2006-200816Building BNs from independence properties From d-separation we learned: Start from local Markov assumptions, obtain all independence assumptions encoded by graph For most P’s that factorize over G, I(G) = I(P) All of this discussion was for a given G that is an I-map for P Now, give me a P, how can I get a G? i.e., give me the independence assumptions entailed by P Many G are “equivalent”, how do I represent this? Most of this discussion is not about practical algorithms, but useful concepts that will be used by practical algorithms Practical algs next time10-708 –Carlos Guestrin 2006-200817Minimal I-maps One option: G is an I-map for P G is as simple as possible G is a minimal I-map for P if deleting any edges from G makes it no longer an I-map10-708 –Carlos Guestrin 2006-200818Obtaining a minimal I-map Given a set of variables and conditional independence assumptions Choose an ordering on variables, e.g., X1, …, Xn For i = 1 to n Add Xito the network Define parents of Xi, PaXi, in graph as the minimal subset of {X1,…,Xi-1} such that local Markov assumption holds – Xiindependent of rest of {X1,…,Xi-1}, given parents PaXi Define/learn CPT – P(Xi| PaXi)Flu, Allergy, SinusInfection, Headache10-708 –Carlos Guestrin 2006-200819Minimal I-map not unique (or minimum) Given a set of variables and conditional independence assumptions Choose an ordering on variables, e.g., X1, …, Xn For i = 1 to n Add Xito the network Define parents of Xi, PaXi, in graph as the minimal subset of {X1,…,Xi-1} such that local Markov assumption holds – Xiindependent of rest of {X1,…,Xi-1}, given parents PaXi
View Full Document