ProbabilisticGraphical ModelsLecture 3 – Bayesian Networks SemanticsCS/CNS/EE 155Andreas Krause2Bayesian networksCompact representation of distributions over large number of variables(Often) allows efficient exact inference (computing marginals, etc.)HailFinder56 vars~ 3 states each~1026terms> 10.000 yearson Top supercomputersJavaBayes applet3Causal parametrizationGraph with directed edges from (immediate) causes to (immediate) effects Earthquake BurglaryAlarmJohnCallsMaryCalls4Bayesian networksA Bayesian network structure is a directed, acyclic graph G, where each vertex s of G is interpreted as a random variable Xs (with unspecified distribution)A Bayesian network (G,P) consists of A BN structure G and ....a set of conditional probability distributions (CPDs)P(Xs| PaXs), where PaXsare the parents of node Xssuch that(G,P) defines joint distribution5Representing the world using BNsWant to make sure that I(P) ⊆ I(P’)Need to understand CI properties of BN (G,P) s1s2s3s4s5s7s6s11s12s9s10s8s1s3s12s9True distribution P’with cond. ind. I(P’)Bayes net (G,P)with I(P)represent6Local Markov AssumptionEach BN Structure G is associated with the following conditional independence assumptionsX ⊥ NonDescendentsX| PaXWe write Iloc(G) for these conditional independencesSuppose (G,P) is a Bayesian network representing PDoes it hold that Iloc(G) ⊆ I(P)?If this holds, we say G is an I-map for P.7Factorization Theorems1s2s3s4s5s7s6s11s12s9s10s8s1s3s12s9Iloc(G) ⊆ I(P)True distribution Pcan be represented exactly asBayesian network (G,P)G is an I-map of P(independence map)8Additional conditional independenciesBN specifies joint distribution through conditional parameterization that satisfies Local Markov PropertyIloc(G) = {(Xi⊥ NondescendantsXi| PaXi)}But we also talked about additional properties of CIWeak Union, Intersection, Contraction, …Which additional CI does a particular BN specify?All CI that can be derived through algebraic operations proving CI is very cumbersome!!Is there an easy way to find all independences of a BN just by looking at its graph??9BNs with 3 nodesX Y ZX Y ZXYZXYZLocal Markov Property:X ⊥ NonDesc(X) | Pa(X)10V-structuresKnow E ⊥ BSuppose we know A. Does E ⊥ B | A hold?Earthquake BurglaryAlarm11BNs with 3 nodesX Y ZX Y ZXYZXYZLocal Markov Property:X ⊥ NonDesc(X) | Pa(X)Indirect causal effectIndirect evidential effectCommon causeCommon effectX ⊥ Z | Y¬ (X ⊥ Z)X ⊥ Z¬ (X ⊥ Z | Y)12ExamplesABCDEFGIHJ13More examplesABCDEFGIHJ14Active trailsWhen are A and I independent?ABCDGHEFI15Active trailsAn undirected path in BN structure G is called active trail for observed variables O ⊆ {X1,…,Xn}, if for every consecutive triple of vars X,Y,Z on the pathX Y Z and Y is unobserved (Y ∉O)X Y Z and Y is unobserved (Y ∉O)X Y Z and Y is unobserved (Y ∉O)X Y Z and Y or any of Y’s descendants is observedAny variables Xiand Xjfor which ∄ active trail for observations O are called d-separated by OWe write d-sep(Xi;Xj| O)Sets A and B are d-separated given O if d-sep(X,Y |O) for all X∈A, Y∈B. Write d-sep(A; B | O)16d-separation and independenceProof uses algebraic properties of conditional independenceABCDEFGIHITheorem:d-sep(X;Y | Z) X ⊥ Y | Z i.e., X cond. ind. Y given Zif there does not existany active trailbetween X and Yfor observations Z17Soundness of d-separationHave seen: P factorizes according to G Iloc(G)⊆ I(P)Define I(G) = {(X ⊥ Y | Z): d-sepG(X;Y |Z)}Theorem: Soundness of d-separationP factorizes over G I(G) ⊆ I(P)Hence, d-separation captures only true independencesHow about I(G) = I(P)?18Does the converse hold?Suppose P factorizes over G.Does it hold that I(P) ⊆ I(G)?19Existence of dependences for non-d-separated variablesTheorem: If X and Y are not d-separated given Z, then there exists some distribution P factorizing over G in which X and Y are dependent given ZProof sketch:20Completeness of d-separationTheorem: For “almost all” distributions P that factorize over G it holds that I(G) = I(P)“almost all”: except for a set of distributions with measure 0, assuming only that no finite set of distributions has measure > 021Algorithm for d-separationHow can we check if X ⊥ Y | Z?Idea: Check every possible path connecting X and Y and verify conditionsExponentially many paths!!! Linear time algorithm:Find all nodes reachable from X1. Mark Z and its ancestors2. Do breadth-first search startingfrom X; stop if path is blockedHave to be careful with implementation details (see reading)ABCDEFGIHI22Representing the world using BNsWant to make sure that I(P) ⊆ I(P’)Ideally: I(P) = I(P’)Want BN that exactly captures independencies in P’!s1s2s3s4s5s7s6s11s12s9s10s8s1s3s12s9True distribution P’with cond. ind. I(P’)Bayes net (G,P)with I(P)represent23Minimal I-mapsLemma: Suppose G’ is derived from G by adding edgesThen I(G’) ⊆ I(G) Proof:Thus, want to find graph G with I(G) ⊆ I(P) such that when we remove any single edge, for the resulting graph G’ it holds that I(G’) I(P)Such a graph G is called minimal I-map24Existence of Minimal I-MapsDoes every distribution have a minimal I-Map?25Algorithm for finding minimal I-mapGiven random variables and known conditional independencesPick ordering X1,…,Xnof the variablesFor each XiFind minimal subset A ⊆{X1,…,Xi-1} such that P(Xi| X1,…,Xi-1) = P(Xi| A)Specify / learn CPD P(Xi| A)Will produce minimal I-map!26Uniqueness of Minimal I-mapsIs the minimal I-Map unique?E BAJ ME BAJ MJ MAE B27Perfect mapsMinimal I-maps are easy to find, but can contain many unnecessary dependencies.A BN structure G is called P-map (perfect map) for distribution P if I(G) = I(P)Does every distribution P have a P-map?28Existence of perfect maps29Existence of perfect maps30Uniqueness of perfect maps31I-EquivalenceTwo graphs G, G’ are called I-equivalent if I(G) = I(G’)I-equivalence partitions graphs into equivalence classes32Skeletons of BNsI-equivalent BNs must have same skeletonABCDEFGIHJABCDEFGIHJ33Importance of V-structuresTheorem: If G, G’ have same skeleton and same V-structure, then I(G) = I(G’)Does the converse hold?34Immoralities and I-equivalenceA V-structure X Y Z is called immoral if there is no edge between X and Z (“unmarried parents”)Theorem: I(G) = I(G’) G and G’ have the same skeleton and the same immoralities.35TasksSubscribe to Mailing
View Full Document