Junction Trees / Undirected Graphical Models
Graphical Models – 10-708
Carlos Guestrin, Carnegie Mellon University
October 27th, 2008
Readings: K&F: 9.1, 9.2, 9.3, 9.4, 4.1, 4.2, 4.3, 4.4

Introducing message passing with division
- Variable elimination is message passing with multiplication:
  - message: δi→j = Σ_{Ci − Sij} πi⁰ · ∏_{k ∈ N(i) − {j}} δk→i
  - belief: πi = πi⁰ · ∏_{k ∈ N(i)} δk→i
- Message passing with division instead maintains a running belief:
  - belief: πi, updated in place as messages arrive
  - belief about the separator: Σ_{Ci − Sij} πi
  - message: the new separator belief divided by the previous message sent over the edge
- Running example: clique tree with C1: CD, C2: SE, C3: GDS, C4: GJS

Factor division
- Let X and Y be disjoint sets of variables
- Consider two factors: φ1(X,Y) and φ2(Y)
- The factor ψ(X,Y) = φ1(X,Y) / φ2(Y), with the convention 0/0 = 0
- (The division is well defined only when φ2(Y) = 0 implies φ1(X,Y) = 0)

Lauritzen-Spiegelhalter algorithm (a.k.a. belief propagation)
- Separator potentials µij:
  - one per edge (the same in both directions)
  - holds the "last message" sent over the edge
  - initialized to 1
- Message i→j: what does i think the separator potential should be?
  σi→j = Σ_{Ci − Sij} πi
- Update the belief for j, pushing j toward what i thinks about the separator:
  πj ← πj · σi→j / µij
- Replace the separator potential: µij ← σi→j
- Running example: C1: CD, C2: SE, C3: GDS, C4: GJS
- This is a simplified description; see the reading for details (a code sketch of one update follows below)
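The factor-division and message-update steps above fit in a few lines of code. Here is a minimal sketch in Python/numpy for a single edge of a clique tree over binary variables; the names (divide, pi_i, mu_ij) and the toy numbers are illustrative, not from the lecture.

import numpy as np

def divide(phi1, phi2):
    # Elementwise factor division with the convention 0/0 = 0.
    # phi1 and phi2 are numpy arrays already aligned on the same axes.
    out = np.zeros_like(phi1, dtype=float)
    nonzero = phi2 != 0
    out[nonzero] = phi1[nonzero] / phi2[nonzero]
    return out  # entries where phi2 == 0 stay at 0 (the 0/0 = 0 case)

# Toy edge of a clique tree: clique i over (A,B), clique j over (B,C),
# separator Sij = {B}; all variables binary.  In pi_i, axis 0 is A and
# axis 1 is B; in pi_j, axis 0 is B and axis 1 is C.
pi_i = np.array([[2.0, 1.0], [4.0, 3.0]])   # current belief at clique i
pi_j = np.array([[1.0, 2.0], [3.0, 1.0]])   # current belief at clique j
mu_ij = np.ones(2)                          # separator potential, initialized to 1

# Message sigma_{i->j}: marginalize i's belief onto the separator {B}.
sigma_ij = pi_i.sum(axis=0)

# Update j's belief: multiply in the new message, divide out the old one.
pi_j = pi_j * divide(sigma_ij, mu_ij)[:, None]

# Replace the separator potential with the message just sent.
mu_ij = sigma_ij

Because the old message is divided out, sending the same message twice is a no-op: the second time, σi→j / µij equals 1. This is what keeps the clique tree invariant (below) intact.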
Convergence of the Lauritzen-Spiegelhalter algorithm
- Complexity: linear in the number of cliques for the "right" schedule over the edges (first pass messages from the leaves to the root, then from the root back to the leaves)
- Corollary: at convergence, every clique has the correct belief
- (Slide shows an example clique tree over C1, ..., C7.)

VE versus BP in clique trees
- VE messages: the version that multiplies (incoming messages are multiplied into the outgoing message)
- BP messages: the version that divides (the belief is divided by the last message sent over the edge)

Clique tree invariant
- Clique tree potential: the product of the clique potentials divided by the separator potentials,
  πT(X) = ∏i πi / ∏ij µij
- Clique tree invariant: P(X) = πT(X)

Belief propagation and the clique tree invariant
- Theorem: the invariant is maintained by the BP algorithm!
- BP reparameterizes the clique potentials and separator potentials
- At convergence, potentials and messages are marginal distributions

Subtree correctness
- A message from i to j is informed if all messages into i (other than the one from j) are informed
- This is a recursive definition: leaves always send informed messages
- Informed subtree: all incoming messages are informed
- Theorem: the potential of a connected informed subtree T' is the marginal over scope[T']
- Corollary: at convergence, the clique tree is calibrated:
  πi = P(scope[πi]) and µij = P(scope[µij])

Clique trees versus VE
- Clique tree advantages:
  - multi-query settings
  - incremental updates
  - pre-computation makes the complexity explicit
- Clique tree disadvantages:
  - space requirements: no factors are ever "deleted"
  - slower for a single query
  - local structure in the factors may be lost when they are multiplied together into the initial clique potentials

Clique tree summary
- Solves marginal queries for all variables at only twice the cost of a query for a single variable
- Cliques correspond to maximal cliques in the induced graph
- Two message-passing approaches: VE (the one that multiplies messages) and BP (the one that divides by the old message)
- Clique tree invariant: the clique tree potential is always the same; we are only reparameterizing the clique potentials
- Constructing a clique tree for a BN: from an elimination order, or from a triangulated (chordal) graph
- Running time is exponential (only) in the size of the largest clique
- Can exactly solve problems with thousands (or millions, or more) of variables, as long as the cliques have tens of nodes (or fewer)

Swinging Couples revisited
- This distribution has no perfect map as a BN
- But an undirected model will be a perfect map

Potentials (or Factors) in Swinging Couples
- (Slide shows the pairwise potentials for the Swinging Couples example.)

Computing probabilities in Markov networks vs. BNs
- In a BN, we can compute the probability of an instantiation by multiplying CPTs
- In a Markov network, we can only compute a ratio of probabilities directly

Normalization for computing probabilities
- To compute actual probabilities, we must compute the normalization constant Z (also called the partition function):
  P(x1, ..., xn) = (1/Z) ∏i φi(Di), with Z = Σ_{x1,...,xn} ∏i φi(Di)
- Computing the partition function is hard! It requires summing over all possible assignments (a sketch of both computations follows below)
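To make the cost concrete, here is a small self-contained sketch (the toy potentials and names are mine, not the lecture's): a three-variable pairwise Markov network where Z is computed by brute-force enumeration of all 2³ assignments, and where a ratio of probabilities avoids Z entirely.

import itertools
import numpy as np

# Pairwise Markov network on a chain X1 - X2 - X3 over binary
# variables, with one hypothetical potential per edge.
phi_12 = np.array([[10.0, 1.0], [1.0, 10.0]])  # phi(X1, X2)
phi_23 = np.array([[5.0, 1.0], [1.0, 5.0]])    # phi(X2, X3)

def unnorm(x1, x2, x3):
    # Unnormalized measure: the product of the clique potentials.
    return phi_12[x1, x2] * phi_23[x2, x3]

# Partition function: sum over all 2^3 joint assignments.
Z = sum(unnorm(*x) for x in itertools.product([0, 1], repeat=3))

# An actual probability needs Z ...
p_000 = unnorm(0, 0, 0) / Z

# ... but a ratio of probabilities does not: Z cancels.
ratio = unnorm(0, 0, 0) / unnorm(1, 0, 0)
print(Z, p_000, ratio)

For n binary variables the same enumeration would run 2ⁿ times, which is why computing Z exactly is intractable in general, while ratios of probabilities stay cheap.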
Factorization in Markov networks
- Given an undirected graph H over variables X = {X1, ..., Xn}
- A distribution P factorizes over H if there exist subsets of variables D1 ⊆ X, ..., Dm ⊆ X such that:
  - each Di is fully connected in H
  - there are non-negative potentials (or factors) φ1(D1), ..., φm(Dm), also known as clique potentials
  - P(X1, ..., Xn) = (1/Z) ∏i φi(Di)
- Such a P is also called a Markov random field over H, or a Gibbs distribution over H

Global Markov assumption in Markov networks
- A path X1 – ... – Xk is active, given a set of observed variables Z, if none of the Xi ∈ {X1, ..., Xk} is observed (i.e., none is in Z)
- Variables X are separated from Y given Z in graph H, written sepH(X; Y | Z), if there is no active path between any X ∈ X and any Y ∈ Y given Z
- The global Markov assumption for a Markov network H: I(H) = {(X ⊥ Y | Z) : sepH(X; Y | Z)} (a sketch of the separation test follows below)
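Separation in an undirected graph is just reachability with the observed nodes blocked. A minimal sketch, assuming a dict-of-sets adjacency representation; the function name and the example graph are illustrative.

from collections import deque

def separated(adj, X, Y, Z):
    # True iff every path from X to Y in the graph passes through Z,
    # i.e. sep_H(X; Y | Z).  adj maps each node to its neighbor set.
    Z = set(Z)
    visited = set()
    frontier = deque(x for x in X if x not in Z)
    visited.update(frontier)
    while frontier:
        u = frontier.popleft()
        if u in Y:
            return False        # found an active path from X to Y
        for v in adj[u]:
            if v not in visited and v not in Z:
                visited.add(v)  # observed nodes block the path
                frontier.append(v)
    return True

# Chain A - B - C: A and C are separated given {B}, not given {}.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(separated(adj, {"A"}, {"C"}, {"B"}))  # True
print(separated(adj, {"A"}, {"C"}, set()))  # False

Equivalently, one can delete Z from the graph and check connectivity; either way the test is linear in the size of the graph.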