11School of Computer ScienceReceptor AKinase CTF FGene GGene HKinaseEKinase DReceptor BX1X2X3X4X5X6X7X8Receptor AKinase CTF FGene GGene HKinaseEKinase DReceptor BX1X2X3X4X5X6X7X8X1X2X3X4X5X6X7X8Junction Tree Algorithmand a case study of theHidden Markov Model Probabilistic Graphical Models (10Probabilistic Graphical Models (10--708)708)Lecture 6, Oct 3, 2007Eric XingEric XingReading: J-Chap 12, 17, KF-Chap. 10Eric Xing 2Outlinez So far we have studied exact inference in:z Treesz message passing on the original graph (which are trees)z Poly-trees, Tree-like graphsz message passing in factor treesz Now we will look into exact inference in arbitrary graphsz Junction-Tree algorithmz Inference in Hidden Markov Model2Eric Xing 3The Junction Tree Algorithmz Recall: Elimination ≡ message passing on a clique treez Junction Tree Algorithm:z computing messages on a clique treez message passing protocol on a clique treez There are several inference algorithms; some of which operate directly on (special) directed graphz Forward-backward algorithm for HMM (we will see it later)z Pealing algorithm for trees and phylogeniesz The junction tree algorithm is the most popular and general inference algorithm, it operates on an undirected graphz To understand the JT-algorithm, we need to understand how to compile a directed graph into an undirected graphEric Xing 4Moral Graphz Note that for both directed GMs and undirected GMs, the joint probability is in a product form:z So let’s convert local conditional probabilities into potentials; then the second expression will be generic, but how does this operation affect the directed graph?z We can think of a conditional probability, e.g,. P(C|A,B) as a function of the three variables A, B, and C (we get a real number of each configuration):z Problem: But a node and its parent are not generally in the same clique in a BNz Solution: Marry the parents to obtain the "moral graph" ∏==diiiXPP:)|()(1πXX∏∈=CcccZP )()( XXψ1BN: MRF:ABCP(C|A,B) ABCΨ(A,B,C) = P(C|A,B)3Eric Xing 5Moral Graph (cont.)z Define the potential on a clique as the product over all conditional probabilities contained within the cliquez Now the product of potentials gives the right answer:),,(),,(),,(),|()|()|(),|()()(),,,,,( 654543321546353421321654321XXXXXXXXXXXXPXXPXXPXXXPXPXPXXXXXXPψψψ==),|()()(),,(21321321XXXPXPXPXXX=ψ)|()|(),,(3534543XXPXXPXXX=ψ),|(),,(546654XXXPXXX=ψwhereX1X2X3X4X5X6X1X2X3X5X4X6Note that here the interpretation of potential is ambivalent: it can be either marginalsor conditionalsEric Xing 6Clique treesz A clique tree is an (undirected) tree of cliquesz Consider cases in which two neighboring cliques V and W have an overlap S (e.g., (X1, X2, X3) overlaps with (X3, X4, X5) ),z Now we have an alternative representation of the joint in terms of the potentials:X1X2X3X5X4X6X3, X4,X5X4, X5,X6X1,X2, X3V WS)(Wψ)(Vψ)(SφX3, X4,X5X4, X5,X6X1,X2, X3X3X44Eric Xing 7Clique treesz A clique tree is an (undirected) tree of cliquesz The alternative representation of the joint in terms of the potentials:z Generally:X1X2X3X5X4X6X3, X4,X5X4, X5,X6X1,X2, X3X3X4),(),,()(),,(),,(),(),,()(),,(),,(),|()|()|(),|()()(),,,,,( 546543543321546543543321546353421321654321XXXXXXXXXXXXXXPXXXPXPXXXPXXXPXXXPXXPXXPXXXPXPXPXXXXXXPφψφψψ===∏∏=SSSCCCP)()()(XXXφψNow each potential is isomorphic to the cluster marginal of the attendant set of variablesEric Xing 8Why this is useful?z Propagation of probabilitiesz Now suppose that some evidence has been "absorbed" (i.e., certain values of some nodes have been observed). How do we propagate this effect to the rest of the graph?z What do we mean by propagate?Can we adjust all the potentials {ψ}, {φ} so that they still represent the correct cluster marginals (or unnormalized equivalents) of their respective attendant variables?z Utility? X1X2X3X4X5X6X1X2X3X5X4X6X3, X4,X5X4, X5,X6X1,X2, X3X3X4),,()|(,32166132XXXxXXPXX∑==ψ)()|(3663XxXXPφ==Local operations!),,()(,654654xXXxPXX∑=ψ5Eric Xing 9Local Consistencyz We have two ways of obtaining p(S)and they must be the samez The following update-rule ensures this:z Forward update:z Backward updatez Two important identities can be provenV WS)(Wψ)(Vψ)(Sφ)()(\VSPSV∑=ψ)()(\WSPSW∑=ψ∑=SVVS\*ψφWSSWψφφψ**=∑=SWWS\***ψφ******VSSVψφφψ=**\*\**SSWWSVVφψψ==∑∑SWVSWVSWVφψψφψψφψψ==*********Local Consistency Invariant JointEric Xing 10Message Passing Algorithmz This simple local message-passing algorithm on a clique tree defines the general probability propagation algorithm for directed graphs!z Many interesting algorithms are special cases:z Forward-backward algorithm for hidden Markov models,z Kalman filter updatesz Pealing algorithms for probabilistic treesz The algorithm seems reasonable. Is it correct?V WS)(Wψ)(Vψ)(Sφ∑=SVVS\*ψφWSSWψφφψ**=∑=SWWS\***ψφ******VSSVψφφψ=6Eric Xing 11A problemz Consider the following graph and a corresponding clique treez Note that C appears in two non-neighboring cliquesz Question: with the previous message passage, can we ensure that the probability associated with C in these two (non-neighboring) cliques consistent?z Answer: No. It is not true that in general local consistency implies global consistencyz What else do we need to get such a guarantee?A BC DA,B B,DA,C C,DEric Xing 12Triangulationz A triangulated graph is one in which no cycles with four or more nodes exist in which there is no chordz We triangulate a graph by adding chords:z Now we no longer have our global inconsistency problem.z A clique tree for a triangulated graph has the running intersection property: If a node appears in two cliques, it appears everywhere on the path between the cliquesz Thus local consistency implies global consistencyA BC DA BC DA,B,CB,C,D7Eric Xing 13Junction treesz A clique tree for a triangulated graph is referred to as a junction treez In junction trees, local consistency implies global consistency. Thus the local message-passing algorithms is (provably) correctz It is also possible to show that only triangulated graphs have the property that their clique trees are junctions. Thus if we want local algorithms, we must triangulatez Are we now all set?z How to triangulate?z The complexity of building a JT depends on how we triangulate!!z Consider this network:it turns out that we will need to pay an O(24) or O(26) cost depending on how we triangulate!BADCEFG HEric Xing 14moralizationB ADCE FG HB ADCE FG HB ADCB ADCE
View Full Document