School of Computer Science
Probabilistic Graphical Models (10-708)
Lecture 2, Sep 17, 2007
Eric Xing

Undirected Graphical Models

Reading: MJ Chap. 2, 4 and KF Chap. 5

[Figure: a cellular signaling pathway (Receptor A, Receptor B, Kinase C, Kinase D, Kinase E, TF F, Gene G, Gene H) abstracted as a graphical model over variables X1-X8]

Two types of GMs

- Directed edges give causality relationships (Bayesian Network or Directed Graphical Model):

  P(X1, X2, X3, X4, X5, X6, X7, X8)
  = P(X1) P(X2) P(X3|X1) P(X4|X2) P(X5|X2) P(X6|X3,X4) P(X7|X6) P(X8|X5,X6)

- Undirected edges simply give correlations between variables (Markov Random Field or Undirected Graphical Model):

  P(X1, X2, X3, X4, X5, X6, X7, X8)
  = (1/Z) exp{E(X1) + E(X2) + E(X3,X1) + E(X4,X2) + E(X5,X2) + E(X6,X3,X4) + E(X7,X6) + E(X8,X5,X6)}

Review: independence properties of DAGs

- Defn: let Il(G) be the set of local independence properties encoded by DAG G, namely:
  {Xi ⊥ NonDescendants(Xi) | Parents(Xi)}
- Defn: a DAG G is an I-map (independence map) of P if Il(G) ⊆ I(P).
- A fully connected DAG G is an I-map for any distribution, since Il(G) = ∅ ⊆ I(P) for any P.
- Defn: a DAG G is a minimal I-map for P if it is an I-map for P, and if the removal of even a single edge from G renders it not an I-map.
- A distribution may have several minimal I-maps, each corresponding to a specific node ordering.

P-maps

- Defn: a DAG G is a perfect map (P-map) for a distribution P if I(P) = I(G).
- Thm: not every distribution has a perfect map as a DAG.
- Proof by counterexample: suppose we have a model where A ⊥ C | {B, D} and B ⊥ D | {A, C}.
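The two factorizations above can be made concrete on a smaller example. The following is a minimal sketch (not from the lecture) using a 3-node chain X1 - X2 - X3 with binary variables and made-up numbers: the directed model multiplies conditional probability tables and is normalized by construction, while the undirected model multiplies exponentiated energy terms and must be normalized explicitly by Z.

```python
import itertools
import math

# Directed version: P(x1, x2, x3) = P(x1) P(x2|x1) P(x3|x2)
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (x2, x1)
p_x3_given_x2 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key: (x3, x2)

def p_directed(x1, x2, x3):
    # Product of local conditional tables; sums to 1 with no extra work.
    return p_x1[x1] * p_x2_given_x1[(x2, x1)] * p_x3_given_x2[(x3, x2)]

# Undirected version: P(x) = (1/Z) exp{E(x1, x2) + E(x2, x3)} (energies made up)
E12 = {(0, 0): 1.0, (0, 1): -0.5, (1, 0): -0.5, (1, 1): 1.0}
E23 = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

def unnorm(x1, x2, x3):
    return math.exp(E12[(x1, x2)] + E23[(x2, x3)])

# The partition function Z sums over every joint configuration.
Z = sum(unnorm(*x) for x in itertools.product([0, 1], repeat=3))

def p_undirected(x1, x2, x3):
    return unnorm(x1, x2, x3) / Z

print(round(sum(p_directed(*x) for x in itertools.product([0, 1], repeat=3)), 6))    # 1.0
print(round(sum(p_undirected(*x) for x in itertools.product([0, 1], repeat=3)), 6))  # 1.0
```

Note the asymmetry this exposes: the directed model needs no global normalization, while the undirected model's Z requires a sum over all 2^n configurations.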
  This cannot be represented by any Bayes net: e.g., BN1 wrongly says B ⊥ D | A, and BN2 wrongly says B ⊥ D.

[Figures: two candidate DAGs, BN1 and BN2, and the 4-cycle MRF over A, B, C, D that does capture both independencies]

Undirected graphical models

- Pairwise (non-causal) relationships.
- We can write down the model and score specific configurations of the graph, but there is no explicit way to generate samples.
- Contingency constraints on node configurations.

[Figure: an undirected graph over X1-X5]

Canonical examples

- The grid model.
- Naturally arises in image processing, lattice physics, etc.
- Each node may represent a single "pixel", or an atom.
- The states of adjacent or nearby nodes are "coupled" due to pattern continuity or electromagnetic forces, etc.
- The most likely joint configurations usually correspond to a "low-energy" state.

More examples

- Social networks (e.g., "The New Testament Social Network")
- Protein interaction networks
- Modeling Go
- Information retrieval (coupling topic, text, and image variables)

Global Markov independencies

- Let H be an undirected graph.
- B separates A and C if every path from a node in A to a node in C passes through a node in B: sep_H(A; C | B).
- A probability distribution satisfies the global Markov property if for any disjoint A, B, C such that B separates A and C, A is independent of C given B:

  I(H) = {A ⊥ C | B : sep_H(A; C | B)}

Soundness of the separation criterion

- The independencies in I(H) are precisely those that are guaranteed to hold for every MRF distribution P over H.
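The separation test sep_H(A; C | B) is purely graph-theoretic: remove the nodes in B and check whether any path from A to C survives. A minimal BFS sketch (the `separates` helper is ours, not from the lecture), run on the 4-cycle graph from the P-map counterexample:

```python
from collections import deque

# The 4-cycle MRF from the P-map counterexample: edges A-B, B-C, C-D, D-A.
graph = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}

def separates(graph, A, C, B):
    """Return True iff B separates A from C in the undirected graph."""
    blocked = set(B)
    frontier = deque(a for a in A if a not in blocked)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in C:
            return False  # reached C along a path avoiding B
        for nbr in graph[node]:
            if nbr not in blocked and nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return True

print(separates(graph, {"A"}, {"C"}, {"B", "D"}))  # True:  A ⊥ C | {B, D}
print(separates(graph, {"B"}, {"D"}, {"A", "C"}))  # True:  B ⊥ D | {A, C}
print(separates(graph, {"A"}, {"C"}, {"B"}))       # False: path A-D-C remains
```

The first two calls confirm that the 4-cycle MRF encodes exactly the two independencies no DAG could capture simultaneously.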
- In other words, the separation criterion is sound for detecting independence properties in MRF distributions over H.

Local Markov independencies

- For each node Xi ∈ V there is a unique Markov blanket of Xi, denoted MB_Xi, which is the set of neighbors of Xi in the graph (those that share an edge with Xi).
- Defn (5.5.4): the local Markov independencies associated with H are

  Iℓ(H) = {Xi ⊥ V − {Xi} − MB_Xi | MB_Xi : ∀i}

- In other words, Xi is independent of the rest of the nodes in the graph given its immediate neighbors.

Summary: conditional independence semantics in an MRF

- Structure: an undirected graph.
- Meaning: a node is conditionally independent of every other node in the network given its neighbors in the undirected graph.
- Local contingency functions (potentials) and the cliques in the graph completely determine the joint distribution.
- They give correlations between variables, but no explicit way to generate samples.

Cliques

- For G = {V, E}, a complete subgraph (clique) is a subgraph G' = {V' ⊆ V, E' ⊆ E} such that the nodes in V' are fully interconnected.
- A (maximal) clique is a complete subgraph s.t. any superset V'' ⊃ V' is not complete.
- A sub-clique is a not-necessarily-maximal clique.
- Example (diamond graph over A, B, C, D): max-cliques = {A,B,D}, {B,C,D}; sub-cliques = {A,B}, {C,D}, ... (all edges and singletons).

Quantitative specification

- Defn: an undirected graphical model represents a distribution P(X1, ..., Xn) defined by an undirected graph H and a set of positive potential functions ψc associated with the cliques of H, with normalizer Z known as the partition function.
- Also known as Markov Random Fields, Markov networks, ...
- The potential function can be understood as a contingency function of its arguments, assigning a "pre-probabilistic" score to their joint configuration.
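The clique definitions above can be checked mechanically on the diamond example. A brute-force sketch (fine for tiny graphs; the edge set and helper names are ours, assuming edges A-B, A-D, B-D, B-C, C-D as in the example figure):

```python
import itertools

# Diamond graph from the Cliques example: note A-C is NOT an edge.
edges = {frozenset(e) for e in [("A", "B"), ("A", "D"), ("B", "D"), ("B", "C"), ("C", "D")]}
nodes = {"A", "B", "C", "D"}

def is_complete(subset):
    # A subset is complete (a clique) if every pair of its nodes is an edge.
    return all(frozenset(p) in edges for p in itertools.combinations(subset, 2))

# Every clique: singletons, edges, and the two triangles (A-B-C-D is not
# complete because A-C is missing).
cliques = [set(s) for r in range(1, len(nodes) + 1)
           for s in itertools.combinations(sorted(nodes), r) if is_complete(s)]

# Maximal cliques: cliques not strictly contained in any larger clique.
max_cliques = [c for c in cliques if not any(c < d for d in cliques)]
print(sorted(sorted(c) for c in max_cliques))  # the triangles {A,B,D} and {B,C,D}
```

This matches the slide: max-cliques {A,B,D} and {B,C,D}, with every edge and singleton appearing as a sub-clique.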
The joint then factorizes as a product of clique potentials (a Gibbs distribution):

  P(x1, ..., xn) = (1/Z) ∏_{c ∈ C} ψc(x_c),  where  Z = ∑_{x1,...,xn} ∏_{c ∈ C} ψc(x_c)

Example UGM – using max cliques

- For discrete nodes, we can represent P'(X1:4) as two 3D tables instead of one 4D table:

  P'(x1, x2, x3, x4) = (1/Z) ψc(x124) ψc(x234)

  Z = ∑_{x1,x2,x3,x4} ψc(x124) ψc(x234)

  with one table over {A, B, D} and one over {B, C, D}.

Example UGM – using subcliques

- We can represent P''(X1:4) as five 2D tables instead of one 4D table:

  P''(x1, x2, x3, x4) = (1/Z) ∏_{ij} ψij(xi, xj)
                      = (1/Z) ψ12(x1, x2) ψ14(x1, x4) ψ23(x2, x3) ψ24(x2, x4) ψ34(x3, x4)

  Z = ∑_{x1,x2,x3,x4} ∏_{ij} ψij(xi, xj)

  with one table per edge: {A,B}, {A,D}, {B,C}, {B,D}, {C,D}.

- Pairwise MRFs: a popular and simple special case.
- Question: I(P') vs. I(P'')? D(P') vs. D(P'')?

Interpretation of clique potentials

- Consider the chain X — Y — Z. The model implies X ⊥ Z | Y. This independence statement implies (by definition) that the joint must factorize as:

  p(x, y, z) = p(y) p(x|y) p(z|y)

- We can write this as p(x, y, z) = p(x, y) p(z|y), or as p(x, y, z) = p(x|y) p(z, y), but:
  - we cannot have all potentials be marginals;
  - we cannot have all potentials be conditionals.
- The positive clique potentials can only be thought of as general "compatibility", "goodness", or "happiness" functions over their variables, but not as probability distributions.
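The last point can be verified numerically. In the sketch below (made-up numbers, helper names ours), the chain's potentials can be chosen as one marginal p(x, y) and one conditional p(z|y), which reproduces the joint exactly; choosing two marginals p(x, y) and p(y, z) double-counts p(y) and fails.

```python
import itertools

# Joint built so that X ⊥ Z | Y holds by construction:
# p(x, y, z) = p(y) p(x|y) p(z|y). All numbers are illustrative.
p_y = {0: 0.5, 1: 0.5}
p_x_given_y = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7}  # key: (x, y)
p_z_given_y = {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.1, (1, 1): 0.9}  # key: (z, y)

def joint(x, y, z):
    return p_y[y] * p_x_given_y[(x, y)] * p_z_given_y[(z, y)]

def p_xy(x, y):  # marginal over the clique {X, Y}
    return p_y[y] * p_x_given_y[(x, y)]

def p_yz(y, z):  # marginal over the clique {Y, Z}
    return p_y[y] * p_z_given_y[(z, y)]

configs = list(itertools.product([0, 1], repeat=3))

# Valid factorization: one marginal potential, one conditional potential.
ok = all(abs(joint(x, y, z) - p_xy(x, y) * p_z_given_y[(z, y)]) < 1e-12
         for (x, y, z) in configs)
print(ok)  # True

# Invalid: two marginal potentials double-count p(y).
bad = all(abs(joint(x, y, z) - p_xy(x, y) * p_yz(y, z)) < 1e-12
          for (x, y, z) in configs)
print(bad)  # False
```

This is exactly why potentials are read as unnormalized "compatibility" scores: no symmetric assignment of marginals (or of conditionals) to both cliques can recover the joint without renormalization.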