CMU CS 10708 - Representation of directed GM

School of Computer Science
Probabilistic Graphical Models (10-708)
Lecture 1, Sep 12, 2007
Eric Xing

(Running example: a gene-regulation network over eight variables: Receptor A (X1), Receptor B (X2), Kinase C (X3), Kinase D (X4), Kinase E (X5), TF F (X6), Gene G (X7), Gene H (X8).)

Reading: MJ-Chap 2, KF-Chap. 3

Administrivia
- Recitation?
- Exam dates, poster dates, etc.
- Mailing list
- Questions?

Representing Multivariate Distributions
- Representation: what is the joint probability distribution P(X1, X2, X3, X4, X5, X6, X7, X8) over multiple variables?
- How many state configurations in total? 2^8.
- Do we need to represent all of them? Do we get any scientific/medical insight?
- Factored representation, via the chain rule:

    P(X1, X2, X3, X4, X5, X6, X7, X8)
      = P(X1) P(X2|X1) P(X3|X1,X2) P(X4|X1,X2,X3) P(X5|X1,...,X4)
        P(X6|X1,...,X5) P(X7|X1,...,X6) P(X8|X1,...,X7)

  This factorization is true for any distribution and any variable ordering.
- Do we save any parameterization cost?
- If the Xi's are independent, i.e. P(Xi | ·) = P(Xi):

    P(X1, X2, X3, X4, X5, X6, X7, X8) = P(X1) P(X2) P(X3) P(X4) P(X5) P(X6) P(X7) P(X8) = Π_i P(Xi)

  What do we gain? What do we lose?
- Even in the simplest case where these variables are binary-valued, a joint distribution requires the specification of 2^n numbers — the probabilities of the 2^n different assignments of values x1, ..., xn.

Today's Lecture
Today's lecture is about:
- how independence properties in the distribution can be used to represent such high-dimensional distributions much more compactly;
- how a combinatorial data structure — a directed acyclic graph — can provide us with a general-purpose modeling language for exploiting this type of structure in our representation.

Two Types of GMs
- Directed edges give causality relationships (Bayesian network, or directed graphical model):

    P(X1, X2, X3, X4, X5, X6, X7, X8)
      = P(X1) P(X2) P(X3|X1) P(X4|X2) P(X5|X2) P(X6|X3,X4) P(X7|X6) P(X8|X5,X6)

- Undirected edges simply give correlations between variables (Markov random field, or undirected graphical model):

    P(X1, X2, X3, X4, X5, X6, X7, X8)
      = (1/Z) exp{E(X1) + E(X2) + E(X3,X1) + E(X4,X2) + E(X5,X2)
                  + E(X6,X3,X4) + E(X7,X6) + E(X8,X5,X6)}

Specification of a Directed GM
- There are two components to any GM:
  - the qualitative specification (the graph structure);
  - the quantitative specification (the local probabilities), e.g. a conditional probability table for P(F | C, D):

      C   D   | P(f | C,D)  P(¬f | C,D)
      c   d   |   0.9          0.1
      c   ¬d  |   0.2          0.8
      ¬c  d   |   0.01         0.99
      ¬c  ¬d  |   0.9          0.1

Bayesian Network
- A BN is a directed graph whose nodes represent the random variables and whose edges represent the direct influence of one variable on another.
- It is a data structure that provides the skeleton for representing a joint distribution compactly, in a factorized way.
- It offers a compact representation for a set of conditional independence assumptions about a distribution.
- We can view the graph as encoding a generative sampling process executed by nature, where the value for each variable is selected by nature using a distribution that depends only on its
parents. In other words, each variable is a stochastic function of its parents.

Bayesian Network: Factorization Theorem
- Theorem: given a DAG, the most general form of the probability distribution that is consistent with the graph factors according to "node given its parents":

    P(X) = Π_{i=1..d} P(Xi | X_{πi})

  where X_{πi} is the set of parents of Xi, and d is the number of nodes (variables) in the graph.
- For the example network:

    P(X1, X2, X3, X4, X5, X6, X7, X8)
      = P(X1) P(X2) P(X3|X1) P(X4|X2) P(X5|X2) P(X6|X3,X4) P(X7|X6) P(X8|X5,X6)

Qualitative Specification
- Where does the qualitative specification come from?
  - prior knowledge of causal relationships
  - prior knowledge of modular relationships
  - assessment from experts
  - learning from data
  - simply linking up a certain architecture (e.g. a layered graph)
  - ...

Local Structures & Independencies
- Common parent (A ← B → C): fixing B decouples A and C. "Given the level of gene B, the levels of A and C are independent."
- Cascade (A → B → C): knowing B decouples A and C. "Given the level of gene B, the level of gene A provides no extra prediction value for the level of gene C."
- V-structure (A → C ← B): knowing C couples A and B, because A can "explain away" B w.r.t. C. "If A correlates with C, then the chance for B to also correlate with C will decrease."
- The language is compact, the concepts are rich!

A Simple Justification
(figure: a three-node network over A, B, C)

I-maps
- Defn (3.2.2): Let P be a distribution over X. We define I(P) to be the set of independence assertions of the form (X ⊥ Y | Z) that hold in P (however we set the parameter values).
- Defn (3.2.3): Let K be any graph object associated with a set of independencies I(K).
We say that K is an I-map for a set of independencies I if I(K) ⊆ I.
- We now say that G is an I-map for P if G is an I-map for I(P), where I(G) is used as the set of independencies associated with G.

Facts about I-maps
- For G to be an I-map of P, it is necessary that G does not mislead us regarding independencies in P: any independence that G asserts must also hold in P. Conversely, P may have additional independencies that are not reflected in G.
- Example: two distributions P1 and P2 (tables not shown in this preview).

What Is in I(G): Local Markov Assumptions of a BN
- A Bayesian network structure G is a directed acyclic graph whose nodes represent random variables X1, ..., Xn.
- Defn (3.2.1): Let Pa_Xi denote the parents of Xi in G, and NonDescendants_Xi denote the variables in the graph that are not descendants of Xi. Then G encodes the following set of local conditional independence assumptions Iℓ(G):

    Iℓ(G) = { (Xi ⊥ NonDescendants_Xi | Pa_Xi) : ∀ i }

  In other words, each node Xi is independent of its nondescendants given its parents.

Example
(figure: a four-node network over x1, x2, x3, x4)

Graph Separation Criterion
- D-separation criterion for Bayesian networks (D for directed edges):
- Defn: variables x and y are D-separated (conditionally independent) given z if ...
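The parameterization savings promised by the factorization theorem can be checked by counting. For the eight-node binary network on the slides, a small sketch (only the graph structure is taken from the slides; everything else is illustrative):

```python
# Parents of each node in the slides' 8-node network:
# P(X) = P(X1) P(X2) P(X3|X1) P(X4|X2) P(X5|X2) P(X6|X3,X4) P(X7|X6) P(X8|X5,X6)
parents = {
    "X1": [], "X2": [],
    "X3": ["X1"], "X4": ["X2"], "X5": ["X2"],
    "X6": ["X3", "X4"], "X7": ["X6"], "X8": ["X5", "X6"],
}

# A binary node with k binary parents needs (2 - 1) * 2**k free parameters
# (one free probability per parent configuration).
factored = sum(2 ** len(p) for p in parents.values())

# The full joint over 8 binary variables needs 2**8 - 1 free parameters.
full = 2 ** len(parents) - 1

print(f"factored BN parameters: {factored}")  # 1+1+2+2+2+4+2+4 = 18
print(f"full joint parameters:  {full}")      # 255
```

So the factored representation replaces 255 numbers with 18, and the gap widens exponentially as nodes are added.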
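The "explaining away" behavior of a v-structure can also be demonstrated numerically. A minimal sketch for A → C ← B, with made-up CPT numbers (the noisy-OR link and all values below are my own illustration, not from the slides):

```python
# Hypothetical priors and CPT for a v-structure A -> C <- B.
pA = {0: 0.9, 1: 0.1}
pB = {0: 0.9, 1: 0.1}

def pC(c, a, b):
    # Noisy-OR-style link: C is very likely on if either cause is on.
    on = 0.99 if (a or b) else 0.01
    return on if c == 1 else 1 - on

def joint(a, b, c):
    # BN factorization: P(A) P(B) P(C | A, B)
    return pA[a] * pB[b] * pC(c, a, b)

# P(A=1 | C=1): marginalize out B.
num = sum(joint(1, b, 1) for b in (0, 1))
den = sum(joint(a, b, 1) for a in (0, 1) for b in (0, 1))
p_a_given_c = num / den

# P(A=1 | C=1, B=1): once B explains C, belief in A drops back.
p_a_given_cb = joint(1, 1, 1) / sum(joint(a, 1, 1) for a in (0, 1))

print(p_a_given_c, p_a_given_cb)  # ~0.50 vs 0.10: observing B "explains away" A
```

Observing C=1 alone raises P(A=1) well above its prior of 0.1, but additionally observing B=1 sends it back down: knowing C couples A and B, exactly as the local-structure slide claims.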
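Although the d-separation definition is cut off in this preview, there is a standard equivalent test (not stated on these slides): x and y are d-separated given z iff z separates x from y in the moralized graph of the ancestral subgraph of {x, y} ∪ z. A minimal sketch for the eight-node example network; the helper names are my own:

```python
# Parents of each node in the slides' 8-node network.
parents = {
    "X1": [], "X2": [],
    "X3": ["X1"], "X4": ["X2"], "X5": ["X2"],
    "X6": ["X3", "X4"], "X7": ["X6"], "X8": ["X5", "X6"],
}

def ancestors(nodes):
    # Ancestral closure: the nodes themselves plus all their ancestors.
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(x, y, z):
    sub = ancestors({x, y} | set(z))
    # Moralize: undirect parent-child edges and marry co-parents.
    adj = {n: set() for n in sub}
    for n in sub:
        ps = parents[n]
        for p in ps:
            adj[n].add(p); adj[p].add(n)
        for a in ps:
            for b in ps:
                if a != b:
                    adj[a].add(b)
    # x, y are d-separated iff no path connects them once z is blocked.
    stack, seen = [x], {x} | set(z)
    while stack:
        n = stack.pop()
        if n == y:
            return False
        for m in adj[n] - seen:
            seen.add(m)
            stack.append(m)
    return True

print(d_separated("X1", "X2", []))      # True: marginally independent roots
print(d_separated("X1", "X2", ["X6"]))  # False: conditioning on the collider X6 couples them
print(d_separated("X7", "X1", ["X6"]))  # True: X6 blocks the cascade to X7
```

Note how the second query reproduces the v-structure behavior from the local-structures slide: observing a common effect activates the trail between its causes.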

