CMSC828G Principles of Data Mining, Lecture #13

• Readings:
– handouts
• Today's Lecture:
– d-Separation, minimal I-Maps
– Bayesian Networks
– Markov Networks
– Inference
• Upcoming Due Dates:
– P2 due 3/14

Summary of Last Class

We defined the following concepts:
• The Markov independencies of a DAG G:
I(Xi ; NonDesc(Xi) | Pai)
• G is an I-Map of a distribution P:
– if P satisfies the Markov independencies implied by G
We showed that G is an I-Map of P if and only if
P(X1, …, Xn) = ∏i P(Xi | Pai)

d-Separation

• A procedure d-sep(X; Y | Z, G) that, given a DAG G and sets of nodes X, Y, and Z, returns either yes or no
• Goal: d-sep(X; Y | Z, G) = yes iff I(X; Y | Z) follows from Markov(G)
• X is d-separated from Y, given Z, if all paths from a node in X to a node in Y are blocked, given Z.

Paths

• We want to know when a path is
– active: it creates a dependency between the end nodes
– blocked: it cannot create a dependency between the end nodes
• We want to classify the situations in which paths are active.
[Figure: a blocked and an unblocked path over nodes E, R, A]

Path Blockage

– Common cause
– Intermediate cause
– Common effect
[Figure: blocked and active versions of each configuration, drawn over nodes E, C, A and E, B, A, C]

Minimal I-Maps

A DAG G is a minimal I-Map of P if
• G is an I-Map of P
• if G′ ⊂ G, then G′ is not an I-Map of P
Removing any arc from G introduces (conditional) independencies that do not hold in P.

Constructing minimal I-Maps

The factorization theorem gives an algorithm:
• Fix an ordering X1, …, Xn
• For each i, select Pai to be a minimal subset of {X1, …, Xi−1} such that I(Xi ; {X1, …, Xi−1} − Pai | Pai)
• Clearly, the resulting graph is a minimal I-Map.

Non-uniqueness of minimal I-Maps

• Unfortunately, there may be several minimal I-Maps for the same distribution
– Applying the I-Map construction procedure with different orderings can lead to different structures
[Figure: the original I-Map over E, B, A, C, R vs. the structure obtained with the order C, R, A, E, B]

Choosing Ordering & Causality

• The choice of order can have a drastic impact on the complexity of the minimal I-Map
• Heuristic argument: construct the I-Map using a causal ordering among the variables
• Justification?
– It is often reasonable to assume that graphs of causal influence should satisfy the Markov properties.

P-Maps

• A DAG G is a P-Map (perfect map) of a distribution P if
– I(X; Y | Z) if and only if d-sep(X; Y | Z, G) = yes
Notes:
• A P-Map captures all the independencies in the distribution
• P-Maps are unique, up to DAG equivalence

P-Maps

• Unfortunately, some distributions do not have a P-Map
• Example:
P(A, B, C) = 1/12 if A ⊕ B ⊕ C = 0
P(A, B, C) = 1/6 if A ⊕ B ⊕ C = 1
• A minimal I-Map: A → C ← B
• This is not a P-Map, since I(A; C) holds in P but d-sep(A; C) = no

Bayesian Networks

• A Bayesian network specifies a probability distribution via two components:
– a DAG G
– a collection of conditional probability distributions P(Xi | Pai)
• The joint distribution P is defined by the factorization
P(X1, …, Xn) = ∏i P(Xi | Pai)
• Additional requirement: G is a minimal I-Map of P

Summary

• We explored DAGs as a representation of conditional independencies:
– Markov independencies of a DAG
– the tight correspondence between Markov(G) and the factorization defined by G
– d-separation, a sound & complete procedure for computing the consequences of the independencies
– the notion of a minimal I-Map
– P-Maps
• This theory is the basis for defining Bayesian networks

Markov Networks

• We now briefly consider an alternative representation of conditional independencies
• Let U be an undirected graph
• Let Ni be the set of neighbors of Xi
• Define Markov(U) to be the set of independencies
I(Xi ; {X1, …, Xn} − Ni − {Xi} | Ni)
• U is an I-Map of P if P satisfies Markov(U)

Example

[Figure: a four-node cycle with edges A–B, B–C, C–D, D–A]
This graph implies that
• I(A; C | B, D)
• I(B; D | A, C)
• Note: this example does not have a directed P-Map

Markov Network Factorization

Thm: if P is strictly positive, that is, P(x1, …, xn) > 0 for all assignments, then U is an I-Map of P if and only if there is a factorization
P(X1, …, Xn) = ∏i fi(Ci)
where C1, …, Ck are the maximal cliques in U.
Alternative form:
P(X1, …, Xn) = (1/Z) exp(∑i gi(Ci))

Relationship between Directed & Undirected Models

[Figure: chain graphs subsume both directed and undirected graphs]

CPDs

• So far, we focused on how to represent independencies using DAGs
• The "other" component of a Bayesian network is the specification of the conditional probability distributions (CPDs)
• We start with the simplest representation of CPDs and then discuss additional structure

Tabular CPDs

• When the variables of interest are all discrete, the common representation is a table
• For example, P(C | A, B) can be represented by

A  B  P(C = 0 | A, B)  P(C = 1 | A, B)
0  0  0.25             0.75
0  1  0.50             0.50
1  0  0.12             0.88
1  1  0.33             0.67

Tabular CPDs

Pros:
• Very flexible: can capture any CPD of discrete variables
• Can be easily stored and manipulated
Cons:
• Representation size grows exponentially with the number of parents!
• Unwieldy to assess probabilities for more than a few parents

Structured CPDs

• To avoid the exponential blowup in representation, we need to focus on specialized types of CPDs
• This comes at a cost in terms of expressive power
• We now consider several types of structured CPDs

Causal Independence

• Consider the following situation:
[Figure: Disease 1, Disease 2, Disease 3, each a parent of Fever]
• In a tabular CPD, we need to assess the probability of fever in eight cases
• These involve all possible interactions between the diseases
• For three diseases, this might be feasible…. For ten diseases, not likely….

Causal Independence

• Simplifying assumption:
– Each disease attempts to cause fever, independently of the other diseases
– The patient has fever if one of the diseases "succeeds"
• We can model this using a Bayesian network fragment
[Figure: each Disease i points to a hypothetical variable Fever i ("fever caused by Disease i"); together with a Spontaneous Fever node SF, these feed an OR gate: F = or(SF, F1, F2, F3)]

Noisy-Or CPD

• Models P(X | Y1, …, Yk), where X, Y1, …, Yk are all binary
• Parameters:
– pi: the probability of X = 1 due to Yi = 1
– p0: the probability of X = 1 due to other causes
• Plugging these into the model, we get
P(X = 0 | Y1, …, Yk) = (1 − p0) ∏i (1 − pi)^Yi
P(X = 1 | Y1, …, Yk) = 1 − P(X = 0 | Y1, …, Yk)

Noisy-or CPD

• Benefits of noisy-or:
– "reasonable" assumptions in many domains (e.g., the medical domain)
– few parameters
– each parameter can be estimated independently of the others
• The same idea can be extended to other functions: noisy-max, noisy-and, etc.
• Frequently used in large medical expert systems
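Since the two noisy-or formulas above fully determine the CPD, a short sketch can make them concrete. This is an illustrative Python implementation, not from the slides; the function name and the example numbers are assumptions.

# Minimal noisy-or sketch (illustrative; names and numbers are assumptions).
# p0 is the leak probability (X = 1 due to "other causes"), p[i] is the
# probability that X = 1 due to Y[i] = 1 alone, and y is a binary
# assignment to the parents Y1, ..., Yk.
def noisy_or(p0, p, y):
    """Return P(X = 1 | Y1, ..., Yk) under the noisy-or CPD."""
    prob_x0 = 1.0 - p0                # the leak cause fails
    for pi, yi in zip(p, y):
        if yi == 1:                   # only active parents can cause X
            prob_x0 *= 1.0 - pi       # ...and each one fails independently
    return 1.0 - prob_x0              # X = 1 unless every cause fails

# Example: three diseases, two of them present.
print(noisy_or(0.01, [0.6, 0.6, 0.6], [1, 1, 0]))   # 0.8416

Note that k + 1 parameters (p0, p1, …, pk) specify the whole CPD, versus the 2^k rows of the tabular representation, which is the "few parameters" benefit listed above.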
Context Specific Independence

• Consider the following examples:
• Alarm sound depends on
– Whether the alarm was set before leaving the house
– Burglary
–
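The alarm example suggests a CPD with context-specific structure: in the context where the alarm was not set, the sound no longer depends on the remaining parents. Below is a minimal Python sketch of one such CPD; the tree structure and all probability values are illustrative assumptions, not from the slides.

# Hypothetical context-specific CPD for P(Sound = 1 | Set, Burglary).
# All numbers are made up for illustration. In the context Set = 0 the
# alarm stays silent, so Sound is independent of Burglary in that context.
def p_sound(set_alarm, burglary):
    if set_alarm == 0:
        return 0.0                              # not set: never sounds
    return 0.95 if burglary == 1 else 0.01      # set: depends on Burglary

# The two Set = 0 rows of the full table collapse into a single value:
print(p_sound(0, 0) == p_sound(0, 1))           # True

A tabular CPD would store a separate row for every parent assignment; a tree-structured CPD records the Set = 0 branch once.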

