CMSC828G Principles of Data Mining -- Lecture #12

• Readings: handouts
• Today's lecture:
  – d-separation, minimal I-Maps
  – Bayesian networks
  – Markov networks
• Upcoming due dates:
  – H2 due today
  – P2 due 3/14

Summary of Last Class

We defined the following concepts:
• The Markov independencies of a DAG G: I(Xi ; NonDesc(Xi) | Pai)
• G is an I-Map of a distribution P if P satisfies the Markov independencies implied by G

We proved the factorization theorem: if G is an I-Map of P, then

  P(X1, ..., Xn) = ∏i P(Xi | Pai)

(Slides courtesy of Nir Friedman; see references.)

Conditional Independencies

• Let Markov(G) be the set of Markov independencies implied by G
• The factorization theorem shows:
  G is an I-Map of P ⇒ P(X1, ..., Xn) = ∏i P(Xi | Pai)
• We can also show the opposite:
  Thm: P(X1, ..., Xn) = ∏i P(Xi | Pai) ⇒ G is an I-Map of P

Proof (Outline)

Example: the DAG Y ← X → Z, with factorization P(X, Y, Z) = P(X) P(Y|X) P(Z|X). Then

  P(Z | X, Y) = P(X, Y, Z) / P(X, Y)
              = P(X) P(Y|X) P(Z|X) / (P(X) P(Y|X))
              = P(Z | X)

i.e., I(Z ; Y | X) holds in P.

Implied Independencies

• Does a graph G imply additional independencies as a consequence of Markov(G)?
• We can define a logic of independence statements. Some axioms:
  – I(X ; Y | Z) ⇒ I(Y ; X | Z)
  – I(X ; Y1, Y2 | Z) ⇒ I(X ; Y1 | Z)

d-Separation

• A procedure d-sep(X; Y | Z, G) that, given a DAG G and sets X, Y, and Z, returns either yes or no
• Goal: d-sep(X; Y | Z, G) = yes iff I(X ; Y | Z) follows from Markov(G)

Paths

• Intuition: dependency must "flow" along paths in the graph
• A path is a sequence of neighboring variables
• Examples, in the Earthquake network (variables Earthquake, Radio, Burglary, Alarm, Call; edges E → R, E → A, B → A, A → C):
  – R ← E → A ← B
  – C ← A ← E → R

• We want to know when a path is
  – active: creates dependency between its end nodes
  – blocked: cannot create dependency between its end nodes
• We want to classify the situations in which paths are active

Path Blockage

Three cases, illustrated in the Earthquake network:
• Common cause (R ← E → A): blocked iff the middle node E is in the evidence Z
• Intermediate cause (E → A → C): blocked iff the middle node A is in Z
• Common effect (E → A ← B): blocked iff neither A nor any of its descendants is in Z
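The factorization theorem can be illustrated concretely on the Earthquake network. A minimal sketch in Python; the CPT entries below are illustrative assumptions, not values given in the lecture:

```python
from itertools import product

# Illustrative CPTs for the Earthquake network (E -> R, E -> A, B -> A, A -> C).
# All probabilities are made-up numbers chosen for the example.
P_E = {1: 0.01, 0: 0.99}                      # P(E): prior on Earthquake
P_B = {1: 0.02, 0: 0.98}                      # P(B): prior on Burglary
P_R = {(1, 1): 0.9, (0, 1): 0.1,              # P(R | E), keyed by (r, e)
       (1, 0): 0.01, (0, 0): 0.99}
P_A1 = {(1, 1): 0.95, (1, 0): 0.3,            # P(A=1 | e, b), keyed by (e, b)
        (0, 1): 0.9, (0, 0): 0.01}
P_A = {(1, e, b): p for (e, b), p in P_A1.items()}
P_A.update({(0, e, b): 1 - p for (e, b), p in P_A1.items()})
P_C = {(1, 1): 0.8, (0, 1): 0.2,              # P(C | A), keyed by (c, a)
       (1, 0): 0.05, (0, 0): 0.95}

def joint(e, b, a, c, r):
    """P(E,B,A,C,R) = P(E) P(B) P(A|E,B) P(C|A) P(R|E)."""
    return P_E[e] * P_B[b] * P_A[(a, e, b)] * P_C[(c, a)] * P_R[(r, e)]

# The factored product defines a proper joint: it sums to 1 over all 32 assignments.
total = sum(joint(*v) for v in product((0, 1), repeat=5))
```

Because each factor is a proper conditional distribution, the product automatically normalizes; this is exactly why the factorization lets a Bayesian network specify a joint over n variables with only local tables.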
Path Blockage -- General Case

A path is active, given evidence Z, if
• whenever the path contains a common-effect configuration X → B ← Y, either B or one of its descendants is in Z, and
• no other node on the path is in Z.
A path is blocked, given evidence Z, if it is not active.

Example (Earthquake network)

• d-sep(R, B)? Yes: the only path, R ← E → A ← B, is blocked at the common-effect node A, since neither A nor its descendant C is in Z.
• d-sep(R, B | A)? No: observing A activates the common-effect configuration E → A ← B, so the path becomes active.
• d-sep(R, B | E, A)? Yes: E is in Z and is not a common-effect node on the path, so the path is blocked at E.

d-Separation

• X is d-separated from Y, given Z, if all paths from a node in X to a node in Y are blocked, given Z
• Checking d-separation can be done efficiently (linear time in the number of edges):
  – Bottom-up phase: mark all nodes whose descendants are in Z
  – X-to-Y phase: traverse (BFS) all edges on paths from X to Y and check whether they are blocked

Soundness

Thm: If G is an I-Map of P and d-sep(X; Y | Z, G) = yes, then P satisfies I(X ; Y | Z).
Informally: any independence reported by d-separation is satisfied by the underlying distribution.

Completeness

Thm: If d-sep(X; Y | Z, G) = no, then there is a distribution P such that G is an I-Map of P and P does not satisfy I(X ; Y | Z).
Informally: any independence not reported by d-separation might be violated by the underlying distribution; we cannot determine this by examining the graph structure alone.

I-Maps Revisited

• The fact that G is an I-Map of P might not be that useful
• For example, complete DAGs: a DAG G is complete if we cannot add an arc without creating a cycle
• Complete DAGs do not imply any independencies; thus, they are I-Maps of any distribution

Minimal I-Maps

A DAG G is a minimal I-Map of P if
• G is an I-Map of P, and
• if G' ⊂ G, then G' is not an I-Map of P.
Removing any arc from a minimal I-Map introduces (conditional) independencies that do not hold in P.

Minimal I-Map Example

• The slide shows a minimal I-Map over X1, X2, X3, X4 and several subgraphs obtained by removing one arc; none of those subgraphs is an I-Map (graphs omitted).

Constructing Minimal I-Maps

The factorization theorem suggests an algorithm:
• Fix an ordering X1, ..., Xn
• For each i, select Pai to be a minimal subset of {X1, ..., Xi-1} such that I(Xi ; {X1, ..., Xi-1} − Pai | Pai)
• Clearly, the resulting graph is a minimal I-Map
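The path-blockage rules and the d-sep worked example above can be checked with a short sketch. This is a naive path-enumeration version, simpler but slower than the linear-time two-phase procedure described in the lecture; the dictionary encoding of the Earthquake graph is an assumption for illustration:

```python
# Earthquake network: each variable maps to its list of parents.
parents = {"E": [], "B": [], "A": ["E", "B"], "R": ["E"], "C": ["A"]}

def children(v):
    return [u for u, ps in parents.items() if v in ps]

def descendants(v):
    """All strict descendants of v in the DAG."""
    seen, stack = set(), [v]
    while stack:
        for c in children(stack.pop()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def all_paths(x, y):
    """All simple undirected paths from x to y."""
    def neighbors(v):
        return set(parents[v]) | set(children(v))
    paths = []
    def dfs(v, path):
        if v == y:
            paths.append(path)
            return
        for u in neighbors(v):
            if u not in path:
                dfs(u, path + [u])
    dfs(x, [x])
    return paths

def path_active(path, Z):
    """Apply the general-case rule to each interior node of the path."""
    for i in range(1, len(path) - 1):
        u, v, w = path[i - 1], path[i], path[i + 1]
        if u in parents[v] and w in parents[v]:
            # Common-effect node: active only if v or a descendant is in Z.
            if v not in Z and not (descendants(v) & Z):
                return False
        elif v in Z:
            # Any other node in Z blocks the path.
            return False
    return True

def d_sep(x, y, Z):
    """True iff every path from x to y is blocked given Z."""
    return not any(path_active(p, Z) for p in all_paths(x, y))
```

With this sketch, `d_sep("R", "B", set())` and `d_sep("R", "B", {"E", "A"})` return True while `d_sep("R", "B", {"A"})` returns False, matching the worked example; observing the descendant C alone (`d_sep("R", "B", {"C"})`) also activates the path.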
Non-Uniqueness of Minimal I-Maps

• Unfortunately, there may be several minimal I-Maps for the same distribution
• Applying the I-Map construction procedure with different orderings can lead to different structures (the slide contrasts the original Earthquake I-Map with the one obtained under the order C, R, A, E, B)

Choosing an Ordering & Causality

• The choice of order can have a drastic impact on the complexity of the minimal I-Map
• Heuristic argument: construct the I-Map using a causal ordering among the variables
• Justification? It is often reasonable to assume that graphs of causal influence should satisfy the Markov properties. We will revisit this issue in future classes.

P-Maps

• A DAG G is a P-Map (perfect map) of a distribution P if I(X ; Y | Z) if and only if d-sep(X; Y | Z, G) = yes
Notes:
• A P-Map captures all the independencies in the distribution
• P-Maps are unique, up to DAG equivalence

• Unfortunately, some distributions do not have a P-Map
• Example: the parity distribution over three binary variables,

  P(A, B, C) = 1/12 if A ⊕ B ⊕ C = 0, and 1/6 if A ⊕ B ⊕ C = 1

• A minimal I-Map is A → C ← B
• This is not a P-Map, since I(A ; C) holds in P but d-sep(A ; C) = no

Bayesian Networks

• A Bayesian network specifies a probability distribution via two components:
  – a DAG G
  – a collection of conditional probability distributions P(Xi | Pai)
• The joint distribution P is defined by the factorization

  P(X1, ..., Xn) = ∏i P(Xi | Pai)

• Additional requirement: G is a minimal I-Map of P

Summary

We explored DAGs as a representation of conditional independencies:
• the Markov independencies of a DAG
• the tight correspondence between Markov(G) and the factorization defined by G
• d-separation, a sound and complete procedure for computing the consequences of the independencies
• the notion of a minimal I-Map
• P-Maps
This theory is the basis for defining Bayesian networks.

References

• Principles of Data Mining, Hand, Mannila, Smyth. MIT Press, 2001.
• Nir Friedman's lecture notes.
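The parity-distribution counterexample is easy to verify numerically: A and C are marginally independent in P, yet any minimal I-Map must connect them. A small sketch:

```python
from itertools import product

def P(a, b, c):
    # The slide's parity distribution:
    # P(A,B,C) = 1/12 if A xor B xor C = 0, else 1/6.
    return 1 / 12 if (a ^ b ^ c) == 0 else 1 / 6

# Sanity check: the eight probabilities sum to 1.
assert abs(sum(P(*v) for v in product((0, 1), repeat=3)) - 1) < 1e-12

# Marginalize out B, then check P(a, c) = P(a) P(c) for every a, c --
# i.e. I(A ; C) holds even though A and C are adjacent in the I-Map.
P_AC = {(a, c): sum(P(a, b, c) for b in (0, 1))
        for a, c in product((0, 1), repeat=2)}
P_A = {a: P_AC[(a, 0)] + P_AC[(a, 1)] for a in (0, 1)}
P_C = {c: P_AC[(0, c)] + P_AC[(1, c)] for c in (0, 1)}
independent = all(abs(P_AC[(a, c)] - P_A[a] * P_C[c]) < 1e-12
                  for a, c in product((0, 1), repeat=2))
```

In fact every pair of variables here is marginally independent; the dependence is purely three-way, which is exactly the kind of structure no single DAG over {A, B, C} can capture perfectly.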

