Bayesian Networks – Representation
Machine Learning – 10-701/15-781
Carlos Guestrin, Carnegie Mellon University
March 20th, 2006

Announcements
- Welcome back!
- One-page project proposal due Wednesday
- We'll go over the midterm in this week's recitation

Handwriting recognition
- Character recognition, e.g., kernel SVMs
[figure: examples of handwritten characters]

Webpage classification
- Company home page vs. personal home page vs. university home page vs. ...

Handwriting recognition 2
[figure]

Webpage classification 2
[figure]

Today – Bayesian networks
- One of the most exciting advancements in statistical AI in the last 10-15 years
- Generalizes naïve Bayes and logistic regression classifiers
- Compact representation for exponentially-large probability distributions
- Exploits conditional independencies

Causal structure
- Suppose we know the following:
  - The flu causes sinus inflammation
  - Allergies cause sinus inflammation
  - Sinus inflammation causes a runny nose
  - Sinus inflammation causes headaches
- How are these connected?

Possible queries
[figure: BN with edges Flu → Sinus, Allergy → Sinus, Sinus → Headache, Sinus → Nose]
- Inference
- Most probable explanation
- Active data collection

Car starts BN
- 18 binary attributes
- Inference: P(BatteryAge | Starts = f)
- 2^18 terms; why so fast?
- Not impressed? The HailFinder BN has more than 3^54 = 58,149,737,003,040,059,690,390,169 terms

Factored joint distribution – Preview
[figure: Flu/Allergy/Sinus/Headache/Nose network]
P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S)

Number of parameters
[figure: same network]
- Full joint over 5 binary variables: 2^5 - 1 = 31 parameters
- Factored form: 1 + 1 + 4 + 2 + 2 = 10 parameters

Key: Independence assumptions
[figure: same network]
- Knowing Sinus separates the other variables from each other

(Marginal) Independence
- Flu and Allergy are (marginally) independent
- More generally:
[tables: marginals P(Flu) and P(Allergy) over {t, f}, and their product table P(Flu, Allergy)]

Marginally independent random variables
- Sets of variables X, Y
- X is independent of Y if P(X = x | Y = y) = P(X = x), ∀ x ∈ Val(X), y ∈ Val(Y)
- Shorthand: marginal independence, P ⊨ (X ⊥ Y)
- Proposition: P satisfies (X ⊥ Y) if and only if P(X, Y) = P(X) P(Y)

Conditional independence
- Flu and Headache are not (marginally) independent
- Flu and Headache are independent given Sinus infection
- More generally:

Conditionally independent random variables
- Sets of variables X, Y, Z
- X is independent of Y given Z if P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z), ∀ x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z)
- Shorthand: conditional independence, P ⊨ (X ⊥ Y | Z)
- For P ⊨ (X ⊥ Y | ∅), write P ⊨ (X ⊥ Y)
- Proposition: P satisfies (X ⊥ Y | Z) if and only if P(X, Y | Z) = P(X | Z) P(Y | Z)

Properties of independence
- Symmetry: (X ⊥ Y | Z) ⇒ (Y ⊥ X | Z)
- Decomposition: (X ⊥ Y, W | Z) ⇒ (X ⊥ Y | Z)
- Weak union: (X ⊥ Y, W | Z) ⇒ (X ⊥ Y | Z, W)
- Contraction: (X ⊥ W | Y, Z) & (X ⊥ Y | Z) ⇒ (X ⊥ Y, W | Z)
- Intersection: (X ⊥ Y | W, Z) & (X ⊥ W | Y, Z) ⇒ (X ⊥ Y, W | Z)
  - Only for positive distributions! P(α) > 0 for all events α ≠ ∅

The independence assumption
[figure: Flu/Allergy/Sinus/Headache/Nose network]
Local Markov Assumption: a variable X is independent of its non-descendants given its parents.

Explaining away
[figure: same network, same Local Markov Assumption]

Naïve Bayes revisited
- Naïve Bayes is the BN in which the class variable is the only parent of every feature; the Local Markov Assumption then says the features are independent given the class

What about probabilities? Conditional probability tables (CPTs)
[figure: network with a CPT attached to each node: P(F), P(A), P(S | F, A), P(H | S), P(N | S)]

Joint distribution
[figure: same network]
P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S)

Why can we decompose? The Markov Assumption!
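As a preview of the parameter savings, here is a minimal Python sketch of this factored joint. The network structure comes from the slides, but the CPT values and the function name `joint` are invented for illustration; the slides give no concrete numbers.

```python
from itertools import product

# Illustrative CPTs for the Flu/Allergy/Sinus/Headache/Nose network.
# The probability values are made up for demonstration.
p_flu      = {True: 0.10, False: 0.90}                    # P(F)
p_allergy  = {True: 0.20, False: 0.80}                    # P(A)
p_sinus    = {(True, True): 0.90, (True, False): 0.70,    # P(S=t | F, A)
              (False, True): 0.60, (False, False): 0.05}
p_headache = {True: 0.80, False: 0.10}                    # P(H=t | S)
p_nose     = {True: 0.70, False: 0.05}                    # P(N=t | S)

def joint(f, a, s, h, n):
    """P(F,A,S,H,N) = P(F) P(A) P(S|F,A) P(H|S) P(N|S)."""
    p_s = p_sinus[(f, a)] if s else 1 - p_sinus[(f, a)]
    p_h = p_headache[s] if h else 1 - p_headache[s]
    p_n = p_nose[s] if n else 1 - p_nose[s]
    return p_flu[f] * p_allergy[a] * p_s * p_h * p_n

# Five small tables (1 + 1 + 4 + 2 + 2 = 10 parameters) define the
# full 2^5-entry joint, which would otherwise need 31 parameters.
assert abs(sum(joint(*v) for v in product([True, False], repeat=5)) - 1.0) < 1e-9
print(joint(True, False, True, True, False))  # one entry of the joint
```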
The chain rule of probabilities
- P(A, B) = P(A) P(B | A)
- More generally: P(X1, ..., Xn) = P(X1) · P(X2 | X1) · ... · P(Xn | X1, ..., Xn-1)

Chain rule & Joint distribution
[figure: Flu/Allergy/Sinus/Headache/Nose network; the chain-rule factors simplify under the Local Markov Assumption (a variable X is independent of its non-descendants given its parents)]

Two (trivial) special cases
- Edgeless graph: every variable is independent of the rest, so P(X1, ..., Xn) = ∏_i P(Xi)
- Fully-connected graph: no independence assumptions, so the chain rule with no simplification

The Representation Theorem – Joint Distribution to BN
- Given a joint probability distribution P, obtain a BN that encodes independence assumptions
- If the conditional independencies in the BN are a subset of the conditional independencies in P, then P factors according to the BN

Real Bayesian network applications
- Diagnosis of lymph-node disease
- Speech recognition
- Microsoft Office and Windows: http://www.research.microsoft.com/research/dtg/
- Study of the human genome
- Robot mapping
- Robots that identify meteorites to study
- Modeling fMRI data
- Anomaly detection
- Fault diagnosis
- Modeling sensor-network data

A general Bayes net
- Set of random variables
- Directed acyclic graph: encodes independence assumptions
- CPTs
- Joint distribution: P(X1, ..., Xn) = ∏_i P(Xi | Pa_Xi)

How many parameters in a BN?
- Discrete variables X1, ..., Xn
- Graph defines the parents of Xi, Pa_Xi
- CPTs: P(Xi | Pa_Xi)
- A binary Xi with k binary parents needs a CPT with 2^k entries, so the BN needs ∑_i 2^|Pa_Xi| parameters rather than 2^n - 1

Another example
- Variables:
  - B – Burglar
  - E – Earthquake
  - A – Burglar alarm
  - N – Neighbor calls
  - R – Radio report
- Both burglars and earthquakes can set off the alarm
- If the alarm sounds, a neighbor may call
- An earthquake may be announced on the radio

Another example – Building the BN
- The causal statements above give the edges B → A, E → A, A → N, E → R

Independencies encoded in BN
- We said: all you need is the local Markov assumption (Xi ⊥ NonDescendants_Xi | Pa_Xi)
- But then we talked about other (in)dependencies, e.g., explaining away
- What are the independencies encoded by a BN?
  - The only assumption is local Markov
  - But many others can be derived using the algebra of conditional independencies!

Understanding independencies in BNs – BNs with 3 nodes
(Local Markov Assumption: a variable X is independent of its non-descendants given its parents)
- Indirect causal effect: X → Z → Y
- Indirect evidential effect: X ← Z ← Y
- Common cause: X ← Z → Y
- Common effect: X → Z ← Y

Understanding independencies in BNs – Some examples
[figure: example BN over nodes A through K]

An active trail – Example
[figure: BN over nodes A through H, plus F' and F'']
- When are A and H independent?

Active trails formalized
- A path X1 – X2 – ... – Xk is an active trail when variables O ⊆ {X1, ..., Xn} are observed if for each consecutive triplet in the trail:
  - Xi-1 → Xi → Xi+1, and Xi is not observed (Xi ∉ O), or
  - Xi-1 ← Xi ← Xi+1, and Xi is not observed (Xi ∉ O), or
  - Xi-1 ← Xi → Xi+1, and Xi is not observed (Xi ∉ O), or
  - Xi-1 → Xi ← Xi+1, and Xi is observed (Xi ∈ O), or one of its descendants is
- A checker based on this definition is sketched after the theorem below

Active trails and independence?
- Theorem: variables Xi and Xj are independent given Z ⊆ {X1, ..., Xn} if there is no active trail between Xi and Xj when the variables in Z are observed
[figure: example BN over nodes A through K]

The BN Representation Theorem
- Conversely: if the joint probability distribution factors according to the BN, P(X1, ..., Xn) = ∏_i P(Xi | Pa_Xi), then the conditional independencies encoded in the BN hold in P
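To make the active-trail definition concrete, here is a minimal Python sketch of a d-separation checker. This is an illustration, not code from the lecture; the graph encoding and the names `has_active_trail`, `is_active_triple`, and `descendants` are my own. It walks simple trails in the undirected skeleton and tests each consecutive triple against the four cases above.

```python
def descendants(graph, node):
    """All descendants of node in a DAG given as {node: [children]}."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def is_active_triple(graph, a, b, c, observed):
    """Is the consecutive triple a - b - c active given the observed set?"""
    if b in graph.get(a, []) and b in graph.get(c, []):
        # Common effect (v-structure) a -> b <- c: active iff b or one
        # of its descendants is observed.
        return b in observed or bool(descendants(graph, b) & observed)
    # Causal chain, evidential chain, or common cause: active iff b
    # is not observed.
    return b not in observed

def has_active_trail(graph, x, y, observed):
    """True iff some trail between x and y is active (x, y d-connected)."""
    observed = set(observed)
    # Undirected adjacency, so trails can traverse edges either way.
    nbrs = {}
    for u, children in graph.items():
        for v in children:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)

    def extend(path):
        if path[-1] == y:
            return True
        for nxt in nbrs.get(path[-1], ()):
            if nxt in path:
                continue  # simple trails only
            if len(path) >= 2 and not is_active_triple(
                    graph, path[-2], path[-1], nxt, observed):
                continue  # the new triple blocks this trail
            if extend(path + [nxt]):
                return True
        return False

    return extend([x])
```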
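Run on the burglar-alarm network built above (edges B → A, E → A, A → N, E → R), the checker reproduces the behavior discussed in the slides, including explaining away:

```python
# Burglar-alarm network from the slides: B -> A <- E, A -> N, E -> R.
alarm_net = {"B": ["A"], "E": ["A", "R"], "A": ["N"]}

print(has_active_trail(alarm_net, "B", "E", set()))   # False: (B ⊥ E) marginally
print(has_active_trail(alarm_net, "B", "E", {"A"}))   # True: explaining away
print(has_active_trail(alarm_net, "B", "E", {"N"}))   # True: N descends from the collider A
```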