Bayesian Networks – Representation
Machine Learning – 10701/15781
Carlos Guestrin
Carnegie Mellon University
March 19th, 2007
©2005-2007 Carlos Guestrin

Handwriting recognition
- Character recognition, e.g., kernel SVMs
[figure: handwritten character examples]

Webpage classification
- Company home page vs. personal home page vs. university home page vs. …

Handwriting recognition 2

Webpage classification 2

Today – Bayesian networks
- One of the most exciting advances in statistical AI in the last 10-15 years
- Generalizes naïve Bayes and logistic regression classifiers
- Compact representation for exponentially-large probability distributions
- Exploits conditional independencies

Causal structure
- Suppose we know the following:
  - The flu causes sinus inflammation
  - Allergies cause sinus inflammation
  - Sinus inflammation causes a runny nose
  - Sinus inflammation causes headaches
- How are these connected?

Possible queries
[BN: Flu → Sinus ← Allergy; Sinus → Headache; Sinus → Nose]
- Inference
- Most probable explanation
- Active data collection

Car starts BN
- 18 binary attributes
- Inference: P(BatteryAge | Starts = false)
- 2^16 terms – why so fast?
- Not impressed? The HailFinder BN has more than 3^54 = 58,149,737,003,040,059,690,390,169 terms

Announcements
- Welcome back!
- One-page project proposal due Wednesday
  - Individual or groups of two
  - Must be something related to ML! ☺
  - It will be great if it's related to your research, but it must be something you started this semester
- Midway progress report: 5 pages, NIPS format, due April 16th, worth 20%
- Poster presentation: May 4, 2-5pm in the NSH Atrium, worth 20%
- Final report: 8 pages, NIPS format, due May 10th, worth 60%
- It will be fun!!!
And the winner is…
- In third place: Samuel Clanton, Raja Sambasivan, Hui Yang – 76.0% accuracy (25 mistakes)
- In second place: Minh Nguyen – 76.9% accuracy (24 mistakes)
- And in first place: Jason Ganetsky – 78.8% accuracy (22 mistakes)

Factored joint distribution – Preview
[BN: Flu → Sinus ← Allergy; Sinus → Headache; Sinus → Nose]

Number of parameters

Key: Independence assumptions
- Knowing Sinus separates the variables from each other

(Marginal) Independence
- Flu and Allergy are (marginally) independent
- More generally:
[2×2 tables over Flu ∈ {t, f} × Allergy ∈ {t, f}]

Marginally independent random variables
- Sets of variables X, Y
- X is independent of Y if P ⊨ (X = x ⊥ Y = y), ∀ x ∈ Val(X), y ∈ Val(Y)
- Shorthand: marginal independence, P ⊨ (X ⊥ Y)
- Proposition: P satisfies (X ⊥ Y) if and only if P(X, Y) = P(X) P(Y)

Conditional independence
- Flu and Headache are not (marginally) independent
- Flu and Headache are independent given Sinus infection
- More generally:

Conditionally independent random variables
- Sets of variables X, Y, Z
- X is independent of Y given Z if P ⊨ (X = x ⊥ Y = y | Z = z), ∀ x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z)
- Shorthand: conditional independence, P ⊨ (X ⊥ Y | Z)
- For P ⊨ (X ⊥ Y | ∅), write P ⊨ (X ⊥ Y)
- Proposition: P satisfies (X ⊥ Y | Z) if and only if P(X, Y | Z) = P(X | Z) P(Y | Z)

Properties of independence
- Symmetry: (X ⊥ Y | Z) ⇒ (Y ⊥ X | Z)
- Decomposition: (X ⊥ Y, W | Z) ⇒ (X ⊥ Y | Z)
- Weak union: (X ⊥ Y, W | Z) ⇒ (X ⊥ Y | Z, W)
- Contraction: (X ⊥ W | Y, Z) & (X ⊥ Y | Z) ⇒ (X ⊥ Y, W | Z)
- Intersection: (X ⊥ Y | W, Z) & (X ⊥ W | Y, Z) ⇒ (X ⊥ Y, W | Z)
  - Only for positive distributions!
  - P(α) > 0, ∀ α ≠ ∅

The independence assumption
[BN: Flu → Sinus ← Allergy; Sinus → Headache; Sinus → Nose]
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents

Explaining away

Naïve Bayes revisited

What about probabilities?
- Conditional probability tables (CPTs)

Joint distribution
- Why can we decompose? Markov Assumption!

The chain rule of probabilities
- P(A, B) = P(A) P(B | A)
- More generally: P(X1, …, Xn) = P(X1) · P(X2 | X1) · … · P(Xn | X1, …, Xn-1)

Chain rule & joint distribution

Two (trivial) special cases
- Edgeless graph
- Fully-connected graph

The Representation Theorem – Joint Distribution to BN
- Given a joint probability distribution P, obtain a BN that encodes its independence assumptions
- Condition: the conditional independencies in the BN must be a subset of the conditional independencies in P

Real Bayesian networks applications
- Diagnosis of lymph node disease
- Speech recognition
- Microsoft Office and Windows (http://www.research.microsoft.com/research/dtg/)
- Study of the human genome
- Robot mapping
- Robots to identify meteorites to study
- Modeling fMRI data
- Anomaly detection
- Fault diagnosis
- Modeling sensor network data

A general Bayes net
- Set of random variables
- Directed acyclic graph: encodes independence assumptions
- CPTs
- Joint distribution: P(X1, …, Xn) = ∏i P(Xi | PaXi)

How many parameters in a BN?
- Discrete variables X1, …, Xn
- Graph defines the parents of Xi, PaXi
- CPTs: P(Xi | PaXi)

Another example
- Variables:
  - B – Burglar
  - E – Earthquake
  - A – Burglar alarm
  - N – Neighbor calls
  - R – Radio report
- Both burglars and earthquakes can set off the alarm
- If the alarm sounds, a neighbor may call
- An earthquake may be announced on the radio

Another example – Building the BN
- B – Burglar
- E – Earthquake
- A – Burglar alarm
- N – Neighbor calls
- R – Radio report
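The two propositions above — P ⊨ (X ⊥ Y) iff P(X, Y) = P(X) P(Y), and the conditional version with everything conditioned on Z — can be checked numerically on a small joint distribution. Below is a minimal sketch for the Flu → Sinus → Headache fragment of the lecture's network; all CPT numbers are made up for illustration and do not come from the slides. By construction (the joint factorizes through Sinus), Flu ⊥ Headache | Sinus holds, while marginal independence of Flu and Headache fails:

```python
import itertools

# Illustrative (made-up) CPTs for the chain Flu -> Sinus -> Headache.
P_F = {True: 0.1, False: 0.9}
P_S_given_F = {True: {True: 0.8, False: 0.2},   # P_S_given_F[f][s] = P(S=s | F=f)
               False: {True: 0.1, False: 0.9}}
P_H_given_S = {True: {True: 0.7, False: 0.3},   # P_H_given_S[s][h] = P(H=h | S=s)
               False: {True: 0.05, False: 0.95}}

# Joint distribution P(F, S, H) via the chain rule + local Markov assumption.
joint = {}
for f, s, h in itertools.product([True, False], repeat=3):
    joint[(f, s, h)] = P_F[f] * P_S_given_F[f][s] * P_H_given_S[s][h]

def marginal(keep):
    """Marginalize the joint onto the variable positions in `keep` (0=F, 1=S, 2=H)."""
    out = {}
    for assign, p in joint.items():
        key = tuple(assign[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

# Conditional independence (F ⊥ H | S): P(F, H | S) = P(F | S) P(H | S).
P_FS, P_SH, P_S = marginal((0, 1)), marginal((1, 2)), marginal((1,))
for f, s, h in itertools.product([True, False], repeat=3):
    lhs = joint[(f, s, h)] / P_S[(s,)]                                # P(F, H | S)
    rhs = (P_FS[(f, s)] / P_S[(s,)]) * (P_SH[(s, h)] / P_S[(s,)])    # P(F|S) P(H|S)
    assert abs(lhs - rhs) < 1e-12

# Marginal independence (F ⊥ H) fails here: P(F, H) ≠ P(F) P(H).
P_FH, P_Fm, P_Hm = marginal((0, 2)), marginal((0,)), marginal((2,))
print(abs(P_FH[(True, True)] - P_Fm[(True,)] * P_Hm[(True,)]) > 1e-6)  # prints True
```

With these numbers, P(F=t, H=t) = 0.057 while P(F=t) P(H=t) ≈ 0.016, so "knowing Sinus separates the variables" is exactly what the assertion loop verifies.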
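The parameter-counting question above has a mechanical answer: each CPT P(Xi | PaXi) needs (|Val(Xi)| − 1) free numbers per joint parent assignment. A short sketch for the lecture's five-variable network (all variables binary), comparing against the unrestricted joint:

```python
# The Flu/Allergy/Sinus/Headache/Nose network from the lecture.
parents = {
    "Flu": [], "Allergy": [],
    "Sinus": ["Flu", "Allergy"],
    "Headache": ["Sinus"], "Nose": ["Sinus"],
}

def num_params(parents, card=2):
    """Free CPT parameters: (card - 1) per row, one row per parent assignment."""
    return sum((card - 1) * card ** len(pa) for pa in parents.values())

bn_params = num_params(parents)        # 1 + 1 + 4 + 2 + 2 = 10
full_joint = 2 ** len(parents) - 1     # 31 for the unrestricted joint
print(bn_params, full_joint)           # prints: 10 31
```

The gap grows exponentially: for the 18-variable car-starts network, a full joint needs 2^18 − 1 entries, while the BN's size is driven only by the largest parent sets.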
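The burglar-alarm example also gives a concrete look at explaining away: once the alarm is observed, learning that an earthquake occurred lowers the probability of a burglar. A brute-force enumeration sketch over the B, E → A fragment; the CPT values are invented for illustration, not taken from the lecture:

```python
import itertools

# Illustrative (made-up) CPTs for the fragment B -> A <- E.
pB = {True: 0.01, False: 0.99}
pE = {True: 0.02, False: 0.98}
pA_true = {  # pA_true[(b, e)] = P(A = true | B = b, E = e)
    (True, True): 0.97, (True, False): 0.94,
    (False, True): 0.30, (False, False): 0.01,
}

def joint(b, e, a):
    """BN factorization: P(B, E, A) = P(B) P(E) P(A | B, E)."""
    p = pA_true[(b, e)]
    return pB[b] * pE[e] * (p if a else 1.0 - p)

def cond_prob(query, evidence):
    """P(query | evidence) by enumerating all assignments to B, E, A."""
    num = den = 0.0
    for b, e, a in itertools.product([True, False], repeat=3):
        world = {"B": b, "E": e, "A": a}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(b, e, a)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

# Explaining away: the earthquake "explains" the alarm, reducing belief in B.
p_burglar = cond_prob({"B": True}, {"A": True})
p_burglar_given_quake = cond_prob({"B": True}, {"A": True, "E": True})
print(p_burglar > p_burglar_given_quake)  # prints True
```

With these numbers P(B=t | A=t) ≈ 0.38 but P(B=t | A=t, E=t) ≈ 0.03 — even though B and E are marginally independent, they become dependent once their common child A is observed.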