2005-2007 Carlos Guestrin

Bayesian Networks – Representation
Machine Learning – 10-701/15-781
Carlos Guestrin
Carnegie Mellon University
March 19th, 2007

Handwriting recognition
- Character recognition, e.g., kernel SVMs
[figure: handwritten character samples]

Webpage classification
- Company home page vs. personal home page vs. university home page vs. …

Handwriting recognition 2

Webpage classification 2

Today – Bayesian networks
- One of the most exciting advancements in statistical AI in the last 10-15 years
- Generalizes naïve Bayes and logistic regression classifiers
- Compact representation for exponentially-large probability distributions
- Exploits conditional independencies

Causal structure
- Suppose we know the following:
  - The flu causes sinus inflammation
  - Allergies cause sinus inflammation
  - Sinus inflammation causes a runny nose
  - Sinus inflammation causes headaches
- How are these connected?

Possible queries
[BN: Flu, Allergy → Sinus → Headache, Nose]
- Inference
- Most probable explanation
- Active data collection

Car starts BN
- 18 binary attributes
- Inference: P(BatteryAge | Starts = f) — 2^16 terms; why so fast?
- Not impressed? The HailFinder BN has more than 3^54 = 58149737003040059690390169 terms

Announcements
- Welcome back!
- One-page project proposal due Wednesday
  - Individual or groups of two
  - Must be something related to ML! ☺
  - It will be great if it's related to your research, but it must be something you started this semester
- Midway progress report: 5 pages, NIPS format; due April 16th; worth 20%
- Poster presentation: May 4, 2-5pm in the NSH Atrium; worth 20%
- Final report: 8 pages, NIPS format; due May 10th; worth 60%
- It will be fun!!!
Factored joint distribution – Preview
[BN: Flu, Allergy → Sinus → Headache, Nose]

Number of parameters
[BN: Flu, Allergy → Sinus → Headache, Nose]

Key: Independence assumptions
- Knowing Sinus separates the variables from each other

(Marginal) Independence
- Flu and Allergy are (marginally) independent
[2×2 tables over Flu ∈ {t, f} and Allergy ∈ {t, f}]
- More generally:

Marginally independent random variables
- Sets of variables X, Y
- X is independent of Y if P ⊨ (X = x ⊥ Y = y), ∀x ∈ Val(X), y ∈ Val(Y)
- Shorthand: marginal independence: P ⊨ (X ⊥ Y)
- Proposition: P satisfies (X ⊥ Y) if and only if P(X, Y) = P(X) P(Y)

Conditional independence
- Flu and Headache are not (marginally) independent
- Flu and Headache are independent given Sinus infection
- More generally:

Conditionally independent random variables
- Sets of variables X, Y, Z
- X is independent of Y given Z if P ⊨ (X = x ⊥ Y = y | Z = z), ∀x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z)
- Shorthand: conditional independence: P ⊨ (X ⊥ Y | Z)
- For P ⊨ (X ⊥ Y | ∅), write P ⊨ (X ⊥ Y)
- Proposition: P satisfies (X ⊥ Y | Z) if and only if P(X, Y | Z) = P(X | Z) P(Y | Z)

Properties of independence
- Symmetry: (X ⊥ Y | Z) ⇒ (Y ⊥ X | Z)
- Decomposition: (X ⊥ Y, W | Z) ⇒ (X ⊥ Y | Z)
- Weak union: (X ⊥ Y, W | Z) ⇒ (X ⊥ Y | Z, W)
- Contraction: (X ⊥ W | Y, Z) & (X ⊥ Y | Z) ⇒ (X ⊥ Y, W | Z)
- Intersection: (X ⊥ Y | W, Z) & (X ⊥ W | Y, Z) ⇒ (X ⊥ Y, W | Z)
  - Only for positive distributions!
    (Positive: P(α) > 0, ∀α, α ≠ ∅)

The independence assumption
[BN: Flu, Allergy → Sinus → Headache, Nose]
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents

Explaining away
- (Local Markov Assumption, as above)

Naïve Bayes revisited
- (Local Markov Assumption, as above)

What about probabilities?
- Conditional probability tables (CPTs)

Joint distribution
- Why can we decompose? Markov Assumption!

The chain rule of probabilities
- P(A, B) = P(A) P(B|A)
- More generally: P(X1, …, Xn) = P(X1) · P(X2 | X1) · … · P(Xn | X1, …, Xn-1)

Chain rule & joint distribution
- (Local Markov Assumption, as above)

Two (trivial) special cases
- Edgeless graph
- Fully-connected graph

The Representation Theorem – Joint Distribution to BN
- Given a joint probability distribution P, obtain a BN that encodes independence assumptions, provided the conditional independencies in the BN are a subset of the conditional independencies in P

Real Bayesian network applications
- Diagnosis of lymph node disease
- Speech recognition
- Microsoft Office and Windows (http://www.research.microsoft.com/research/dtg/)
- Study of the human genome
- Robot mapping
- Robots to identify meteorites to study
- Modeling fMRI data
- Anomaly detection
- Fault diagnosis
- Modeling sensor network data

A general Bayes net
- Set of random variables
- Directed acyclic graph
  - Encodes independence assumptions
- CPTs
- Joint distribution:

How many parameters in a BN?
- Discrete variables X1, …, Xn
- Graph: defines the parents of Xi, Pa_Xi
- CPTs: P(Xi | Pa_Xi)

Another example
- Variables:
  - B – Burglar
  - E – Earthquake
  - A – Burglar alarm
  - N – Neighbor calls
  - R – Radio report
- Both burglars and earthquakes can set off the alarm
- If the alarm sounds, a neighbor may call
- An earthquake may be announced on the radio

Another example – Building the BN
- B – Burglar
- E – Earthquake
- A – Burglar alarm
- N – Neighbor calls
- R – Radio report

Independencies encoded in BN
- We said: all you need is the local Markov assumption (Xi ⊥ NonDescendants_Xi | Pa_Xi)
- But then we talked about …
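By the chain rule plus the local Markov assumption, the burglar network above factors as P(B, E, A, N, R) = P(B) P(E) P(A | B, E) P(N | A) P(R | E). A minimal sketch of that factorization; all CPT numbers below are made up purely for illustration, since the slides give none:

```python
from itertools import product

# Made-up CPT values (the lecture specifies only the structure, not numbers).
p_b = {True: 0.01, False: 0.99}                   # P(B)
p_e = {True: 0.02, False: 0.98}                   # P(E)
p_a = {(True, True): 0.97, (True, False): 0.94,   # P(A=t | B, E)
       (False, True): 0.30, (False, False): 0.01}
p_n = {True: 0.80, False: 0.05}                   # P(N=t | A)
p_r = {True: 0.90, False: 0.001}                  # P(R=t | E)

def bern(p_true, val):
    """Probability of a binary outcome given P(val = true)."""
    return p_true if val else 1.0 - p_true

def joint(b, e, a, n, r):
    """P(B,E,A,N,R) = P(B) P(E) P(A|B,E) P(N|A) P(R|E)."""
    return (p_b[b] * p_e[e] * bern(p_a[(b, e)], a)
            * bern(p_n[a], n) * bern(p_r[e], r))

# Sanity check: the factored joint sums to 1 over all 2^5 assignments.
total = sum(joint(*vals) for vals in product([True, False], repeat=5))
print(round(total, 10))   # 1.0
```

Five CPTs with 1 + 1 + 4 + 2 + 2 entries replace a 2^5-entry joint table, yet any joint probability is recoverable as a product of local terms.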
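The kind of claim the (truncated) final slide is building toward can be checked numerically: in the Flu network, the local Markov assumption implies Flu ⊥ Headache | Sinus, so P(H | S, F = t) should equal P(H | S, F = f) under the factored joint. A brute-force marginalization confirms it; the CPT numbers are again made up for illustration:

```python
from itertools import product

# Made-up CPTs for the lecture's network: Flu, Allergy -> Sinus -> Headache, Nose.
p_f, p_al = 0.1, 0.2                            # P(F=t), P(A=t)
p_s = {(True, True): 0.9, (True, False): 0.7,   # P(S=t | F, A)
       (False, True): 0.6, (False, False): 0.05}
p_h = {True: 0.8, False: 0.1}                   # P(H=t | S)
p_n = {True: 0.7, False: 0.02}                  # P(N=t | S)

def bern(p_true, val):
    return p_true if val else 1.0 - p_true

def joint(f, a, s, h, n):
    """P(F,A,S,H,N) = P(F) P(A) P(S|F,A) P(H|S) P(N|S)."""
    return (bern(p_f, f) * bern(p_al, a) * bern(p_s[(f, a)], s)
            * bern(p_h[s], h) * bern(p_n[s], n))

def prob(**fixed):
    """Marginal probability of the fixed assignment, summing out the rest."""
    order = ["f", "a", "s", "h", "n"]
    free = [v for v in order if v not in fixed]
    return sum(joint(*(dict(fixed, **dict(zip(free, vals)))[v] for v in order))
               for vals in product([True, False], repeat=len(free)))

# P(H=t | S=t, F=t) vs. P(H=t | S=t, F=f): equal, so (F ⊥ H | S) holds.
lhs = prob(h=True, s=True, f=True) / prob(s=True, f=True)
rhs = prob(h=True, s=True, f=False) / prob(s=True, f=False)
print(round(lhs, 6), round(rhs, 6))   # both 0.8
```

Knowing Sinus renders Flu irrelevant to Headache, exactly the "knowing sinus separates the variables" intuition from the independence-assumptions slide.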