Bayesian Networks – Representation
Machine Learning – 10-701/15-781
Carlos Guestrin
Carnegie Mellon University
March 16th, 2005

Handwriting recognition
- Character recognition, e.g., kernel SVMs

Webpage classification
- Company home page vs. personal home page vs. university home page vs. ...

Handwriting recognition 2

Webpage classification 2

Today – Bayesian networks
- One of the most exciting advancements in statistical AI in the last 10-15 years
- Generalizes the naïve Bayes and logistic regression classifiers
- Compact representation for exponentially-large probability distributions
- Exploits conditional independencies

Causal structure
- Suppose we know the following:
  - The flu causes sinus inflammation
  - Allergies cause sinus inflammation
  - Sinus inflammation causes a runny nose
  - Sinus inflammation causes headaches
- How are these connected?

Possible queries
(BN over Flu, Allergy, Sinus, Headache, Nose)
- Inference
- Most probable explanation
- Active data collection

Car starts BN
- 18 binary attributes
- Inference: P(BatteryAge | Starts = f)
- 2^18 terms, why so fast?
- Not impressed?
- HailFinder BN – more than 3^54 = 58,149,737,003,040,059,690,390,169 terms

Factored joint distribution – Preview
(BN over Flu, Allergy, Sinus, Headache, Nose)
P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S)

Number of parameters
- For this network of five binary variables: 1 + 1 + 4 + 2 + 2 = 10, versus 2^5 - 1 = 31 for an explicit joint table

Key: Independence assumptions
- Knowing Sinus separates the variables from each other

(Marginal) independence
- Flu and Allergy are (marginally) independent
- More generally: X and Y are independent if P(X, Y) = P(X) P(Y)
(Slide shows 2x2 tables over Flu = t/f and Allergy = t/f.)

Conditional independence
- Flu and Headache are not (marginally) independent
- Flu and Headache are independent given Sinus infection
- More generally: P(X, Y | Z) = P(X | Z) P(Y | Z)

The independence assumption
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents

Explaining away
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents

Naïve Bayes revisited
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents

What about probabilities?

Conditional probability tables (CPTs)

Joint distribution
- Why can we decompose?
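The decomposition can be checked numerically for the Flu network: multiplying the five CPT entries for every assignment yields a valid joint distribution over all 2^5 outcomes. A minimal sketch, where every probability value is a made-up illustrative assumption (the slides leave the CPTs blank):

```python
from itertools import product

# Illustrative CPTs for the Flu/Allergy/Sinus/Headache/Nose network
# (all probability values below are made up for this sketch).
p_f = {True: 0.1, False: 0.9}                       # P(Flu)
p_a = {True: 0.2, False: 0.8}                       # P(Allergy)
p_s = {(True, True): 0.9, (True, False): 0.8,       # P(Sinus=t | Flu, Allergy)
       (False, True): 0.7, (False, False): 0.05}
p_h = {True: 0.6, False: 0.1}                       # P(Headache=t | Sinus)
p_n = {True: 0.8, False: 0.2}                       # P(Nose=t | Sinus)

def joint(f, a, s, h, n):
    """P(f, a, s, h, n) = P(f) P(a) P(s | f, a) P(h | s) P(n | s)."""
    ps = p_s[(f, a)] if s else 1 - p_s[(f, a)]
    ph = p_h[s] if h else 1 - p_h[s]
    pn = p_n[s] if n else 1 - p_n[s]
    return p_f[f] * p_a[a] * ps * ph * pn

# The 32 factored products form a proper distribution: they sum to 1.
total = sum(joint(*assign) for assign in product([True, False], repeat=5))
print(round(total, 10))  # 1.0
```

Note that only 1 + 1 + 4 + 2 + 2 = 10 numbers specify all 32 joint entries; an explicit joint table would need 31.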
- Markov Assumption!

Real Bayesian networks applications
- Diagnosis of lymph node disease
- Speech recognition
- Microsoft Office and Windows: http://www.research.microsoft.com/research/dtg/
- Study of the human genome
- Robot mapping
- Robots to identify meteorites to study
- Modeling fMRI data
- Anomaly detection
- Fault diagnosis
- Modeling sensor network data

A general Bayes net
- Set of random variables
- Directed acyclic graph (encodes independence assumptions)
- CPTs
- Joint distribution: P(X1, ..., Xn) = prod_i P(Xi | PaXi)

Another example
- Variables: B – Burglar, E – Earthquake, A – Burglar alarm, N – Neighbor calls, R – Radio report
- Both burglars and earthquakes can set off the alarm
- If the alarm sounds, a neighbor may call
- An earthquake may be announced on the radio

Another example – Building the BN
- B – Burglar, E – Earthquake, A – Burglar alarm, N – Neighbor calls, R – Radio report

Defining a BN
- Given a set of variables and conditional independence assumptions
- Choose an ordering on the variables, e.g., X1, ..., Xn
- For i = 1 to n:
  - Add Xi to the network
  - Define the parents of Xi, PaXi, in the graph as the minimal subset of {X1, ..., Xi-1} such that the local Markov assumption holds – Xi is independent of the rest of {X1, ..., Xi-1} given its parents PaXi
  - Define/learn the CPT – P(Xi | PaXi)

How many parameters in a BN?
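The question just posed can be answered with a quick counting sketch: each variable Xi contributes (|Xi| - 1) free parameters per joint assignment of its parents. The graph encoding and the example network below are assumptions made for illustration:

```python
def num_parameters(cardinality, parents):
    """Total free CPT parameters in a discrete BN:
    sum over Xi of (|Xi| - 1) * (number of joint parent assignments)."""
    total = 0
    for var, card in cardinality.items():
        parent_assignments = 1
        for p in parents.get(var, []):
            parent_assignments *= cardinality[p]
        total += (card - 1) * parent_assignments
    return total

# The Flu network from earlier slides: all five variables binary.
card = {v: 2 for v in ["Flu", "Allergy", "Sinus", "Headache", "Nose"]}
pa = {"Sinus": ["Flu", "Allergy"], "Headache": ["Sinus"], "Nose": ["Sinus"]}
print(num_parameters(card, pa))  # 10, versus 2**5 - 1 = 31 for the full joint
```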
- Discrete variables X1, ..., Xn
- Graph: defines the parents of Xi, PaXi
- CPTs: P(Xi | PaXi), with (|Xi| - 1) free parameters per joint assignment of PaXi

Defining a BN 2
- Given a set of variables and conditional independence assumptions
- Choose an ordering on the variables, e.g., X1, ..., Xn
- For i = 1 to n:
  - Add Xi to the network
  - Define the parents of Xi, PaXi, in the graph as the minimal subset of {X1, ..., Xi-1} such that the local Markov assumption holds – Xi is independent of the rest of {X1, ..., Xi-1} given its parents PaXi
  - Define/learn the CPT – P(Xi | PaXi)
- We may not know the conditional independence assumptions, or even the variables
- There are good orderings and bad ones – a bad ordering may need more parents per variable, and therefore more parameters to learn
- How???

Learning the CPTs
- Data: x(1), ..., x(m)
- For each discrete variable Xi, estimate by counting: P(Xi = x | PaXi = u) = Count(x, u) / Count(u)

Learning Bayes nets
- Known structure vs. unknown structure
- Fully observable data vs. missing data

Queries in Bayes nets
- Given a BN, find:
  - The probability of X given some evidence, P(X | e)
  - The most probable explanation, max over x1, ..., xn of P(x1, ..., xn | e)
  - The most informative query
- Learn more about these next class

What you need to know
- Bayesian networks
  - A compact representation for large probability distributions
  - Not an algorithm
- Semantics of a BN
  - Conditional independence assumptions
- Representation
  - Variables, graph, CPTs
- Why BNs are useful
- Learning CPTs from fully observable data
- Play with the applet!!! ☺

Acknowledgements
- JavaBayes applet
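The counting estimate on the "Learning the CPTs" slide can be sketched directly. The variable names and the toy fully observed dataset below are illustrative assumptions:

```python
from collections import Counter

# Toy fully observed data over (Flu, Sinus); each record is one sample x(j).
# The dataset is made up for this sketch.
data = [
    {"Flu": True,  "Sinus": True},
    {"Flu": True,  "Sinus": True},
    {"Flu": True,  "Sinus": False},
    {"Flu": False, "Sinus": False},
    {"Flu": False, "Sinus": False},
    {"Flu": False, "Sinus": True},
]

def mle_cpt(data, child, parents):
    """Estimate P(child | parents) by counting: Count(x, u) / Count(u)."""
    joint_counts = Counter()
    parent_counts = Counter()
    for record in data:
        u = tuple(record[p] for p in parents)
        joint_counts[(record[child], u)] += 1
        parent_counts[u] += 1
    return {(x, u): c / parent_counts[u] for (x, u), c in joint_counts.items()}

cpt = mle_cpt(data, "Sinus", ["Flu"])
print(cpt[(True, (True,))])  # P(Sinus=t | Flu=t) = 2/3
```

With missing data this simple counting no longer applies, which is why the known/unknown-structure and fully-observable/missing-data cases are distinguished above.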