Bayesian Networks: Representation
Machine Learning 10701/15781
Carlos Guestrin
Carnegie Mellon University
March 16th, 2005

Handwriting recognition
  Character recognition, e.g., kernel SVMs
  [Figure: handwritten character samples]

Webpage classification
  Company home page vs. personal home page vs. university home page vs. ...

Handwriting recognition 2

Webpage classification 2

Today: Bayesian networks
  One of the most exciting advancements in statistical AI in the last 10-15 years
  Generalizes naïve Bayes and logistic regression classifiers
  Compact representation for exponentially large probability distributions
  Exploit conditional independencies

Causal structure
  Suppose we know the following:
    The flu causes sinus inflammation
    Allergies cause sinus inflammation
    Sinus inflammation causes a runny nose
    Sinus inflammation causes headaches
  How are these connected?

Possible queries
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]
  Inference
  Most probable explanation
  Active data collection

Car starts BN
  18 binary attributes
  Inference: P(BatteryAge | Starts = f)
    2^18 terms, why so fast?
  Not impressed?
    HailFinder BN: more than 3^54 = 58149737003040059690390169 terms

Factored joint distribution: Preview
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]

Number of parameters
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]

Key: Independence assumptions
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]
  Knowing sinus separates the variables from each other

(Marginal) Independence
  Flu and Allergy are (marginally) independent
  [Tables: Flu = t / Flu = f; Allergy = t / Allergy = f; joint table over Flu and Allergy]
  More generally: [equation]

Conditional independence
  Flu and Headache are not (marginally) independent
  Flu and Headache are independent given Sinus infection
  More generally: [equation]

The independence assumption
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]
  Local Markov Assumption: A variable X is independent of its non-descendants given its parents

Explaining away
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]
  Local Markov Assumption: A variable X is independent of its non-descendants given its parents

Naïve Bayes revisited
  Local Markov Assumption: A variable X is independent of its non-descendants given its parents

What about probabilities? Conditional probability tables (CPTs)
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]

Joint distribution
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]
  Why can we decompose? Markov Assumption!

Real Bayesian networks applications
  Diagnosis of lymph node disease
  Speech recognition
  Microsoft Office and Windows
    http://www.research.microsoft.com/research/dtg
  Study Human genome
  Robot mapping
  Robots to identify meteorites to study
  Modeling fMRI data
  Anomaly detection
  Fault diagnosis
  Modeling sensor network data

A general Bayes net
  Set of random variables
  Directed acyclic graph
    Encodes independence assumptions
  CPTs
  Joint distribution: [equation]

Another example
  Variables:
    B: Burglar
    E: Earthquake
    A: Burglar alarm
    N: Neighbor calls
    R: Radio report
  Both burglars and earthquakes can set off the alarm
  If the alarm sounds, a neighbor may call
  An earthquake may be announced on the radio

Another example: Building the BN
  B: Burglar
  E: Earthquake
  A: Burglar alarm
  N: Neighbor calls
  R: Radio report

Defining a BN
  Given a set of variables and conditional independence assumptions
  Choose an ordering on variables, e.g., X1, ..., Xn
  For i = 1 to n
    Add Xi to the network
    Define parents of Xi, PaXi, in graph as the minimal subset of {X1, ..., Xi-1} such that local Markov assumption holds: Xi independent of rest of {X1, ..., Xi-1}, given parents PaXi
    Define/learn CPT: P(Xi | PaXi)

How many parameters in a BN?
  Discrete variables X1, ..., Xn
  Graph
    Defines parents of Xi, PaXi
  CPTs: P(Xi | PaXi)

Defining a BN 2
  Given a set of variables and conditional independence assumptions
    (We may not know conditional independence assumptions and even variables)
  Choose an ordering on variables, e.g., X1, ..., Xn
    (There are good orderings and bad ones. A bad ordering may need more parents per variable, so it must learn more parameters)
  For i = 1 to n
    Add Xi to the network
    Define parents of Xi, PaXi, in graph as the minimal subset of {X1, ..., Xi-1} such that local Markov assumption holds: Xi independent of rest of {X1, ..., Xi-1}, given parents PaXi
    Define/learn CPT: P(Xi | PaXi)
      (How?)

Learning the CPTs
  Data: x(1), ..., x(m)
  For each discrete variable Xi: [equation]

Learning Bayes nets
  Known structure or unknown structure
  Fully observable data or missing data

Queries in Bayes nets
  Given BN, find:
    Probability of X given some evidence, P(X | e)
    Most probable explanation, max over x1, ..., xn of P(x1, ..., xn | e)
    Most informative query
  Learn more about these next class

What you need to know
  Bayesian networks
    A compact representation for large probability distributions
    Not an algorithm
  Semantics of a BN
    Conditional independence assumptions
  Representation
    Variables
    Graph
    CPTs
  Why BNs are useful
  Learning CPTs from fully observable data
  Play with applet!

Acknowledgements
  JavaBayes applet
    http://www.pmr.poli.usp.br/ltd/Software/javabayes/Home/index.html
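
The flu example from the slides can be made concrete with a small sketch. It assumes the graph implied by the causal statements (Flu and Allergy are the parents of Sinus; Sinus is the only parent of Headache and of Nose), so the local Markov assumption gives the factorization P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S). Every CPT number below is invented for illustration and does not come from the lecture, and the helper names (bern, joint, prob, p_flu_given) are hypothetical. Queries are answered by brute-force enumeration of the joint, which is only practical for tiny networks.

```python
from itertools import product

# Hypothetical CPT values, chosen only for illustration (not from the lecture).
P_FLU = 0.1                      # P(Flu = t)
P_ALLERGY = 0.2                  # P(Allergy = t)
P_SINUS = {                      # P(Sinus = t | Flu, Allergy)
    (True, True): 0.9,
    (True, False): 0.7,
    (False, True): 0.6,
    (False, False): 0.05,
}
P_HEADACHE = {True: 0.8, False: 0.1}   # P(Headache = t | Sinus)
P_NOSE = {True: 0.7, False: 0.05}      # P(Nose = t | Sinus)


def bern(p_true, value):
    """Probability that a binary variable equals `value`, given P(variable = True)."""
    return p_true if value else 1.0 - p_true


def joint(f, a, s, h, n):
    """Factored joint: P(F) P(A) P(S | F, A) P(H | S) P(N | S)."""
    return (bern(P_FLU, f) * bern(P_ALLERGY, a) * bern(P_SINUS[(f, a)], s)
            * bern(P_HEADACHE[s], h) * bern(P_NOSE[s], n))


def prob(**fixed):
    """Sum the joint over all assignments consistent with the `fixed` values."""
    names = ("f", "a", "s", "h", "n")
    total = 0.0
    for values in product([True, False], repeat=5):
        assignment = dict(zip(names, values))
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint(**assignment)
    return total


def p_flu_given(**evidence):
    """P(Flu = t | evidence) by brute-force enumeration."""
    return prob(f=True, **evidence) / prob(**evidence)


# Independent parameters: 1 + 1 + 4 + 2 + 2 = 10 for the network,
# versus 2**5 - 1 = 31 for an unrestricted joint table over 5 binary variables.
bn_params = 1 + 1 + len(P_SINUS) + len(P_HEADACHE) + len(P_NOSE)
full_params = 2 ** 5 - 1

if __name__ == "__main__":
    print("total probability:", prob())
    print("parameters:", bn_params, "vs", full_params)
    print("P(Flu=t | Headache=t):", p_flu_given(h=True))
    print("P(Flu=t | Sinus=t):", p_flu_given(s=True))
    print("P(Flu=t | Sinus=t, Allergy=t):", p_flu_given(s=True, a=True))
```

With these illustrative numbers, observing Allergy = t in addition to Sinus = t lowers the posterior on Flu, which is the explaining-away effect. Adding Headache = t to the evidence Sinus = t leaves the posterior on Flu unchanged, as the conditional independence slide states.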
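
For the "Learning the CPTs" slide with known structure and fully observable data, one common choice is the relative-frequency (maximum likelihood) estimate: P(Xi = x | PaXi = u) is approximated by Count(Xi = x, PaXi = u) / Count(PaXi = u). The sketch below assumes that estimator; the function name learn_cpt and the toy data set are hypothetical and are not taken from the lecture.

```python
from collections import Counter


def learn_cpt(data, child, parents):
    """Estimate P(child | parents) by relative frequency (maximum likelihood).

    `data` is a list of dicts mapping variable name -> observed value.
    Returns {parent_values_tuple: {child_value: probability}}.
    """
    joint_counts = Counter()    # Count(parents = u, child = x)
    parent_counts = Counter()   # Count(parents = u)
    for row in data:
        u = tuple(row[p] for p in parents)
        joint_counts[(u, row[child])] += 1
        parent_counts[u] += 1

    cpt = {}
    for (u, x), count in joint_counts.items():
        cpt.setdefault(u, {})[x] = count / parent_counts[u]
    return cpt


# Hypothetical, fully observed toy data set (values invented for illustration).
DATA = [
    {"Flu": True,  "Allergy": False, "Sinus": True},
    {"Flu": True,  "Allergy": False, "Sinus": True},
    {"Flu": True,  "Allergy": False, "Sinus": False},
    {"Flu": False, "Allergy": True,  "Sinus": True},
    {"Flu": False, "Allergy": False, "Sinus": False},
    {"Flu": False, "Allergy": False, "Sinus": False},
]

if __name__ == "__main__":
    print(learn_cpt(DATA, "Flu", []))
    print(learn_cpt(DATA, "Sinus", ["Flu", "Allergy"]))
```

Parent configurations that never occur in the data (here Flu = t with Allergy = t) get no entry at all, and child values that never occur under a given configuration are simply missing from that row. A practical implementation would probably add smoothing or a prior, which this sketch leaves out.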