Bayesian Networks: Representation
Machine Learning 10701/15781
Carlos Guestrin
Carnegie Mellon University
March 19th, 2007
© 2005-2007 Carlos Guestrin

Handwriting recognition
  - Character recognition, e.g., kernel SVMs
  [Figure: handwritten character images]

Webpage classification
  - Company home page vs. Personal home page vs. University home page vs. ...

Handwriting recognition 2

Webpage classification 2

Today: Bayesian networks
  - One of the most exciting advancements in statistical AI in the last 10-15 years
  - Generalizes naïve Bayes and logistic regression classifiers
  - Compact representation for exponentially large probability distributions
  - Exploit conditional independencies

Causal structure
  Suppose we know the following:
  - The flu causes sinus inflammation
  - Allergies cause sinus inflammation
  - Sinus inflammation causes a runny nose
  - Sinus inflammation causes headaches
  How are these connected?

Possible queries
  - Inference
  - Most probable explanation
  - Active data collection
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]

Car starts BN
  - 18 binary attributes
  - Inference: $P(\text{BatteryAge} \mid \text{Starts} = f)$
  - $2^{16}$ terms; why so fast?
  - Not impressed? HailFinder BN: more than $3^{54} = 58149737003040059690390169$ terms

Announcements
  - Welcome back!
  - One-page project proposal due Wednesday
      - Individual or groups of two
      - Must be something related to ML
      - It will be great if it's related to your research, but it must be something you started this semester
  - Midway progress report: 5 pages, NIPS format, April 16th, worth 20%
  - Poster presentation: May 4, 2-5pm in the NSH Atrium, worth 20%
  - Final report: May 10th, 8 pages, NIPS format, worth 60%
  - It will be fun!

And the winner is...
  - In third place: Samuel Clanton, Raja Sambasivan, Hui Yang. 76.0% accuracy (25 mistakes)
  - In second place: Minh Nguyen. 76.9% accuracy (24 mistakes)
  - And in first place: Jason Ganetsky. 78.8% accuracy (22 mistakes)

Factored joint distribution: Preview
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]

Number of parameters
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]

Key: Independence assumptions
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]
  - Knowing Sinus separates the variables from each other

(Marginal) Independence
  - Flu and Allergy are (marginally) independent
  - More generally:
  [Tables, entries left blank on the slide: P(Flu) over Flu = t, Flu = f; P(Allergy) over Allergy = t, Allergy = f; P(Flu, Allergy) with rows Flu = t, Flu = f and columns Allergy = t, Allergy = f]

Marginally independent random variables
  - Sets of variables X, Y
  - X is independent of Y if $P \models (X = x \perp Y = y), \ \forall x \in Val(X), y \in Val(Y)$
  - Shorthand: marginal independence, $P \models (X \perp Y)$
  - Proposition: P satisfies $(X \perp Y)$ if and only if $P(X, Y) = P(X)\,P(Y)$

Conditional independence
  - Flu and Headache are not (marginally) independent
  - Flu and Headache are independent given Sinus infection
  - More generally:

Conditionally independent random variables
  - Sets of variables X, Y, Z
  - X is independent of Y given Z if $P \models (X = x \perp Y = y \mid Z = z), \ \forall x \in Val(X), y \in Val(Y), z \in Val(Z)$
  - Shorthand: conditional independence, $P \models (X \perp Y \mid Z)$
  - For $P \models (X \perp Y \mid \emptyset)$, write $P \models (X \perp Y)$
  - Proposition: P satisfies $(X \perp Y \mid Z)$ if and only if $P(X, Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)$

Properties of independence
  - Symmetry: $(X \perp Y \mid Z) \Rightarrow (Y \perp X \mid Z)$
  - Decomposition: $(X \perp Y, W \mid Z) \Rightarrow (X \perp Y \mid Z)$
  - Weak union: $(X \perp Y, W \mid Z) \Rightarrow (X \perp Y \mid Z, W)$
  - Contraction: $(X \perp W \mid Y, Z) \ \& \ (X \perp Y \mid Z) \Rightarrow (X \perp Y, W \mid Z)$
  - Intersection: $(X \perp Y \mid W, Z) \ \& \ (X \perp W \mid Y, Z) \Rightarrow (X \perp Y, W \mid Z)$
      - Holds only for positive distributions, $P > 0$

The independence assumption
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]
  - Local Markov Assumption: a variable X is independent of its non-descendants given its parents

Explaining away
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]
  - Local Markov Assumption: a variable X is independent of its non-descendants given its parents

Naïve Bayes revisited
  - Local Markov Assumption: a variable X is independent of its non-descendants given its parents

What about probabilities? Conditional probability tables (CPTs)
  [Figure: network over Flu, Allergy, Sinus, Nose, Headache]

Joint distribution
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]
  - Why can we decompose? The Markov Assumption!

The chain rule of probabilities
  - $P(A, B) = P(A)\,P(B \mid A)$
  [Figure: two-node network over Flu and Sinus]
  - More generally: $P(X_1, \dots, X_n) = P(X_1)\,P(X_2 \mid X_1) \cdots P(X_n \mid X_1, \dots, X_{n-1})$

Chain rule & Joint distribution
  [Figure: network over Flu, Allergy, Sinus, Headache, Nose]
  - Local Markov Assumption: a variable X is independent of its non-descendants given its parents

Two (trivial) special cases
  - Edgeless graph
  - Fully connected graph

The Representation Theorem: Joint Distribution to BN
  - BN: encodes independence assumptions
  - If the conditional independencies in the BN are a subset of the conditional independencies in P, then we obtain the joint probability distribution in factored form

Real Bayesian networks applications
  - Diagnosis of lymph node disease
  - Speech recognition
  - Microsoft Office and Windows (http://www.research.microsoft.com/research/dtg/)
  - Study of the human genome
  - Robot mapping
  - Robots to identify meteorites to study
  - Modeling fMRI data
  - Anomaly detection
  - Fault diagnosis
  - Modeling sensor network data

A general Bayes net
  - Set of random variables
  - Directed acyclic graph: encodes independence assumptions
  - CPTs
  - Joint distribution: $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathbf{Pa}_{X_i})$

How many parameters in a BN?
  - Discrete variables $X_1, \dots, X_n$
  - Graph: defines the parents of $X_i$, $\mathbf{Pa}_{X_i}$
  - CPTs: $P(X_i \mid \mathbf{Pa}_{X_i})$

Another example
  - Variables:
      - B: Burglar
      - E: Earthquake
      - A: Burglar alarm
      - N: Neighbor calls
      - R: Radio report
  - Both burglars and earthquakes can set off the alarm
  - If the alarm sounds, a neighbor may call
  - An earthquake may be announced on the radio

Another example: Building the BN
  - B: Burglar
  - E: Earthquake
  - A: Burglar alarm
  - N: Neighbor calls
  - R: Radio report

Independencies encoded in BN
  - We said: all you need is the local Markov assumption, $(X_i \perp \text{NonDescendants}_{X_i} \mid \mathbf{Pa}_{X_i})$
  - But then we talked about other (in)dependencies, e.g., explaining away
  - What are the independencies encoded by a BN?
      - The only assumption is local Markov
      - But many others can be derived using the algebra of conditional independencies!

Understanding independencies in BNs: BNs with 3 nodes
  - Local Markov Assumption: a variable X is independent of its non-descendants given its parents
  - Indirect causal effect: X → Z → Y
  - Indirect evidential effect: X ← Z ← Y
  - Common cause: X ← Z → Y
  - Common effect: X → Z ← Y

Understanding independencies in BNs: Some examples
  [Figure: example network over nodes A, B, C, D, E, F, G, H, I, J, K]
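
Returning to the Flu / Allergy / Sinus / Headache / Nose network from earlier in the lecture, the following is a minimal sketch of how the factored joint distribution, the parameter count, and the independence statements could be checked numerically. The graph structure comes from the slides; every CPT number and every function name (`bern`, `joint`, `prob`, `cond`) is a hypothetical choice made only for illustration, and inference is done by brute-force enumeration rather than by any efficient algorithm.

```python
from itertools import product

# Hypothetical CPT values, chosen only for illustration (the slides give none).
P_FLU = 0.1                      # P(Flu = t)
P_ALLERGY = 0.2                  # P(Allergy = t)
P_SINUS = {                      # P(Sinus = t | Flu, Allergy)
    (True, True): 0.9,
    (True, False): 0.7,
    (False, True): 0.6,
    (False, False): 0.05,
}
P_HEADACHE = {True: 0.8, False: 0.1}   # P(Headache = t | Sinus)
P_NOSE = {True: 0.7, False: 0.05}      # P(Nose = t | Sinus)

NAMES = ("flu", "allergy", "sinus", "headache", "nose")


def bern(p_true, value):
    """Probability that a binary variable equals `value` when P(true) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(f, a, s, h, n):
    """Factored joint: P(F) P(A) P(S | F, A) P(H | S) P(N | S)."""
    return (bern(P_FLU, f) * bern(P_ALLERGY, a) * bern(P_SINUS[(f, a)], s)
            * bern(P_HEADACHE[s], h) * bern(P_NOSE[s], n))


def prob(**fixed):
    """Sum the joint over all assignments consistent with `fixed`."""
    total = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        assignment = dict(zip(NAMES, values))
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint(*values)
    return total


def cond(query, evidence):
    """P(query | evidence); both are dicts mapping variable name -> bool."""
    return prob(**query, **evidence) / prob(**evidence)


# One free parameter per parent configuration of each binary node.
n_parents = {"flu": 0, "allergy": 0, "sinus": 2, "headache": 1, "nose": 1}
bn_params = sum(2 ** k for k in n_parents.values())
full_table_params = 2 ** len(NAMES) - 1

if __name__ == "__main__":
    print("joint sums to:", prob())
    print("BN parameters:", bn_params, "| full joint table:", full_table_params)
    print("P(flu):", prob(flu=True))
    print("P(flu | allergy):", cond({"flu": True}, {"allergy": True}))
    print("P(flu | sinus):", cond({"flu": True}, {"sinus": True}))
    print("P(flu | sinus, allergy):",
          cond({"flu": True}, {"sinus": True, "allergy": True}))
```

Under these assumed numbers, conditioning on Allergy alone leaves the probability of Flu unchanged (marginal independence), while conditioning on Allergy after observing Sinus lowers it, which is the explaining-away effect. Enumeration touches all 2^5 assignments, so this approach only works for very small networks.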