Bayesian Networks: Representation
Machine Learning 10-701/15-781
Carlos Guestrin
Carnegie Mellon University
March 20th, 2006

Announcements
- Welcome back!
- One-page project proposal due Wednesday.
- We'll go over the midterm in this week's recitation.

Handwriting recognition
- Character recognition, e.g., kernel SVMs.
- [figure: sample handwritten characters with their labels]

Webpage classification
- Company home page vs. personal home page vs. university home page vs. ...

Handwriting recognition 2

Webpage classification 2

Today: Bayesian networks
- One of the most exciting advancements in statistical AI in the last 10-15 years.
- Generalizes naïve Bayes and logistic regression classifiers.
- Compact representation for exponentially large probability distributions.
- Exploits conditional independencies.

Causal structure
- Suppose we know the following:
  - The flu causes sinus inflammation.
  - Allergies cause sinus inflammation.
  - Sinus inflammation causes a runny nose.
  - Sinus inflammation causes headaches.
- How are these connected? [graph: Flu and Allergy point to Sinus; Sinus points to Headache and Nose]

Possible queries
- [graph: Flu, Allergy → Sinus → Headache, Nose]
- Inference
- Most probable explanation
- Active data collection

Car starts BN
- 18 binary attributes.
- Inference: P(BatteryAge | Starts = f) sums over 2^18 terms. Why so fast?
- Not impressed? The HailFinder BN has more than 3^54 = 58,149,737,003,040,059,690,390,169 terms.

Factored joint distribution: Preview
- [graph: Flu, Allergy → Sinus → Headache, Nose]
- P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S)

Number of parameters
- The full joint over these five binary variables needs 2^5 - 1 = 31 parameters.
- The factored form needs only 1 + 1 + 4 + 2 + 2 = 10: one number per parent assignment in each CPT.

Key: Independence assumptions
- [graph: Flu, Allergy → Sinus → Headache, Nose]
- Knowing Sinus separates the variables from each other.

(Marginal) Independence
- Flu and Allergy are marginally independent.
- [tables: P(Flu) over {t, f}, P(Allergy) over {t, f}, and their product giving the joint over (Flu, Allergy)]

More generally: marginally independent random variables
- Sets of variables X, Y.
- X is independent of Y if P ⊨ (X = x ⊥ Y = y) for all x ∈ Val(X), y ∈ Val(Y).
- Shorthand: marginal independence, P ⊨ (X ⊥ Y).
- Proposition: P satisfies (X ⊥ Y) if and only if P(X, Y) = P(X) P(Y).

Conditional independence
- Flu and Headache are not marginally independent.
- Flu and Headache are independent given Sinus infection.

More generally: conditionally independent random variables
- Sets of variables X, Y, Z.
- X is independent of Y given Z if P ⊨ (X = x ⊥ Y = y | Z = z) for all x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z).
- Shorthand: conditional independence, P ⊨ (X ⊥ Y | Z); for P ⊨ (X ⊥ Y | ∅), write P ⊨ (X ⊥ Y).
- Proposition: P satisfies (X ⊥ Y | Z) if and only if P(X, Y | Z) = P(X | Z) P(Y | Z).

Properties of independence
- Symmetry: (X ⊥ Y | Z) ⇒ (Y ⊥ X | Z)
- Decomposition: (X ⊥ Y, W | Z) ⇒ (X ⊥ Y | Z)
- Contraction: (X ⊥ W | Y, Z) & (X ⊥ Y | Z) ⇒ (X ⊥ Y, W | Z)
- Weak union: (X ⊥ Y, W | Z) ⇒ (X ⊥ Y | Z, W)
- Intersection: (X ⊥ Y | W, Z) & (X ⊥ W | Y, Z) ⇒ (X ⊥ Y, W | Z); holds only for positive distributions (P > 0).

The independence assumption
- [graph: Flu, Allergy → Sinus → Headache, Nose]
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents, (X ⊥ NonDescendants_X | Pa_X).

Explaining away
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents.
- Flu and Allergy are marginally independent, but they become dependent once Sinus is observed: learning Flu = t "explains away" the inflammation and makes Allergy less likely.

Naïve Bayes revisited
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents.
- With the class variable as the single parent of every feature, local Markov makes the features conditionally independent given the class: exactly the naïve Bayes assumption.

What about probabilities? Conditional probability tables (CPTs)
- [figure: the Flu graph annotated with CPTs P(F), P(A), P(S | F, A), P(H | S), P(N | S)]

Joint distribution
- P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S)
- Why can we decompose? The Markov Assumption!

The chain rule of probabilities
- P(A, B) = P(A) P(B | A), e.g., P(Flu, Sinus) = P(Flu) P(Sinus | Flu).
- More generally: P(X1, ..., Xn) = P(X1) P(X2 | X1) ··· P(Xn | X1, ..., Xn-1).

Chain rule → Joint distribution
- Apply the chain rule in an order consistent with the graph:
  P(F, A, S, H, N) = P(F) P(A | F) P(S | F, A) P(H | F, A, S) P(N | F, A, S, H).
- The Local Markov Assumption (a variable X is independent of its non-descendants given its parents) drops each non-parent from the conditioning:
  P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S).

Two trivial special cases
- Edgeless graph: no variable has parents, so P(X1, ..., Xn) = Π_i P(Xi); all variables are marginally independent.
- Fully connected graph: no independence assumptions at all; the factorization is just the chain rule.

The Representation Theorem: Joint Distribution to BN
- A BN encodes independence assumptions.
- If the conditional independencies in the BN are a subset of the conditional independencies in P, we obtain: the joint probability distribution factorizes as P(X1, ..., Xn) = Π_i P(Xi | Pa_Xi).
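To make the factorization concrete, here is a minimal Python sketch of the lecture's Flu network. Only the graph structure comes from the slides; the CPT numbers, variable ordering, and function names are invented for illustration. It checks that the 10 CPT parameters determine all 32 joint entries, which sum to 1.

```python
from itertools import product

# Flu network from the lecture: Flu, Allergy -> Sinus -> Headache, Nose.
# The graph is from the slides; the CPT numbers below are made up.
# Each CPT stores only P(X = True | parents); the False case is the complement.
p_flu = 0.1                                    # P(Flu = t)
p_allergy = 0.2                                # P(Allergy = t)
p_sinus = {(True, True): 0.9, (True, False): 0.6,
           (False, True): 0.5, (False, False): 0.05}  # P(Sinus = t | Flu, Allergy)
p_headache = {True: 0.7, False: 0.1}           # P(Headache = t | Sinus)
p_nose = {True: 0.8, False: 0.2}               # P(Nose = t | Sinus)

def bernoulli(p_true, value):
    """P(X = value) for a binary X with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(f, a, s, h, n):
    """P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S)."""
    return (bernoulli(p_flu, f)
            * bernoulli(p_allergy, a)
            * bernoulli(p_sinus[(f, a)], s)
            * bernoulli(p_headache[s], h)
            * bernoulli(p_nose[s], n))

# 1 + 1 + 4 + 2 + 2 = 10 parameters determine all 2^5 = 32 joint entries,
# which must sum to 1 for a valid distribution:
print(sum(joint(*v) for v in product([True, False], repeat=5)))  # 1.0
```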
Real Bayesian networks applications
- Diagnosis of lymph-node disease.
- Speech recognition.
- Microsoft Office and Windows: http://www.research.microsoft.com/research/dtg
- Study of the human genome.
- Robot mapping.
- Robots that identify meteorites to study.
- Modeling fMRI data.
- Anomaly detection.
- Fault diagnosis.
- Modeling sensor-network data.

A general Bayes net
- Set of random variables.
- Directed acyclic graph.
- CPTs.
- Encodes independence assumptions.
- Joint distribution: P(X1, ..., Xn) = Π_i P(Xi | Pa_Xi).

How many parameters in a BN?
- Discrete variables X1, ..., Xn.
- Graph: defines the parents of Xi, Pa_Xi.
- CPTs: P(Xi | Pa_Xi).

Another example
- Variables: B = Burglar, E = Earthquake, A = Burglar alarm, N = Neighbor calls, R = Radio report.
- Both burglars and earthquakes can set off the alarm.
- If the alarm sounds, a neighbor may call.
- An earthquake may be announced on the radio.

Another example: Building the BN
- B = Burglar, E = Earthquake, A = Burglar alarm, N = Neighbor calls, R = Radio report.
- [graph: B → A ← E, A → N, E → R]

Independencies encoded in BN
- We said: all you need is the local Markov assumption, (Xi ⊥ NonDescendants_Xi | Pa_Xi).
- But then we talked about other (in)dependencies, e.g., explaining away.
- What are the independencies encoded by a BN? The only assumption is local Markov, but many others can be derived using the algebra of conditional independencies!

Understanding independencies in BNs: BNs with 3 nodes
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents.
- Indirect causal effect: X → Z → Y.
- Indirect evidential effect: X ← Z ← Y.
- Common cause: X ← Z → Y.
- Common effect: X → Z ← Y.

Understanding independencies in BNs: Some examples
- [figure: example BN over the variables A, B, C, D, E, F, G, H, I, J, K]

An active trail: Example
- [figure: example BN over the variables A through H, with F observed]
- When are A and H independent?

Active trails formalized
- A path X1 - X2 - ··· - Xk is an active trail when variables O ⊆ {X1, ..., Xn} are observed if, for each consecutive triplet in the trail:
  - Xi-1 → Xi → Xi+1, and Xi is not observed (Xi ∉ O); or
  - Xi-1 ← Xi ← Xi+1, and Xi is not observed (Xi ∉ O); or
  - Xi-1 ← Xi → Xi+1, and Xi is not observed (Xi ∉ O); or
  - Xi-1 → Xi ← Xi+1, and Xi is observed (Xi ∈ O), or one of its descendants is observed.

Active trails and independence
- Theorem: variables Xi and Xj are independent given Z ⊆ {X1, ..., Xn} if there is no active trail between Xi and Xj when the variables Z are observed.

The BN Representation Theorem
- If the conditional independencies in the BN are a subset of the conditional independencies in P, we obtain: the joint probability distribution factorizes as Π_i P(Xi | Pa_Xi). Important because: every P has at least one BN structure G.
- If the joint probability distribution factorizes according to G, then the conditional independencies in the BN are a subset of the conditional independencies in P. Important because: we can read independencies of P from the BN structure G.

Simpler BNs
- A distribution can be represented by many BNs.
- A simpler BN requires fewer parameters.

Learning Bayes nets
- Four settings: known vs. unknown structure, fully observable vs. missing data.
- Data: x(1), ..., x(m).
- Learn structure and parameters, i.e., the CPTs P(Xi | Pa_Xi).

Learning the CPTs
- Data: x(1), ..., x(m).
- For each discrete variable Xi, estimate P(Xi | Pa_Xi) by counting (a maximum-likelihood sketch follows after the next slide).

Queries in Bayes nets
- Given a BN, find:
  - The probability of X given some evidence, P(X | e) (an enumeration sketch closes these notes).
  - The most probable explanation, max_{x1, ..., xn} P(x1, ..., xn | e).
  - The most informative ...
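Here is the maximum-likelihood counting sketch promised in the "Learning the CPTs" slide. Only the count-ratio estimate itself, P̂(Xi = x | Pa_Xi = u) = Count(x, u) / Count(u), is the standard method the slide points at; the record format (a list of dicts), the name learn_cpt, and the toy data are assumptions made for the sketch.

```python
from collections import Counter

def learn_cpt(data, child, parents):
    """Maximum-likelihood CPT from fully observed data:
    estimate P(child = x | parents = u) as Count(x, u) / Count(u)."""
    joint_counts = Counter()   # counts of (child value, parent assignment)
    parent_counts = Counter()  # counts of the parent assignment alone
    for record in data:        # record: dict mapping variable name -> value
        u = tuple(record[p] for p in parents)
        joint_counts[(record[child], u)] += 1
        parent_counts[u] += 1
    return {(x, u): c / parent_counts[u] for (x, u), c in joint_counts.items()}

# Usage on a toy fully observed dataset for the Flu network:
data = [
    {"Flu": True,  "Allergy": False, "Sinus": True},
    {"Flu": True,  "Allergy": True,  "Sinus": True},
    {"Flu": False, "Allergy": False, "Sinus": False},
    {"Flu": False, "Allergy": False, "Sinus": True},
]
cpt = learn_cpt(data, child="Sinus", parents=["Flu", "Allergy"])
print(cpt[(True, (False, False))])  # estimated P(Sinus=t | Flu=f, Allergy=f) = 0.5
```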
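Finally, a brute-force sketch of the first query type, P(X | e), by enumeration. This is only meant to pin down what the query computes; its 2^n cost is exactly the blow-up the "why so fast?" slide says real BN inference avoids. The function name and argument conventions are invented for the sketch.

```python
from itertools import product

def query(joint, n_vars, target, evidence):
    """P(X_target | evidence) by enumeration over binary variables:
    sum the joint over every assignment consistent with the evidence,
    then renormalize. `joint` maps a full assignment (tuple of bools)
    to its probability; `target` is a variable index; `evidence` maps
    variable indices to their observed values. Cost is O(2^n)."""
    totals = {True: 0.0, False: 0.0}
    for assignment in product([True, False], repeat=n_vars):
        if all(assignment[i] == v for i, v in evidence.items()):
            totals[assignment[target]] += joint(assignment)
    z = totals[True] + totals[False]
    return {value: mass / z for value, mass in totals.items()}

# Usage with `joint` from the factorization sketch above, variable
# order (Flu, Allergy, Sinus, Headache, Nose):
# query(lambda a: joint(*a), 5, target=0, evidence={3: True})  # P(Flu | Headache=t)
```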