Learning BN tutorial ftp ftp research microsoft com pub tr tr 95 06 pdf TAN paper http www cs huji ac il nir Abstracts FrGG1 html Bayesian Networks Inference continued Learning Machine Learning 10701 15781 Carlos Guestrin Carnegie Mellon University March 23rd 2005 Review Bayesian Networks Compact representation for probability distributions Exponential reduction in number of parameters Fast probabilistic inference using variable elimination Compute P X e Time exponential in tree width not number of variables Today Finding most likely explanation Using sampling for approximate inference Learn BN structure Flu Allergy Sinus Headache Nose Most likely explanation MLE Flu Query Sinus Headache Using Bayes rule Normalization irrelevant Allergy Nose Max marginalization Flu Allergy t Sinus Example of variable elimination for MLE Forward pass Flu Allergy Sinus Headache Nose t Example of variable elimination for MLE Backward pass Flu Allergy Sinus Headache Nose t MLE Variable elimination algorithm Forward pass Given a BN and a MLE query maxx1 xnP x1 xn e Instantiate evidence e Choose an ordering on variables e g X1 Xn For i 1 to n If Xi e Collect factors f1 fk that include Xi Generate a new factor by eliminating Xi from these factors Variable Xi has been eliminated MLE Variable elimination algorithm Backward pass x1 xn will store maximizing assignment For i n to 1 If Xi e Take factors f1 fk used when Xi was eliminated Instantiate f1 fk with xi 1 xn Now each fj depends only on Xi Generate maximizing assignment for Xi Stochastic simulation Obtaining a sample from the joint distribution Flu Allergy Sinus Headache Nose Using stochastic simulation sampling to compute P X Given a BN a query P X and Flu number of samples m Choose a topological ordering on variables e g X1 Xn Headache For j 1 to m x1j xnj will be jth sample For i 1 to n Sample xij from the distribution P Xi PaX where parents are i instantiated to x1j xi 1j Add x1j xnj to dataset Use counts to compute P X Allergy Sinus Nose Example of using rejection sampling to compute P X e Flu Allergy Sinus Headache Nose t Using rejection sampling to compute P X e Given a BN a query P X e and Flu number of samples m Choose a topological ordering on variables e g X1 Xn j 0 Headache While j m x1j xnj will be jth sample For i 1 to n Sample xij from the distribution P Xi PaX i where parents are instantiated to x1j xi 1j If x1j xnj consistent with evidence add it to dataset and j j 1 Use counts to compute P X e Allergy Sinus Nose t Example of using importance sampling to compute P X e Flu Allergy Sinus Headache Nose t Using importance sampling to Flu compute P X e For j 1 to m x1j xnj will be jth sample Initialize weight of sample wj 1 For i 1 to n Sinus Headache If Xi e Sample xij from the distribution P Xi PaX i where parents are instantiated to x1j xi 1j else Set xij to assignment in evidence e Multiply weight wj by P xij PaX where parents are i instantiated to x1j xi 1j Add x1j xnj to dataset with weight wj Allergy Use weighted counts to compute P X e Nose t What you need to know about inference Bayesian networks A useful compact representation for large probability distributions Inference to compute Probability of X given evidence e Most likely explanation MLE given evidence e Inference is NP hard Variable elimination algorithm Efficient algorithm only exponential in tree width not number of variables Elimination order is important Approximate inference necessary when tree width to large Only difference between probabilistic inference and MLE is sum versus max Sampling Example of approximate inference Simulate from model Likelihood weighting for inference Can be very slow Where are we Bayesian networks Represent exponentially large probability distributions compactly Inference in BNs Exact inference very fast for problems with low treewidth Many approximate inference algorithms e g sampling Learning BNs Given structure estimate parameters Using counts maximum likelihood MAP is also possible What about learning structure Maximum likelihood ML for learning BN structure Possible structures Flu Allergy Sinus Headache Data x 1 1 x n 1 x 1 m x n m Nose Learn parameters using ML Score structure How many graphs are there How many trees are there Nonetheless Efficient optimal algorithm finds best tree Scoring a tree Chow Liu tree learning algorithm For each pair of variables Xi Xj Compute empirical distribution Compute mutual information Define a graph Nodes X1 Xn Edge i j gets weight Optimal tree BN Compute maximal spanning tree Directions in BN pick any node as root breadth first search defines directions Can we extend Chow Liu Tree augmented na ve Bayes TAN Friedman et al 97 Same as Chow Liu but score edges with Approximate learning models with tree width up to k Narasimhan Bilmes 04 But O nk 1 Scoring general graphical models Model selection problem What s the best structure Flu Allergy Sinus Headache Nose Data x 1 1 x n 1 x 1 m x n m The more edges the fewer independence assumptions the higher the likelihood of the data but will overfit Bayesian score Given a structure distribution over parameters Difficult integral use Bayes information criterion BIC approximation Note regularize with MDL score Best BN under BIC still NP hard Learn BN structure using local search Starting from Chow Liu tree Local search possible moves Add edge Delete edge Invert edge Score using BIC What you need to know about learning BNs Bayesian networks A useful compact representation for large probability distributions Inference Variable elimination algorithm Sampling Example of approximate inference Learning BNs Maximum likelihood or MAP learns parameters Best tree Chow Liu Best TAN Other BNs usually local search with BIC score Acknowledgements JavaBayes applet http www pmr poli usp br ltd Software javabayes Ho me index html
View Full Document