Bayesian Networks – Inference (continued) and Learning
Machine Learning – 10-701/15-781
Carlos Guestrin
Carnegie Mellon University
March 23rd, 2005

Outline
- Review
- Most likely explanation (MLE); max-marginalization
- Variable elimination for MLE: examples and algorithm (forward and backward passes)
- Stochastic simulation – obtaining a sample from the joint distribution
- Using sampling to compute P(X); rejection sampling and importance sampling for P(X|e)
- What you need to know about inference
- Where are we?
- Maximum likelihood (ML) for learning BN structure
- How many graphs are there? How many trees are there?
- Scoring a tree; the Chow-Liu tree learning algorithm; can we extend Chow-Liu?
- Scoring general graphical models – the model selection problem; Bayesian score
- Learning BN structure using local search
- What you need to know about learning BNs
- Acknowledgements
  - Learning BN tutorial: ftp://ftp.research.microsoft.com/pub/tr/tr-95-06.pdf
  - TAN paper: http://www.cs.huji.ac.il/~nir/Abstracts/FrGG1.html

Review
- Bayesian networks
  - Compact representation for probability distributions
  - Exponential reduction in the number of parameters
- Fast probabilistic inference using variable elimination
  - Compute P(X|e)
  - Time exponential in tree-width, not in the number of variables
- Today
  - Finding the most likely explanation
  - Using sampling for approximate inference
  - Learning BN structure

[Figure: running example network over the variables Flu, Allergy, Sinus, Headache, Nose, used throughout the slides below.]

Most likely explanation (MLE)
- Query: the most likely assignment given the evidence, max_{x1,...,xn} P(x1,...,xn | e)
- Using Bayes rule: P(x1,...,xn | e) = P(x1,...,xn, e) / P(e)
- Normalization irrelevant: P(e) does not depend on the assignment, so the query reduces to max_{x1,...,xn} P(x1,...,xn, e)

Max-marginalization
- To eliminate a variable in an MLE query, marginalize with a max instead of a sum: eliminating Y from a factor f(X, Y) yields g(X) = max_y f(X, y)
[Figure: max-marginalizing Flu, with Allergy instantiated to t, producing a factor over Sinus]

Example of variable elimination for MLE – Forward pass
[Worked example on the running network with evidence Nose = t]

Example of variable elimination for MLE – Backward pass
[Worked example on the running network with evidence Nose = t]

MLE Variable elimination algorithm – Forward pass
- Given a BN and an MLE query max_{x1,...,xn} P(x1,...,xn, e)
- Instantiate the evidence e
- Choose an ordering on the variables, e.g., X1, ..., Xn
- For i = 1 to n, if Xi ∉ {e}:
  - Collect the factors f1, ..., fk that include Xi
  - Generate a new factor by eliminating Xi from these factors (taking a max over xi rather than a sum)
  - Variable Xi has been eliminated!

MLE Variable elimination algorithm – Backward pass
- {x1*, ..., xn*} will store the maximizing assignment
- For i = n down to 1, if Xi ∉ {e}:
  - Take the factors f1, ..., fk used when Xi was eliminated
  - Instantiate f1, ..., fk with {x_{i+1}*, ..., xn*}; now each fj depends only on Xi
  - Generate the maximizing assignment for Xi: xi* = argmax_{xi} f1(xi) × ... × fk(xi)
- (A code sketch of both passes appears after the sampling slides below.)

Stochastic simulation – Obtaining a sample from the joint distribution
[Worked example: sampling each variable of the running network in topological order]

Using stochastic simulation (sampling) to compute P(X)
- Given a BN, a query P(X), and a number of samples m
- Choose a topological ordering on the variables, e.g., X1, ..., Xn
- For j = 1 to m:
  - {x1^j, ..., xn^j} will be the jth sample
  - For i = 1 to n: sample xi^j from the distribution P(Xi | Pa_Xi), where the parents are instantiated to {x1^j, ..., x_{i-1}^j}
  - Add {x1^j, ..., xn^j} to the "dataset"
- Use counts to compute P(X)
- (See the forward-sampling sketch below.)

Example of using rejection sampling to compute P(X|e)
[Worked example on the running network with evidence Nose = t]

Using rejection sampling to compute P(X|e)
- Given a BN, a query P(X|e), and a number of samples m
- Choose a topological ordering on the variables, e.g., X1, ..., Xn
- j = 0; while j < m:
  - {x1^j, ..., xn^j} will be the jth sample
  - For i = 1 to n: sample xi^j from the distribution P(Xi | Pa_Xi), where the parents are instantiated to {x1^j, ..., x_{i-1}^j}
  - If {x1^j, ..., xn^j} is consistent with the evidence, add it to the "dataset" and set j = j + 1
- Use counts to compute P(X|e)
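The max-product elimination algorithm above (forward pass to eliminate variables, backward pass to recover the maximizing assignment) can be sketched in code. This is a minimal sketch, not the course's implementation: factors are represented as (variables, table) pairs, evidence is assumed to be already instantiated into the factors, and the helper names (multiply, max_eliminate, mle) are made up for illustration.

```python
from itertools import product

# Factors are (vars, table) pairs: 'vars' is a tuple of variable names,
# 'table' maps tuples of values to numbers. Evidence is assumed to have
# been instantiated into the factors already.

def multiply(factors, domains):
    """Pointwise product of a list of factors."""
    vars_ = tuple({v for vs, _ in factors for v in vs})
    table = {}
    for vals in product(*(domains[v] for v in vars_)):
        assign = dict(zip(vars_, vals))
        p = 1.0
        for vs, t in factors:
            p *= t[tuple(assign[v] for v in vs)]
        table[vals] = p
    return vars_, table

def max_eliminate(factor, var):
    """Forward-pass step: max out 'var' instead of summing it out."""
    vars_, table = factor
    keep = tuple(v for v in vars_ if v != var)
    new = {}
    for vals, p in table.items():
        key = tuple(val for v, val in zip(vars_, vals) if v != var)
        new[key] = max(new.get(key, 0.0), p)
    return keep, new

def mle(factors, order, domains):
    """Max-product variable elimination: returns the maximizing assignment."""
    used = {}                                  # factors in play at each elimination
    for var in order:                          # forward pass
        relevant = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        used[var] = relevant
        factors.append(max_eliminate(multiply(relevant, domains), var))
    assignment = {}
    for var in reversed(order):                # backward pass
        vars_, table = multiply(used[var], domains)
        # all variables except 'var' already have values: restrict, then argmax
        candidates = [vals for vals in table
                      if all(v == var or assignment[v] == val
                             for v, val in zip(vars_, vals))]
        best = max(candidates, key=lambda vals: table[vals])
        assignment[var] = best[vars_.index(var)]
    return assignment

# Tiny usage example: factors P(A) and P(B | A) with made-up numbers.
domains = {"A": (0, 1), "B": (0, 1)}
pA = (("A",), {(0,): 0.6, (1,): 0.4})
pBgA = (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
print(mle([pA, pBgA], ["A", "B"], domains))    # {'B': 0, 'A': 0}
```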
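The forward (ancestral) sampling procedure for estimating P(X) translates directly into code. A minimal sketch, assuming binary variables, a plausible edge set for the running example (Flu and Allergy as parents of Sinus, which is the parent of Headache and Nose), and made-up CPT numbers; the representation (parents, cpts) is illustrative, not from the slides.

```python
import random

# Assumed network representation: 'parents' maps each variable to its
# parent tuple; 'cpts' maps (var, parent_values) to P(var = 1 | parents).
# The edge set and all numbers below are illustrative, not from the slides.
parents = {"Flu": (), "Allergy": (), "Sinus": ("Flu", "Allergy"),
           "Headache": ("Sinus",), "Nose": ("Sinus",)}
cpts = {("Flu", ()): 0.1, ("Allergy", ()): 0.2,
        ("Sinus", (0, 0)): 0.05, ("Sinus", (0, 1)): 0.4,
        ("Sinus", (1, 0)): 0.6,  ("Sinus", (1, 1)): 0.9,
        ("Headache", (0,)): 0.1, ("Headache", (1,)): 0.7,
        ("Nose", (0,)): 0.1,     ("Nose", (1,)): 0.8}
order = ["Flu", "Allergy", "Sinus", "Headache", "Nose"]  # topological

def sample_joint(order, parents, cpts):
    """One sample from the joint: sample each Xi from P(Xi | Pa_Xi)
    with the parents instantiated to their already-sampled values."""
    x = {}
    for var in order:
        pa = tuple(x[p] for p in parents[var])
        x[var] = 1 if random.random() < cpts[(var, pa)] else 0
    return x

def estimate_marginal(var, value, order, parents, cpts, m=10000):
    """Estimate P(var = value) by counting over m forward samples."""
    hits = sum(sample_joint(order, parents, cpts)[var] == value
               for _ in range(m))
    return hits / m

print(estimate_marginal("Headache", 1, order, parents, cpts))
```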
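Rejection sampling for P(X|e) adds one step: discard samples inconsistent with the evidence. A minimal sketch reusing sample_joint and the illustrative network above; note that it loops until m consistent samples are found, which is exactly why the method can be very slow when the evidence is unlikely.

```python
def rejection_sample(query_var, value, evidence, order, parents, cpts, m=1000):
    """Estimate P(query_var = value | evidence) from the first m forward
    samples that agree with the evidence (a dict var -> observed value)."""
    kept = hits = 0
    while kept < m:                 # may loop for a long time if P(e) is small
        x = sample_joint(order, parents, cpts)
        if all(x[v] == val for v, val in evidence.items()):
            kept += 1
            hits += (x[query_var] == value)
    return hits / m

# e.g., estimate P(Flu = 1 | Nose = 1):
print(rejection_sample("Flu", 1, {"Nose": 1}, order, parents, cpts))
```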
Example of using importance sampling to compute P(X|e)
[Worked example on the running network with evidence Nose = t]

Using importance sampling to compute P(X|e)
- For j = 1 to m:
  - {x1^j, ..., xn^j} will be the jth sample; initialize the weight of the sample, wj = 1
  - For i = 1 to n:
    - If Xi ∉ {e}: sample xi^j from the distribution P(Xi | Pa_Xi), where the parents are instantiated to {x1^j, ..., x_{i-1}^j}
    - Else: set xi^j to the assignment in the evidence e, and multiply the weight wj by P(xi^j | Pa_Xi), where the parents are instantiated to {x1^j, ..., x_{i-1}^j}
  - Add {x1^j, ..., xn^j} to the "dataset" with weight wj
- Use weighted counts to compute P(X|e)
- (See the likelihood-weighting sketch at the end of this section.)

What you need to know about inference
- Bayesian networks
  - A useful compact representation for large probability distributions
- Inference to compute
  - The probability of X given evidence e
  - The most likely explanation (MLE) given evidence e
  - Inference is NP-hard
- Variable elimination algorithm
  - Efficient algorithm ("only" exponential in tree-width, not in the number of variables)
  - The elimination order is important!
  - Approximate inference is necessary when the tree-width is too large
  - The only difference between probabilistic inference and MLE is "sum" versus "max"
- Sampling – an example of approximate inference
  - Simulate from the model
  - Likelihood weighting for inference
  - Can be very slow

Where are we?
- Bayesian networks
  - Represent exponentially-large probability distributions compactly
- Inference in BNs
  - Exact inference is very fast for problems with low tree-width
  - Many approximate inference algorithms, e.g., sampling
- Learning BNs
  - Given the structure, estimate parameters using counts (maximum likelihood); MAP is also possible
  - What about learning the structure?

Maximum likelihood (ML) for learning BN structure
- Data: <x_1^(1), ..., x_n^(1)>, ..., <x_1^(m), ..., x_n^(m)>
- For each possible structure, learn the parameters using ML, then score the structure
[Figure: the data feeding into several candidate network structures over Flu, Allergy, Sinus, Headache, Nose]

How many graphs are there?
- Super-exponentially many in the number of variables, so exhaustively scoring every structure is infeasible

How many trees are there?
- Still exponentially many (n^{n-2} labeled trees, by Cayley's formula)
- Nonetheless – an efficient optimal algorithm finds the best tree

Scoring a tree
- For a tree T with ML parameters, the log-likelihood decomposes as log P(D | θ_ML, T) = m Σ_i I(X_i, X_pa(i)) − m Σ_i H(X_i)
- The entropy term does not depend on the structure, so the best tree maximizes the sum of the empirical mutual informations along its edges

Chow-Liu tree learning algorithm
- For each pair of variables Xi, Xj:
  - Compute the empirical distribution: P̂(xi, xj) = Count(xi, xj) / m
  - Compute the mutual information: I(Xi, Xj) = Σ_{xi, xj} P̂(xi, xj) log [ P̂(xi, xj) / (P̂(xi) P̂(xj)) ]
- Define a graph:
  - Nodes X1, ..., Xn
  - Edge (i, j) gets weight I(Xi, Xj)
- Optimal tree BN:
  - Compute the maximal spanning tree
  - Directions in the BN: pick any node as the root; breadth-first search defines the directions
- (A code sketch appears at the end of this section.)

Can we extend Chow-Liu?
- Tree-augmented naïve Bayes (TAN) [Friedman et al. '97]
  - Same as Chow-Liu, but score edges with the conditional mutual information given the class variable, I(Xi, Xj | C)
- (Approximately) learning models with tree-width up to k [Narasimhan & Bilmes '04]
  - But O(n^{k+1})...

Scoring general graphical models – Model selection problem
- What's the best …
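The importance-sampling procedure above is likelihood weighting: clamp the evidence variables and weight each sample by the probability of the evidence values given their sampled parents. A minimal sketch, reusing the illustrative parents/cpts/order representation from the forward-sampling sketch earlier; the function name and defaults are assumptions.

```python
def likelihood_weighting(query_var, value, evidence, order, parents, cpts,
                         m=10000):
    """Estimate P(query_var = value | evidence) from weighted samples."""
    num = den = 0.0
    for _ in range(m):
        x, w = {}, 1.0
        for var in order:
            pa = tuple(x[p] for p in parents[var])
            p1 = cpts[(var, pa)]            # P(var = 1 | parents)
            if var in evidence:
                x[var] = evidence[var]      # clamp to the evidence...
                w *= p1 if x[var] == 1 else 1.0 - p1   # ...and pick up its weight
            else:
                x[var] = 1 if random.random() < p1 else 0
        den += w                            # weighted counts, as in the slides
        num += w * (x[query_var] == value)
    return num / den

# e.g., estimate P(Flu = 1 | Nose = 1) without rejecting any samples:
print(likelihood_weighting("Flu", 1, {"Nose": 1}, order, parents, cpts))
```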
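The Chow-Liu algorithm above, as a minimal sketch for binary data: compute the empirical pairwise mutual information, then grow a maximum spanning tree (a simple Prim-style pass here) from an arbitrary root, which also fixes the edge directions. Data is assumed to arrive as a list of dicts; all names are illustrative.

```python
import math
from collections import Counter

def empirical_mi(data, a, b):
    """Empirical mutual information I(Xa; Xb), in nats, from a list of
    samples (each sample is a dict mapping variable name -> value)."""
    m = len(data)
    pab = Counter((x[a], x[b]) for x in data)
    pa = Counter(x[a] for x in data)
    pb = Counter(x[b] for x in data)
    return sum((c / m) * math.log((c / m) / ((pa[va] / m) * (pb[vb] / m)))
               for (va, vb), c in pab.items())

def chow_liu(data, variables):
    """Return directed tree edges (parent, child): the maximum spanning
    tree under MI weights, rooted at the first variable."""
    weight = {(a, b): empirical_mi(data, a, b)
              for i, a in enumerate(variables) for b in variables[i + 1:]}
    in_tree = {variables[0]}
    edges = []
    while len(in_tree) < len(variables):   # Prim's algorithm, maximizing
        u, v = max(((a, b) for a in in_tree for b in variables
                    if b not in in_tree),
                   key=lambda e: weight.get(e, weight.get((e[1], e[0]), 0.0)))
        edges.append((u, v))               # u is already in the tree -> parent
        in_tree.add(v)
    return edges

# Tiny usage example with made-up binary samples:
data = [{"A": 0, "B": 0, "C": 0}, {"A": 1, "B": 1, "C": 0},
        {"A": 1, "B": 1, "C": 1}, {"A": 0, "B": 0, "C": 1}]
print(chow_liu(data, ["A", "B", "C"]))   # e.g., [('A', 'B'), ('A', 'C')]
```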