Bayesian Networks (Structure) Learning
Machine Learning – 10-701/15-781
Carlos Guestrin
Carnegie Mellon University
November 16th, 2009
©2005-2009 Carlos Guestrin

Review
• Bayesian networks: a compact representation for probability distributions, with an exponential reduction in the number of parameters
• Fast probabilistic inference using variable elimination: compute P(X | e) in time exponential in the tree-width, not in the number of variables
• Today: learning BN structure
[Running example network: Flu and Allergy are parents of Sinus; Sinus is a parent of Headache and Nose]

Learning Bayes nets
• Data: x(1), …, x(m)
• Learn the structure and the parameters, i.e., the CPTs P(Xi | PaXi)

Learning the CPTs
• For each discrete variable Xi, the maximum-likelihood estimate is given by counts:
  P̂(Xi = x | PaXi = u) = Count(x, u) / Count(u)

Information-theoretic interpretation of maximum likelihood
• Given the structure, the log likelihood of the data decomposes as
  log P̂(D | θ̂, G) = m Σi Î(Xi ; PaXi) − m Σi Ĥ(Xi)
  where Î is the empirical mutual information and Ĥ the empirical entropy
• Only the mutual-information term depends on the structure, so maximum likelihood prefers parents that are highly informative about each node

Decomposable score
• Log data likelihood
• Decomposable score: decomposes over families in the BN (a node and its parents)
• Will lead to significant computational efficiency!
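The family-by-family decomposition of the log likelihood can be sketched in code. A minimal pure-Python illustration (the toy data and variable names are invented for the example), assuming fully observed discrete data and maximum-likelihood CPT estimates:

```python
import math
from collections import Counter

def family_log_likelihood(data, child, parents):
    """Log-likelihood contribution of one family (a node and its parents),
    using MLE CPTs: sum over (x, u) of Count(x, u) * log(Count(x, u) / Count(u))."""
    joint = Counter((row[child], tuple(row[p] for p in parents)) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    return sum(c * math.log(c / parent_counts[u]) for (x, u), c in joint.items())

def log_likelihood(data, structure):
    """Decomposable score: log P(D | G, MLE params) = sum of per-family terms."""
    return sum(family_log_likelihood(data, child, parents)
               for child, parents in structure.items())

# Toy example: rows are dicts of binary values; structure maps node -> parent list.
data = [{"Flu": f, "Sinus": s} for f, s in [(1, 1), (1, 1), (0, 0), (0, 1)]]
structure = {"Flu": [], "Sinus": ["Flu"]}
print(log_likelihood(data, structure))
```

Because the total decomposes into independent family terms, changing one node's parent set only requires rescoring that node's family, which is what makes search over structures computationally feasible.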
  Score(G : D) = Σi FamScore(Xi | PaXi : D)

How many trees are there?
• Super-exponentially many: n^(n−2) labeled trees on n nodes
• Nonetheless, an efficient algorithm finds the optimal tree

Scoring a tree 1: equivalent trees
• Reversing edge directions in a tree does not change the score: trees with the same undirected skeleton are equivalent

Scoring a tree 2: similar trees
• The score of a tree is m Σ(i,j)∈T Î(Xi ; Xj) plus terms that do not depend on the structure, so trees are compared by summing mutual information over their edges

Chow-Liu tree learning algorithm 1
• For each pair of variables Xi, Xj:
  – Compute the empirical distribution: P̂(xi, xj) = Count(xi, xj) / m
  – Compute the mutual information: Î(Xi ; Xj) = Σxi,xj P̂(xi, xj) log [ P̂(xi, xj) / (P̂(xi) P̂(xj)) ]
• Define a graph with nodes X1, …, Xn, where edge (i, j) gets weight Î(Xi ; Xj)

Chow-Liu tree learning algorithm 2
• Optimal tree BN: compute the maximum-weight spanning tree
• Directions in the BN: pick any node as the root; breadth-first search defines the directions

Can we extend Chow-Liu? 1
• Tree-augmented naïve Bayes (TAN) [Friedman et al. ’97]
• The naïve Bayes model overcounts evidence because correlations between features are not considered
• Your homework! ☺

Can we extend Chow-Liu? 2
• (Approximately) learning models with tree-width up to k [Chechetka & Guestrin ’07]
• But O(n^(2k+6))…

What you need to know about learning BN structures so far
• Decomposable scores: maximum likelihood, with an information-theoretic interpretation
• Best tree (Chow-Liu)
• Best TAN
• Nearly best k-treewidth (in O(n^(2k+6)))

Scoring general graphical models: the model selection problem
• Data: <x1(1), …, xn(1)>, …, <x1(m), …, xn(m)>
• What's the best structure?
• The more edges, the fewer independence assumptions and the higher the likelihood of the data, but the model will overfit…

Maximum likelihood overfits!
• Information never hurts: adding a parent always increases the score!

Bayesian score avoids overfitting
• Given a structure, maintain a distribution over the parameters
• The resulting integral is difficult: use the Bayesian information criterion (BIC) approximation (equivalent as m → ∞):
  log P(D | G) ≈ log P(D | θ̂G, G) − (log m / 2) · Dim(G)
• Note: the penalty term regularizes; the BIC score is equivalent to the MDL score
• Finding the best BN under BIC is still NP-hard

Structure learning for general graphs
• In a tree, each node has at most one parent
• Theorem: the problem of learning a BN structure with at most d parents is NP-hard for any (fixed) d > 1
• Most structure-learning approaches therefore use heuristics that exploit score decomposition
• Next: (quickly) describe the simplest heuristic that exploits decomposition

Learn BN structure using local search
• Start from the Chow-Liu tree
• Local search, possible moves:
  • Add edge
  • Delete edge
  • Invert edge
• Score using BIC

What you need to know about learning BNs
• Maximum likelihood or MAP learns the parameters
• Decomposable scores
• Best tree (Chow-Liu)
• Best TAN
• Other BNs: usually local search with the BIC score
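The Chow-Liu procedure above (pairwise mutual information, maximum-weight spanning tree, edges oriented away from an arbitrary root) can be sketched as follows. The toy dataset is invented, and Prim's algorithm stands in for any maximum-spanning-tree routine:

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information I(Xi; Xj) from fully observed rows."""
    m = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    return sum((c / m) * math.log((c * m) / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def chow_liu(data, n_vars):
    """Maximum-weight spanning tree over pairwise mutual information (Prim's
    algorithm). Each edge is oriented from its already-connected endpoint to
    the new node, which is a valid orientation with node 0 as the root."""
    weights = {(i, j): mutual_information(data, i, j)
               for i, j in combinations(range(n_vars), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < n_vars:
        u, v = max(((i, j) for i, j in weights
                    if (i in in_tree) != (j in in_tree)),
                   key=lambda e: weights[e])
        parent, child = (u, v) if u in in_tree else (v, u)
        edges.append((parent, child))  # directed away from the root
        in_tree.add(child)
    return edges

# Toy data: X1 copies X0, X2 is independent noise, so (0, 1) is the heavy edge.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(chow_liu(data, 3))
```

The spanning-tree step is what makes the tree case tractable despite the super-exponential number of candidate trees.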
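The BIC score used by the local search can also be sketched. A minimal version (assuming full tabular CPTs; the parameter-counting helper `dim` and the toy variables are illustrative):

```python
import math

def dim(structure, arity):
    """Number of free parameters: each node Xi with parents Pa contributes
    (|Xi| - 1) * prod(|Pa_j|) independent CPT entries."""
    return sum((arity[x] - 1) * math.prod(arity[p] for p in parents)
               for x, parents in structure.items())

def bic(log_lik, m, structure, arity):
    """BIC: maximized log likelihood minus (log m / 2) * number of parameters."""
    return log_lik - (math.log(m) / 2) * dim(structure, arity)

# Adding an edge always raises the likelihood, but the penalty grows with it:
arity = {"Flu": 2, "Sinus": 2}
print(dim({"Flu": [], "Sinus": []}, arity))        # independent model: 2 parameters
print(dim({"Flu": [], "Sinus": ["Flu"]}, arity))   # with the edge: 3 parameters
```

This penalty is what keeps local search from always preferring the move that adds an edge, countering the "information never hurts" behavior of the raw likelihood.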