Bayesian Networks – (Structure) Learning
Machine Learning – 10-701/15-781
Carlos Guestrin, Carnegie Mellon University
April 2nd, 2007
©2005-2007 Carlos Guestrin

Review: Bayesian networks
- Compact representation for probability distributions
- Exponential reduction in the number of parameters
- Fast probabilistic inference using variable elimination
  - Compute P(X | e)
  - Time exponential in tree-width, not in the number of variables
- Today: learning BN structure
[Running example throughout: Flu and Allergy are parents of Sinus; Sinus is a parent of Headache and Nose]

Learning Bayes nets
- Data: x(1), …, x(m)
- Settings: known vs. unknown structure; fully observable vs. missing data
- The structure determines which parameters we must learn: the CPTs P(Xi | PaXi)

Learning the CPTs
- From data x(1), …, x(m), for each discrete variable Xi, estimate P(Xi | PaXi) by counting (maximum likelihood)

Information-theoretic interpretation of maximum likelihood
- Given the structure G, the log likelihood of the data decomposes; at the maximum-likelihood parameters,

    log P(D | θ̂, G) = M Σi Î(Xi; PaXi) − M Σi Ĥ(Xi)

  where Î is the empirical mutual information, Ĥ is the empirical entropy, and M is the number of data points
- The entropy terms do not depend on the structure, so maximizing likelihood means choosing, for each variable, parents with high mutual information with that variable

Decomposable score
- Log data likelihood
- A decomposable score decomposes over families in the BN (a node and its parents)
- This will lead to significant computational efficiency!
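The CPT counting step mentioned above can be sketched in plain Python. This is a minimal sketch, not the lecture's code; the data layout (a list of dicts) and the variable names are illustrative assumptions:

```python
from collections import Counter

def learn_cpt(data, child, parents):
    """Maximum-likelihood CPT P(child | parents) estimated by counting.

    data: list of dicts mapping variable name -> observed value.
    Returns a dict: (parent_assignment, child_value) -> probability.
    """
    joint = Counter()   # counts of (parent assignment, child value)
    margin = Counter()  # counts of the parent assignment alone
    for row in data:
        pa = tuple(row[p] for p in parents)
        joint[(pa, row[child])] += 1
        margin[pa] += 1
    # P(child = v | parents = pa) = Count(pa, v) / Count(pa)
    return {key: n / margin[key[0]] for key, n in joint.items()}

# Tiny hypothetical dataset for the Flu -> Sinus family
data = [
    {"Flu": 1, "Sinus": 1}, {"Flu": 1, "Sinus": 1},
    {"Flu": 1, "Sinus": 0}, {"Flu": 0, "Sinus": 0},
]
cpt = learn_cpt(data, "Sinus", ["Flu"])
# e.g. cpt[((1,), 1)] is the estimate of P(Sinus=1 | Flu=1) = 2/3
```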
    Score(G : D) = Σi FamScore(Xi | PaXi : D)

How many trees are there?
- Super-exponentially many: n^(n-2) labeled trees on n nodes (Cayley's formula)
- Nonetheless, an efficient algorithm finds the optimal tree

Scoring a tree 1: equivalent trees
- For trees, the likelihood score depends only on the undirected skeleton, so trees with the same edges (any choice of root) score the same

Scoring a tree 2: similar trees
- Comparing two trees that differ in a few edges, only the terms for the changed edges differ

Chow-Liu tree learning algorithm (part 1)
- For each pair of variables Xi, Xj:
  - Compute the empirical distribution: P̂(xi, xj) = Count(xi, xj) / M
  - Compute the mutual information: Î(Xi; Xj) = Σ_{xi,xj} P̂(xi, xj) log [ P̂(xi, xj) / (P̂(xi) P̂(xj)) ]
- Define a graph with nodes X1, …, Xn, where edge (i, j) gets weight Î(Xi; Xj)

Chow-Liu tree learning algorithm (part 2)
- Optimal tree BN: compute the maximum-weight spanning tree
- Directions in the BN: pick any node as root; breadth-first search defines the edge directions

Can we extend Chow-Liu? (1)
- Tree-augmented naïve Bayes (TAN) [Friedman et al. '97]
- Naïve Bayes overcounts evidence, because correlations between features are not considered
- Same as Chow-Liu, but score edges with the conditional mutual information Î(Xi; Xj | C), conditioning on the class C

Can we extend Chow-Liu? (2)
- (Approximately) learning models with tree-width up to k [Narasimhan & Bilmes '04]
- But the cost is O(n^(k+1))… and there are more subtleties

What you need to know about learning BN structures so far
- Decomposable scores
  - Maximum likelihood
  - Information-theoretic interpretation
- Best tree (Chow-Liu)
- Best TAN
- Nearly best k-treewidth model (in O(n^(k+1)))

Scoring general graphical models – the model selection problem
- Data: <x_1^(1), …, x_n^(1)>, …, <x_1^(m), …, x_n^(m)>
- What's the best structure?
- The more edges, the fewer independence assumptions and the higher the likelihood of the data, but the model will overfit…

Maximum likelihood overfits!
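The Chow-Liu procedure above (empirical mutual information as edge weights, then a maximum-weight spanning tree) can be sketched as follows. This is an illustrative implementation, not the lecture's; the column-per-variable data layout is an assumption, and Kruskal's algorithm stands in for any maximum-spanning-tree method:

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) from paired samples, in nats."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def chow_liu_tree(columns):
    """columns: dict of variable name -> list of samples (rows aligned).
    Returns the edges of a maximum-weight spanning tree under I(Xi; Xj)."""
    names = list(columns)
    # Score every candidate edge by empirical mutual information.
    edges = sorted(((mutual_information(columns[a], columns[b]), a, b)
                    for a, b in combinations(names, 2)), reverse=True)
    # Kruskal's algorithm with a simple union-find.
    parent = {v: v for v in names}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    tree = []
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b))
    return tree
```

Per the slides, a directed BN then follows by picking any node as root and orienting edges away from it with breadth-first search.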
- "Information never hurts": adding a parent never decreases the maximum-likelihood score!

Bayesian score avoids overfitting
- Given a structure, place a distribution over the parameters
- The resulting integral is difficult: use the Bayesian information criterion (BIC) approximation (equivalent as M → ∞)
- Note: this regularizes like the MDL score
- Finding the best BN under BIC is still NP-hard

How many graphs are there?
- Super-exponentially many, far more than trees

Structure learning for general graphs
- In a tree, a node has at most one parent
- Theorem: the problem of learning a BN structure with at most d parents per node is NP-hard for any fixed d ≥ 2
- Most structure-learning approaches therefore use heuristics that exploit score decomposition
- Next: (quickly) two heuristics that exploit decomposition in different ways

Learning BN structure using local search
- Start from the Chow-Liu tree
- Local search, possible moves: add an edge, delete an edge, invert an edge
- Score candidates using BIC

What you need to know about learning BNs
- Maximum likelihood or MAP learns the parameters
- Decomposable scores
- Best tree (Chow-Liu)
- Best TAN
- Other BNs: usually local search with the BIC score

Unsupervised Learning or Clustering – K-means and Gaussian Mixture Models
Machine Learning – 10-701/15-781
Carlos Guestrin, Carnegie Mellon University
April 2nd, 2007

Some Data
[Figure: scatter plot of unlabeled 2-D data points]

K-means
1. Ask the user how many clusters they'd like (e.g., k = 5).
2. Randomly guess k cluster center locations.
3. Each datapoint finds out which center it is closest to (thus each center "owns" a set of datapoints).
4. Each center finds the centroid of the points it owns…
5. …and jumps there.
6. Repeat until terminated!

Unsupervised Learning
You walk into a bar. A stranger approaches and tells you:
"I've got data from k classes. Each class produces observations with a normal distribution and variance σ²·I. Standard simple multivariate Gaussian assumptions. I can tell you all the P(wi)'s."
So far, this looks straightforward.
"I need a maximum likelihood estimate of the µi's."
No problem.
"There's just one thing. None of the data are labeled. I have datapoints, but I don't know what class they're from (any of them!)"
Uh oh!!

Gaussian Bayes Classifier Reminder

    P(y = i | x) = p(x | y = i) P(y = i) / p(x)

    p(x | y = i) = 1 / ( (2π)^(m/2) ||Σi||^(1/2) ) · exp[ −½ (x − µi)ᵀ Σi⁻¹ (x − µi) ]

How do we deal with that?

Predicting wealth from age
[Figure: scatter plot of wealth vs. age, shown on two consecutive slides]
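The K-means loop described in steps 1-6 above (Lloyd's algorithm) can be sketched as follows. A minimal illustrative version, assuming points are tuples and using squared Euclidean distance; the seeding scheme is an arbitrary choice, not from the lecture:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest center, recompute centroids."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 2: randomly guess k center locations
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Step 3: each datapoint finds the center it is closest to.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        # Steps 4-5: each center finds the centroid of the points it owns and jumps there.
        new_centers = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        # Step 6: repeat until the centers stop moving.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```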
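The Gaussian Bayes classifier formulas above, specialized to the spherical σ²·I covariance from the stranger's setup, can be evaluated directly. A sketch with illustrative function names, not the lecture's code:

```python
import math

def gaussian_density(x, mu, sigma2):
    """p(x | y=i) for a spherical Gaussian N(mu, sigma2 * I) in m dimensions."""
    m = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, mu))
    # For Sigma = sigma2 * I: (2*pi)^(m/2) * |Sigma|^(1/2) = (2*pi*sigma2)^(m/2)
    norm = (2 * math.pi * sigma2) ** (m / 2)
    return math.exp(-sq_dist / (2 * sigma2)) / norm

def posterior(x, mus, priors, sigma2):
    """Bayes rule: P(y=i | x) proportional to p(x | y=i) * P(y=i)."""
    likes = [gaussian_density(x, mu, p_i) if False else gaussian_density(x, mu, sigma2) * p_i
             for mu, p_i in zip(mus, priors)]
    z = sum(likes)  # p(x), the normalizer
    return [l / z for l in likes]

# A point at the first class mean gets most of the posterior mass
probs = posterior((0.0, 0.0), [(0.0, 0.0), (3.0, 3.0)], [0.5, 0.5], 1.0)
```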