CMU CS 10701 - Bayesian Networks (Structure) Learning
Bayesian Networks: Structure Learning
Machine Learning 10-701/15-781, Carlos Guestrin, Carnegie Mellon University, November 16th, 2009. (c) 2005-2009 Carlos Guestrin.

Review: Bayesian networks (running example: Flu, Allergy -> Sinus -> Headache, Nose)
- Compact representation for probability distributions: exponential reduction in the number of parameters.
- Fast probabilistic inference using variable elimination: compute P(X | e) in time exponential in the tree-width, not in the number of variables.
- Today: learning the BN structure.

Learning Bayes nets
- Data: samples x^(1), ..., x^(m); learn the structure and the parameters (the CPTs P(Xi | Pa_Xi)).
- Learning the CPTs: for each discrete variable Xi, estimate P(Xi | Pa_Xi) from the empirical counts in the data.

Information-theoretic interpretation of maximum likelihood
- Given a structure, the log-likelihood of the data under the maximum-likelihood CPTs decomposes (the slides build this up in three steps) as
  log P(D | G, theta_hat) = m * sum_i [ I_hat(Xi; Pa_Xi) - H_hat(Xi) ]
  where I_hat is the empirical mutual information and H_hat the empirical entropy. The entropy terms do not depend on the structure.

Decomposable score
- The log data likelihood is a decomposable score: it decomposes over the families of the BN (a node and its parents), Score(G : D) = sum_i FamScore(Xi, Pa_Xi : D). This will lead to significant computational efficiency.

How many trees are there?
- Super-exponentially many; nonetheless, an efficient algorithm finds the optimal tree.

Scoring a tree
- Trees with the same undirected skeleton are equivalent: the score depends only on which pairs of variables are connected, not on the edge directions, so similar trees that share a skeleton get the same score.

Chow-Liu tree learning algorithm
1. For each pair of variables Xi, Xj: compute the empirical distribution P_hat(Xi, Xj) and the mutual information I_hat(Xi; Xj).
2. Define a graph with nodes X1, ..., Xn, where edge (i, j) gets weight I_hat(Xi; Xj).
3. Optimal tree BN: compute the maximum-weight spanning tree.
4. Directions in the BN: pick any node as the root; breadth-first search defines the edge directions.
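The four steps above can be sketched in a few dozen lines. This is a minimal illustration, not the lecture's own code: the function names (`mutual_information`, `chow_liu_tree`) are my own, and it assumes the data is an (m, n) integer array of discrete observations.

```python
# Sketch of Chow-Liu tree learning on discrete data (an illustration;
# names are my own, not from the lecture).
import numpy as np
from collections import defaultdict, deque

def mutual_information(x, y):
    """Empirical mutual information I_hat(X; Y) in nats between two discrete columns."""
    m = len(x)
    joint = defaultdict(int)
    for a, b in zip(x, y):
        joint[(a, b)] += 1
    px, py = defaultdict(int), defaultdict(int)
    for (a, b), c in joint.items():
        px[a] += c
        py[b] += c
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / m
        mi += p_ab * np.log(p_ab / ((px[a] / m) * (py[b] / m)))
    return mi

def chow_liu_tree(data):
    """Return directed edges (parent, child) of the Chow-Liu tree, rooted at node 0."""
    m, n = data.shape
    # Steps 1-2: weight every pair of variables by empirical mutual information.
    edges = sorted(
        ((mutual_information(data[:, i], data[:, j]), i, j)
         for i in range(n) for j in range(i + 1, n)),
        reverse=True)
    # Step 3: maximum-weight spanning tree (Kruskal with union-find).
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    adj = defaultdict(list)
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            adj[i].append(j)
            adj[j].append(i)
    # Step 4: orient edges away from an arbitrary root via breadth-first search.
    directed, seen, queue = [], {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                directed.append((u, v))
                queue.append(v)
    return directed
```

On data where two columns are copies of each other and a third is independent noise, the tree reliably keeps the high-mutual-information edge between the correlated pair.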
Can we extend Chow-Liu?
- Tree-augmented naive Bayes (TAN) [Friedman et al. '97]: the naive Bayes model overcounts evidence because correlations between features are not considered; TAN adds a tree over the features. (Your homework.)
- Approximately learning models with tree-width up to k [Chechetka & Guestrin '07], but in O(n^(2k+6)) time.

What you need to know about learning BN structures so far
- Decomposable scores: maximum likelihood; information-theoretic interpretation.
- Best tree: Chow-Liu. Best TAN. Nearly-best k-treewidth model in O(n^(2k+6)).

Scoring general graphical models
- Model selection problem: what is the best structure given data x1^(1), ..., xn^(1), ..., x1^(m), ..., xn^(m)?
- The more edges, the fewer independence assumptions and the higher the likelihood of the data, but the model will overfit.

Maximum likelihood overfits
- "Information never hurts": adding a parent always increases the likelihood score.

Bayesian score avoids overfitting
- Given a structure, place a distribution over the parameters and integrate them out. The integral is difficult, so use the Bayesian information criterion (BIC) approximation, which becomes equivalent as m -> infinity. Note: this regularizes like an MDL score. Finding the best BN under BIC is still NP-hard.

Structure learning for general graphs
- In a tree, a node has only one parent. Theorem: learning a BN structure with at most d parents is NP-hard for any fixed d > 1.
- Most structure learning approaches therefore use heuristics that exploit the score decomposition. The simplest such heuristic: local search.

Learn BN structure using local search
- Start from the Chow-Liu tree.
- Local search moves: add an edge, delete an edge, reverse an edge.
- Score candidate structures using BIC.

What you need to know about learning BNs
- Maximum likelihood or MAP learns the parameters.
- Decomposable scores; best tree via Chow-Liu; best TAN.
- Other BNs: usually local search with the BIC score.
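Because the BIC score is decomposable, local search only needs to rescore the one or two families an edge move touches. The sketch below shows one family's BIC score under the common convention of penalizing (log m)/2 per free parameter; the function name and penalty bookkeeping are my own, not the lecture's.

```python
# Hedged sketch of a decomposable BIC family score for one discrete node
# (an illustration; names and conventions are my own, not from the lecture).
import numpy as np
from collections import defaultdict

def family_bic_score(data, child, parents, arity):
    """BIC score of one family: max log-likelihood of child given its parents,
    minus (log m / 2) times the number of free parameters in the CPT."""
    m = data.shape[0]
    counts = defaultdict(lambda: defaultdict(int))
    for row in data:
        counts[tuple(row[p] for p in parents)][row[child]] += 1
    loglik = 0.0
    for pa_cfg, child_counts in counts.items():
        n_pa = sum(child_counts.values())
        for c in child_counts.values():
            loglik += c * np.log(c / n_pa)   # plug in ML estimate of P(child | pa_cfg)
    # One free parameter per (parent configuration, child value) minus normalization.
    n_params = (arity[child] - 1) * int(np.prod([arity[p] for p in parents]))
    return loglik - 0.5 * np.log(m) * n_params

# A full graph score would just sum family_bic_score over all nodes.
```

When the child deterministically copies a parent, the likelihood gain dwarfs the extra-parameter penalty, so the family with the parent scores higher, matching the intuition that BIC keeps edges only when they pay for themselves.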
Unsupervised Learning: Clustering, K-means
Machine Learning 10-701/15-781, Carlos Guestrin, Carnegie Mellon University, November 16th, 2009

K-means (informally, illustrated on some 2-D data)
1. Ask the user how many clusters they'd like (e.g., k = 5).
2. Randomly guess k cluster center locations.
3. Each datapoint finds out which center it is closest to; thus each center "owns" a set of datapoints.
4. Each center finds the centroid of the points it owns...
5. ...and jumps there.
6. Repeat until terminated.

K-means (more formally)
- Randomly initialize k centers: mu^(0) = mu_1^(0), ..., mu_k^(0).
- Classify: assign each point j in {1, ..., m} to the nearest center: C(j) <- argmin_i ||mu_i - x_j||^2.
- Recenter: mu_i becomes the centroid of its points: mu_i <- argmin_mu sum_{j : C(j) = i} ||mu - x_j||^2, which is equivalent to setting mu_i to the average of its points.

What is K-means optimizing?
- A potential function F(mu, C) of the centers mu and the point allocations C: F(mu, C) = sum_j ||mu_C(j) - x_j||^2.
- Optimal K-means: min_mu min_C F(mu, C).

Does K-means converge?
- Part 1: fix mu and optimize C. The classify step minimizes F over the allocations.
- Part 2: fix C and optimize mu. The recenter step minimizes F over the centers.

Coordinate descent algorithms
- Want min_a min_b F(a, b). Coordinate descent: fix a, minimize over b; fix b, minimize over a; repeat.
- Converges (if F is bounded) to an often-good local optimum, as we saw in the applet (play with it!).
- K-means is a coordinate descent algorithm.
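The classify/recenter loop above fits in a few lines of NumPy. This is a minimal sketch (the function name `kmeans` and its signature are my own choices), written so the two coordinate-descent steps are visible.

```python
# Minimal K-means as coordinate descent on F(mu, C) = sum_j ||mu_C(j) - x_j||^2
# (a sketch; names are my own, not from the lecture).
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    C = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Classify step: fix mu, optimize C (assign each point to its nearest center).
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        C = d.argmin(axis=1)
        # Recenter step: fix C, optimize mu (each center jumps to the centroid it owns).
        new_mu = np.array([X[C == i].mean(axis=0) if np.any(C == i) else mu[i]
                           for i in range(k)])
        if np.allclose(new_mu, mu):  # F can no longer decrease: converged
            break
        mu = new_mu
    return mu, C
```

Each step can only decrease the potential F, and F is bounded below by zero, so the loop must converge, though only to a local optimum, which is the coordinate-descent argument from the slides.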

