CMU CS 10701 - Bayesian Networks: (Structure) Learning

Bayesian Networks: (Structure) Learning
Machine Learning 10-701/15-781
Carlos Guestrin, Carnegie Mellon University, April 2nd, 2007
© 2005-2007 Carlos Guestrin

Review: Bayesian networks
(Running example: Flu, Allergy, Sinus, Headache, Nose.)
- Compact representation for probability distributions: exponential reduction in the number of parameters.
- Fast probabilistic inference using variable elimination: computing P(X | e) takes time exponential in the tree width, not in the number of variables.
- Today: learning BN structure.

Learning Bayes nets
- Two axes: known vs. unknown structure, and fully observable vs. missing data.
- From data x^(1), ..., x^(m), we learn the CPTs P(X_i | Pa_{X_i}) (the parameters) and, possibly, the structure itself.

Learning the CPTs
- With fully observed data, estimate the CPT of each discrete variable X_i from counts over x^(1), ..., x^(m).

Information-theoretic interpretation of maximum likelihood
- Given a structure G, the log-likelihood of the data decomposes as
  log P(D | θ̂, G) = m Σ_i Î(X_i; Pa_{X_i}) - m Σ_i Ĥ(X_i),
  where Î is the empirical mutual information and Ĥ the empirical entropy. The entropy term does not depend on G, so maximizing likelihood means choosing parents that have high mutual information with each node.

Decomposable score
- The log data likelihood is a decomposable score: it decomposes over families in the BN (a node and its parents),
  Score(G : D) = Σ_i FamScore(X_i, Pa_{X_i} : D).
- This will lead to significant computational efficiency.

Learning tree-structured BNs
- How many trees are there? Exponentially many. Nonetheless, an efficient algorithm finds the optimal tree.
- Scoring a tree: trees with the same undirected skeleton are score-equivalent; similar trees get similar scores.

Chow-Liu tree learning algorithm
1. For each pair of variables X_i, X_j, compute the empirical distribution and the mutual information Î(X_i; X_j). Define a complete graph with nodes X_1, ..., X_n, where edge (i, j) gets weight Î(X_i; X_j).
2. Compute the maximum-weight spanning tree; this is the optimal tree BN. For edge directions, pick any node as root; breadth-first search defines the directions.

Can we extend Chow-Liu?
1. Tree-augmented naive Bayes (TAN) [Friedman et al. '97]: the naive Bayes model overcounts evidence because correlation between features is not considered. Same as Chow-Liu, but score edges with the conditional mutual information Î(X_i; X_j | C) given the class.
2. Approximately learning models with tree width up to k [Narasimhan & Bilmes '04], but this is O(n^{k+1}), with more subtleties.

What you need to know about learning BN structures so far
- Decomposable scores: maximum likelihood and its information-theoretic interpretation.
- Best tree: Chow-Liu. Best TAN. Nearly best k-treewidth model in O(n^{k+1}).

Scoring general graphical models: model selection
- What is the best structure for data x^(1), ..., x^(m)? The more edges, the fewer independence assumptions, and the higher the likelihood of the data, but the model will overfit.
- Maximum likelihood overfits: "information never hurts," so adding a parent always increases the score.
- The Bayesian score avoids overfitting: given a structure, average over a distribution on parameters. The integral is difficult, so use the Bayesian information criterion (BIC) approximation, which becomes equivalent as M → ∞. Note: this regularizes like an MDL score. Finding the best BN under BIC is still NP-hard.
- How many graphs are there? Far more than trees.

Structure learning for general graphs
- In a tree, a node has only one parent. Theorem: learning a BN structure with at most d parents is NP-hard for any fixed d ≥ 2.
- Most structure learning approaches therefore use heuristics that exploit score decomposition; next, a heuristic that exploits decomposition.

Learning BN structure using local search
- Start from a Chow-Liu tree. Local search moves: add edge, delete edge, invert edge. Score candidates using BIC.

What you need to know about learning BNs
- Maximum likelihood or MAP learns the parameters.
- The score is decomposable; the best tree comes from Chow-Liu.
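The two Chow-Liu steps can be sketched in Python. This is a minimal illustration assuming discrete data in a NumPy integer array; the function names `empirical_mi` and `chow_liu_tree` are ours, not from the lecture, and a simple Prim-style loop stands in for a general maximum-weight spanning tree routine:

```python
import numpy as np
from itertools import combinations

def empirical_mi(x, y):
    """Empirical mutual information between two discrete columns."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))       # empirical joint
            px, py = np.mean(x == a), np.mean(y == b)  # empirical marginals
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def chow_liu_tree(data):
    """Edges of the max-weight spanning tree under MI weights (Prim's)."""
    n_vars = data.shape[1]
    w = np.zeros((n_vars, n_vars))
    for i, j in combinations(range(n_vars), 2):
        w[i, j] = w[j, i] = empirical_mi(data[:, i], data[:, j])
    in_tree = {0}            # pick variable 0 as the root
    edges = []
    while len(in_tree) < n_vars:
        # Greedily attach the highest-MI edge leaving the current tree.
        best = max(((i, j) for i in in_tree for j in range(n_vars)
                    if j not in in_tree), key=lambda e: w[e])
        edges.append(best)   # edge is directed root -> leaf (BFS order)
        in_tree.add(best[1])
    return edges
```

The greedy attachment order also yields the breadth-first edge directions mentioned above, since each new edge points away from the already-built tree.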
- The best TAN comes from Chow-Liu with conditional mutual information; other BNs are usually learned by local search with the BIC score.

Unsupervised Learning, or Clustering: K-means and Gaussian Mixture Models
Machine Learning 10-701/15-781, Carlos Guestrin, Carnegie Mellon University, April 2nd, 2007

(Slide: scatter plot of some unlabeled data.)

K-means
1. Ask the user how many clusters they'd like (e.g., k = 5).
2. Randomly guess k cluster center locations.
3. Each datapoint finds out which center it's closest to; thus each center "owns" a set of datapoints.
4. Each center finds the centroid of the points it owns...
5. ...and jumps there.
6. Repeat until terminated.

Unsupervised learning
You walk into a bar, and a stranger approaches and tells you: "I've got data from k classes. Each class produces observations with a normal distribution and variance σ²·I (standard, simple multivariate Gaussian assumptions). I can tell you all the P(w_i)'s." So far, this looks straightforward. "I need a maximum-likelihood estimate of the μ_i's." No problem. "There's just one thing: none of the data are labeled. I have datapoints, but I don't know what class they're from, any of them." Uh-oh.

Gaussian Bayes classifier (reminder)
P(y = i | x) = p(x | y = i) P(y = i) / p(x), where
p(x | y = i) = (2π)^{-m/2} |Σ_i|^{-1/2} exp( -(1/2) (x - μ_i)^T Σ_i^{-1} (x - μ_i) ).
How do we deal with unlabeled data?

(Slides: predicting wealth from age; learning a joint model over model year, mpg, and maker.)

Parameterizing the covariance Σ
- General Σ: a full symmetric matrix with entries σ_1², σ_12, ..., σ_1m in the first row, down to σ_m²: O(m²) parameters.
- Aligned Σ: diagonal, Σ = diag(σ_1², σ_2², ..., σ_m²): O(m) parameters.
- Spherical Σ = σ²·I: O(1) parameters.
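The six K-means steps above can be sketched as a minimal NumPy version. The function name `kmeans`, the choice to initialize centers at random datapoints, and the convergence check are our additions, not specified in the lecture:

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Lloyd's algorithm: the K-means steps, iterated until centers settle."""
    rng = np.random.default_rng(seed)
    # Step 2: random initial center locations (chosen among the datapoints).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: each datapoint finds the center it's closest to.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        owner = d.argmin(axis=1)
        # Steps 4-5: each center jumps to the centroid of the points it owns.
        new_centers = np.array([points[owner == c].mean(axis=0)
                                if np.any(owner == c) else centers[c]
                                for c in range(k)])
        # Step 6: terminate once no center moves.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, owner
```

Note that, as the lecture's iterative description implies, this converges only to a local optimum, so the result depends on the random initialization (step 2).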

