CMU CS 10-708: Bayesian Parameter Learning and Bayesian Structure Learning

Slide 1: Bayesian Parameter Learning / Bayesian Structure Learning
Graphical Models, 10-708. Carlos Guestrin, Carnegie Mellon University, October 6th, 2008.
Readings: K&F 16.3, 16.4, 17.3.

Slide 2: Decomposable score
- Start from the log data likelihood.
- A decomposable score factors over the families of the BN (each node together with its parents):
  Score(G : D) = sum_i FamScore(X_i | Pa_{X_i} : D)
- This decomposition will lead to significant computational efficiency!

Slide 3: Chow-Liu tree learning algorithm, part 1
- For each pair of variables X_i, X_j:
  - Compute the empirical distribution: P̂(x_i, x_j) = Count(x_i, x_j) / m.
  - Compute the mutual information:
    Î(X_i, X_j) = sum_{x_i, x_j} P̂(x_i, x_j) log [ P̂(x_i, x_j) / (P̂(x_i) P̂(x_j)) ]
- Define a graph:
  - Nodes X_1, ..., X_n.
  - Edge (i, j) gets weight Î(X_i, X_j).

Slide 4: Chow-Liu tree learning algorithm, part 2
- Optimal tree BN:
  - Compute the maximum-weight spanning tree of that graph.
  - Directions in the BN: pick any node as root; breadth-first search defines the edge directions. (A code sketch of the full algorithm follows slide 17 below.)

Slide 5: Can we extend Chow-Liu? Part 1
- Tree-augmented naive Bayes (TAN) [Friedman et al. '97].
- The naive Bayes model overcounts evidence because correlation between features is not considered.
- Same as Chow-Liu, but score edges with the conditional mutual information given the class variable C: Î(X_i, X_j | C).

Slide 6: Can we extend Chow-Liu? Part 2
- (Approximately) learning models with treewidth up to k [Chechetka & Guestrin '07].
- But the running time is O(n^(2k+6)).

Slide 7: What you need to know about learning BN structures so far
- Decomposable scores:
  - Maximum likelihood.
  - Information-theoretic interpretation.
- Best tree (Chow-Liu).
- Best TAN.
- Nearly best k-treewidth model, in O(n^(2k+6)).

Slide 8: Maximum likelihood score overfits!
- Information never hurts: Î(X; Pa ∪ {Y}) ≥ Î(X; Pa), so adding a parent never decreases, and on real data essentially always increases, the maximum likelihood score.

Slide 9: Bayesian score
- Prior distributions:
  - Over structures.
  - Over the parameters of each structure.
- Posterior over structures given data: P(G | D) ∝ P(D | G) P(G).

Slide 10: Can we really trust MLE?
- Which estimate is better?
  - 3 heads, 2 tails.
  - 30 heads, 20 tails.
  - 3 x 10^23 heads, 2 x 10^23 tails.
- The MLE is 0.6 in all three cases, yet our confidence should differ enormously. Many possible answers; we need distributions over possible parameters.

Slide 11: Bayesian learning
- Use Bayes' rule: P(θ | D) = P(D | θ) P(θ) / P(D).
- Or equivalently: P(θ | D) ∝ P(D | θ) P(θ).

Slide 12: Bayesian learning for the thumbtack
- The likelihood function is simply binomial: P(D | θ) = θ^(m_H) (1 - θ)^(m_T).
- What about the prior?
  - It should represent expert knowledge.
  - It should give a simple posterior form.
- Conjugate priors:
  - Closed-form representation of the posterior (more details soon).
  - For the binomial, the conjugate prior is the Beta distribution.

Slide 13: Beta prior distribution, P(θ)
- Prior: P(θ) = Beta(α_H, α_T) ∝ θ^(α_H - 1) (1 - θ)^(α_T - 1).
- Likelihood function: θ^(m_H) (1 - θ)^(m_T).
- Posterior: P(θ | D) ∝ θ^(α_H + m_H - 1) (1 - θ)^(α_T + m_T - 1).

Slide 14: Posterior distribution
- Prior: Beta(α_H, α_T).
- Data: m_H heads and m_T tails.
- Posterior distribution: P(θ | D) = Beta(α_H + m_H, α_T + m_T).

Slide 15: Conjugate prior
- Given a likelihood function P(D | θ), a (parametric) prior of the form P(θ | α) is conjugate to it if the posterior is of the same parametric family, i.e. can be written as P(θ | α') for some new set of parameters α'.
- Prior: Beta(α_H, α_T).
- Data: m_H heads and m_T tails (binomial likelihood).
- Posterior distribution: Beta(α_H + m_H, α_T + m_T).

Slide 16: Using the Bayesian posterior
- Posterior distribution: P(θ | D).
- Bayesian inference averages over it: E[f(θ)] = ∫ f(θ) P(θ | D) dθ.
- There is no longer a single parameter estimate.
- The integral is often hard to compute.

Slide 17: Bayesian prediction of a new coin flip
- Prior: Beta(α_H, α_T).
- Having observed m_H heads and m_T tails, what is the probability that flip m + 1 comes up heads?
  P(x_(m+1) = H | D) = (α_H + m_H) / (α_H + α_T + m_H + m_T).
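Here is the sketch promised on slide 4: a minimal, runnable version of Chow-Liu in Python, assuming binary data in an (m, n) numpy array. The names empirical_mi and chow_liu_tree are illustrative, not from the lecture; growing a Prim-style maximum spanning tree outward from the root orients every edge away from it, which matches slide 4's root-plus-BFS direction rule.

```python
import numpy as np

def empirical_mi(data, i, j):
    """Empirical mutual information I(X_i; X_j) for binary columns i and j."""
    mi = 0.0
    for xi in (0, 1):
        for xj in (0, 1):
            p_ij = np.mean((data[:, i] == xi) & (data[:, j] == xj))
            if p_ij > 0:
                p_i = np.mean(data[:, i] == xi)
                p_j = np.mean(data[:, j] == xj)
                mi += p_ij * np.log(p_ij / (p_i * p_j))
    return mi

def chow_liu_tree(data, root=0):
    """Directed (parent, child) edges of a Chow-Liu tree over the columns."""
    n = data.shape[1]
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w[i, j] = w[j, i] = empirical_mi(data, i, j)
    # Prim's algorithm for a maximum-weight spanning tree; each added edge
    # goes from a tree node to a new node, so edges point away from root.
    in_tree, edges = {root}, []
    while len(in_tree) < n:
        parent, child = max(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: w[e],
        )
        edges.append((parent, child))
        in_tree.add(child)
    return edges

# Tiny test: X1 is a noisy copy of X0, X2 is independent noise, so the
# highest-MI edge (0, 1) must appear in the learned tree.
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, size=1000)
x1 = x0 ^ (rng.random(1000) < 0.1).astype(int)
x2 = rng.integers(0, 2, size=1000)
print(chow_liu_tree(np.column_stack([x0, x1, x2])))
```

For TAN (slide 5), the only change would be replacing empirical_mi with conditional mutual information given the class variable.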
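Because the Beta prior is conjugate, everything in slides 12 through 17 is closed form and fits in a few lines. A minimal sketch, with illustrative names (beta_posterior, predict_heads); the numeric example reuses slide 10's three data sets under a uniform Beta(1, 1) prior.

```python
def beta_posterior(alpha_h, alpha_t, m_h, m_t):
    """Beta(alpha_h, alpha_t) prior plus (m_h heads, m_t tails) of data
    gives a Beta(alpha_h + m_h, alpha_t + m_t) posterior (slide 14)."""
    return alpha_h + m_h, alpha_t + m_t

def predict_heads(alpha_h, alpha_t, m_h, m_t):
    """P(x_{m+1} = H | D) = (alpha_h + m_h) / (alpha_h + alpha_t + m_h + m_t),
    the posterior mean (slide 17)."""
    a_h, a_t = beta_posterior(alpha_h, alpha_t, m_h, m_t)
    return a_h / (a_h + a_t)

# Slide 10's three data sets: all have MLE 0.6, but the Bayesian
# prediction only approaches 0.6 as the sample grows.
for m_h, m_t in [(3, 2), (30, 20), (3e23, 2e23)]:
    print(predict_heads(1.0, 1.0, m_h, m_t))   # 0.571..., 0.596..., 0.6
```

The last line previews the next slide's point: as m grows, the prior's contribution vanishes.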
Slide 18: Asymptotic behavior and equivalent sample size
- The Beta prior is equivalent to extra thumbtack flips:
  P(x_(m+1) = H | D) = (α_H + m_H) / (α + m), with α = α_H + α_T and m = m_H + m_T.
- As m → ∞, the prior is "forgotten".
- But for small sample sizes, the prior is important!
- Equivalent sample size: the prior can be parameterized by (α_H, α_T), or equivalently by the equivalent sample size m' = α_H + α_T together with the prior proportion α_H / m'.
- [Two plots of Beta densities: one fixing m' while varying the prior proportion, one fixing the prior proportion while varying m'.]

Slide 19: Bayesian learning corresponds to smoothing
- m = 0 ⇒ the prediction is just the prior parameter.
- m → ∞ ⇒ the prediction converges to the MLE.

Slide 20: Bayesian learning for the multinomial
- What if you have a k-sided coin (a die)?
- The likelihood function is multinomial: P(D | θ) = prod_i θ_i^(m_i).
- The conjugate prior for the multinomial is the Dirichlet: Dirichlet(α_1, ..., α_k) ∝ prod_i θ_i^(α_i - 1).
- Observing m data points, m_i of them from assignment i, the posterior is Dirichlet(α_1 + m_1, ..., α_k + m_k).
- Prediction: P(x_(m+1) = i | D) = (α_i + m_i) / sum_j (α_j + m_j).

Slide 21: Bayesian learning for a two-node BN
- Parameters θ_X, θ_{Y|X}.
- Priors: P(θ_X) and P(θ_{Y|X}).

Slide 22: Very important assumption on the prior: global parameter independence
- Global parameter independence: the prior over parameters is a product of priors over the CPTs:
  P(θ) = prod_i P(θ_{X_i | Pa_{X_i}}).

Slide 23: Global parameter independence, d-separation, and local prediction
- (Running example BN: Flu, Allergy, Sinus, Headache, Nose.)
- Read the independencies off the meta-BN that includes the parameter nodes.
- Proposition: for fully observable data D, if the prior satisfies global parameter independence, then the posterior decomposes the same way: P(θ | D) = prod_i P(θ_{X_i | Pa_{X_i}} | D).

Slide 24: Within a CPT
- Consider the meta-BN that includes the CPT parameters.
- Are θ_{Y|X=t} and θ_{Y|X=f} d-separated given D? No.
- Are θ_{Y|X=t} and θ_{Y|X=f} independent given D? Yes, thanks to context-specific independence!
- So the posterior decomposes within the CPT: P(θ_{Y|X} | D) = P(θ_{Y|X=t} | D) P(θ_{Y|X=f} | D).

Slide 25: Priors for BN CPTs (more when we talk about structure learning)
- Consider each CPT P(X | U = u).
- Conjugate prior: Dirichlet(α_{X=1|U=u}, ..., α_{X=k|U=u}).
- A more intuitive parameterization:
  - A "prior data set" D' with equivalent sample size m'.
  - "Prior counts": α_{x|u} is the count of (x, u) in D', i.e. m' P'(x, u) for a prior distribution P'.
  - Prediction: P(X = x | U = u, D) = (α_{x|u} + Count_D(x, u)) / sum_{x'} (α_{x'|u} + Count_D(x', u)).

Slide 26: An example. [Worked example not preserved in this text extraction.]

Slide 27: What you need to know about parameter learning
- Bayesian parameter learning:
  - Motivation for the Bayesian approach.
  - Bayesian prediction.
  - Conjugate priors and equivalent sample size.
  - Bayesian learning ⇒ smoothing.
- Bayesian learning for BN parameters:
  - Global parameter independence.
  - Decomposition of prediction according to CPTs.
  - Decomposition within a CPT.
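As a closing illustration of slides 20 and 25, here is a minimal sketch of Dirichlet-smoothed CPT estimation in Python. The function estimate_cpt and its arguments are mine, and spreading the equivalent sample size m' uniformly over all (x, u) cells (a BDeu-style choice) is an assumption, not something the lecture prescribes.

```python
import numpy as np

def estimate_cpt(x_col, u_col, k_x, k_u, m_prime=1.0):
    """Estimate P(X | U) from integer-coded columns with Dirichlet prior
    counts alpha_{x|u} = m' / (k_x * k_u), i.e. a uniform "prior data set"
    of equivalent sample size m'. Returns a (k_u, k_x) table whose rows
    (one per value of U) each sum to 1."""
    counts = np.zeros((k_u, k_x))
    np.add.at(counts, (u_col, x_col), 1.0)       # data counts Count(x, u)
    smoothed = counts + m_prime / (k_x * k_u)    # add prior counts
    return smoothed / smoothed.sum(axis=1, keepdims=True)

# Example: binary U -> X where X is a noisy copy of U; rows index U = 0, 1.
rng = np.random.default_rng(1)
u = rng.integers(0, 2, size=50)
x = (u ^ (rng.random(50) < 0.2)).astype(int)
print(estimate_cpt(x, u, k_x=2, k_u=2, m_prime=2.0))
```

By the within-CPT decomposition of slide 24, each row of the returned table is a separate Dirichlet posterior mean, so the whole CPT can be estimated with one smoothed count table.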

