CMU CS 10708 - Parameter Learning 2 Structure Learning 1


Contents:
- Parameter Learning 2 / Structure Learning 1: The good
- Your first learning algorithm
- Learning the CPTs
- Maximum likelihood estimation (MLE) of BN parameters – general case
- Taking derivatives of MLE of BN parameters – general case
- General MLE for a CPT
- Announcements
- Can we really trust MLE?
- Bayesian learning
- Bayesian learning for thumbtack
- Beta prior distribution – P(θ)
- Posterior distribution
- Conjugate prior
- Using the Bayesian posterior
- Bayesian prediction of a new coin flip
- Asymptotic behavior and equivalent sample size
- Bayesian learning corresponds to smoothing
- Bayesian learning for multinomial
- Bayesian learning for a two-node BN
- Very important assumption on prior: global parameter independence
- Global parameter independence, d-separation and local prediction
- Within a CPT
- Priors for BN CPTs (more when we talk about structure learning)
- An example
- What you need to know about parameter learning
- Where are we with learning BNs?
- Learning the structure of a BN
- Remember: obtaining a P-map?
- Independence tests
- Independence tests and the constraint-based approach
- Score-based approach
- Information-theoretic interpretation of maximum likelihood (parts 1 and 2)
- Decomposable score
- How many trees are there?
- Scoring a tree 1: I-equivalent trees
- Scoring a tree 2: similar trees
- Chow-Liu tree learning algorithm (parts 1 and 2)
- Can we extend Chow-Liu? (parts 1 and 2)
- What you need to know about learning BN structures so far

Slide 1: Parameter Learning 2 / Structure Learning 1: The good
- Readings: K&F 14.1, 14.2, 14.3, 14.4, 15.1, 15.2, 15.3.1, 15.4.1
- Graphical Models – 10-708, Carlos Guestrin, Carnegie Mellon University
- September 27th, 2006
- 10-708 – ©Carlos Guestrin 2006

Slide 2: Your first learning algorithm
- Set the derivative of the log likelihood to zero and solve for the parameter.

Slide 3: Learning the CPTs
- Data: x(1), …, x(m)
- For each discrete variable Xi, estimate its CPT from the data.

Slide 4: Maximum likelihood estimation (MLE) of BN parameters – general case
- Data: x(1), …, x(m)
- Notation: x(j)[PaXi] denotes the assignment to PaXi in sample x(j)
- Given the structure, the log likelihood of the data:
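The equations on these slides were images and did not survive text extraction. The standard forms consistent with the slide headings (the thumbtack/binomial MLE obtained by setting the derivative to zero, and the factorized BN log likelihood) are presumably:

```latex
% Thumbtack (binomial) example: set the derivative of the log likelihood to zero
\ell(\theta) = m_H \log\theta + m_T \log(1-\theta),
\qquad
\frac{d\ell}{d\theta} = \frac{m_H}{\theta} - \frac{m_T}{1-\theta} = 0
\;\Rightarrow\;
\hat{\theta}_{\mathrm{MLE}} = \frac{m_H}{m_H + m_T}

% Given the BN structure G, the log likelihood decomposes over families:
\log P(D \mid \theta, \mathcal{G})
  = \sum_{j=1}^{m} \sum_{i} \log P\!\left(x_i^{(j)} \,\middle|\, x^{(j)}[\mathrm{Pa}_{X_i}]\right)
```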
Slide 5: Taking derivatives of MLE of BN parameters – general case

Slide 6: General MLE for a CPT
- Take a CPT: P(X | U).
- Write out the log likelihood term for this CPT.
- Parameter: θX=x|U=u.

Slide 7: Announcements
- Late homeworks: 3 late days for the semester; one late day corresponds to 24 hours (i.e., using all 3 late days makes a homework due Saturday by noon).
- Give late homeworks to Monica Hopes, Wean Hall 4619. If she is not in her office, time-stamp (date and time) your homework, sign it, and put it under her door.
- After late days are used up: half credit within 48 hours, zero credit after 48 hours. All homeworks must be handed in, even for zero credit.
- Homework 2 out later today.
- Recitation tomorrow: review of perfect maps and parameter learning.

Slide 8: Can we really trust MLE?
- Which estimate is better: 3 heads and 2 tails, 30 heads and 20 tails, or 3×10^23 heads and 2×10^23 tails? (All give the same MLE.)
- Many possible answers; we need distributions over possible parameters.

Slide 9: Bayesian learning
- Use Bayes rule: P(θ | D) = P(D | θ) P(θ) / P(D), or equivalently P(θ | D) ∝ P(D | θ) P(θ).

Slide 10: Bayesian learning for thumbtack
- The likelihood function is simply binomial.
- What about the prior? It should represent expert knowledge and yield a simple posterior form.
- Conjugate priors give a closed-form representation of the posterior (more details soon). For the binomial, the conjugate prior is the Beta distribution.

Slide 11: Beta prior distribution – P(θ)
- Likelihood function and resulting posterior.

Slide 12: Posterior distribution
- Prior: Beta(αH, αT). Data: mH heads and mT tails. Posterior: Beta(αH + mH, αT + mT).

Slide 13: Conjugate prior
- Given a likelihood function P(D | θ), a (parametric) prior of the form P(θ | α) is conjugate to it if the posterior is in the same parametric family, i.e., can be written as P(θ | α′) for some new set of parameters α′.
- Example: Beta prior, data of mH heads and mT tails (binomial likelihood), posterior is again Beta with updated counts.
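The conjugate Beta update described above can be illustrated numerically. This is a minimal sketch; the function names and the prior/count values are made-up examples, not from the lecture.

```python
def beta_posterior(alpha_h, alpha_t, m_h, m_t):
    """Conjugacy: a Beta(aH, aT) prior combined with binomial data
    (mH heads, mT tails) yields a Beta(aH + mH, aT + mT) posterior --
    the same parametric family with updated parameters."""
    return alpha_h + m_h, alpha_t + m_t


def posterior_mean(alpha_h, alpha_t):
    """Mean of Beta(aH, aT); this is also the Bayesian predictive
    probability that the next flip comes up heads."""
    return alpha_h / (alpha_h + alpha_t)


# Beta(2, 2) prior, then observe 3 heads and 2 tails (example numbers)
a_h, a_t = beta_posterior(2, 2, 3, 2)
print(a_h, a_t)                  # posterior is Beta(5, 4)
print(posterior_mean(a_h, a_t))  # 5/9, pulled toward 1/2 vs. the MLE 3/5
```

Note how the posterior mean sits between the prior mean (1/2) and the MLE (3/5), previewing the smoothing interpretation on the later slides.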
Slide 14: Using the Bayesian posterior
- Bayesian inference uses the full posterior distribution; there is no longer a single parameter.
- The required integral is often hard to compute.

Slide 15: Bayesian prediction of a new coin flip
- Given the prior and observed mH heads and mT tails, what is the probability that flip m+1 is heads?

Slide 16: Asymptotic behavior and equivalent sample size
- A Beta prior is equivalent to extra thumbtack flips.
- As m → ∞, the prior is "forgotten"; but for small sample sizes, the prior is important.
- Equivalent sample size: the prior can be parameterized by αH and αT, or by m′ (the equivalent sample size) and α. One can fix m′ and change α, or fix α and change m′.

Slide 17: Bayesian learning corresponds to smoothing
- m = 0 ⇒ the prediction is the prior parameter; m → ∞ ⇒ the prediction converges to the MLE.

Slide 18: Bayesian learning for multinomial
- What if you have a k-sided coin? The likelihood function is multinomial.
- The conjugate prior for the multinomial is the Dirichlet.
- Observe m data points, with mi from assignment i; the posterior and the prediction follow in closed form.

Slide 19: Bayesian learning for a two-node BN
- Parameters θX and θY|X, with priors P(θX) and P(θY|X).

Slide 20: Very important assumption on prior: global parameter independence
- Global parameter independence: the prior over parameters is a product of priors over the CPTs.

Slide 21: Global parameter independence, d-separation and local prediction
- Example BN: Flu, Allergy, Sinus, Headache, Nose.
- Independencies in the meta BN.
- Proposition: for fully observable data D, if the prior satisfies global parameter independence, then the posterior over parameters also decomposes into a product over CPTs.

Slide 22: Within a CPT
- Meta BN including the CPT parameters: are θY|X=t and θY|X=f d-separated given D? Are they independent given D?
- Context-specific independence! The posterior decomposes.

Slide 23: Priors for BN CPTs (more when we talk about structure learning)
- Consider each CPT P(X | U = u); the conjugate prior is Dirichlet(αX=1|U=u, …, αX=k|U=u).
- More intuitive view: a "prior data set" D′ with equivalent sample size m′ supplies "prior counts"; prediction combines these with the observed counts.
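The Dirichlet prediction rule and the equivalent-sample-size behavior discussed above can be sketched in a few lines. The function name and all counts below are illustrative assumptions, not values from the lecture.

```python
def dirichlet_predict(alphas, counts):
    """Bayesian prediction for a k-sided coin under a Dirichlet prior:
    P(X = i | D) = (alpha_i + m_i) / sum_j (alpha_j + m_j).
    The equivalent sample size m' = sum(alphas) controls how quickly
    the prior is 'forgotten' as real data accumulates."""
    total = sum(alphas) + sum(counts)
    return [(a + m) / total for a, m in zip(alphas, counts)]


# Uniform prior with equivalent sample size 3, small data (example numbers):
# every outcome keeps nonzero probability, i.e., the prior smooths the MLE.
print(dirichlet_predict([1, 1, 1], [3, 2, 0]))

# With much more data the prediction approaches the MLE m_i / m.
print(dirichlet_predict([1, 1, 1], [3000, 2000, 0]))
```

This makes the "Bayesian learning corresponds to smoothing" slide concrete: at m = 0 the prediction is the normalized prior parameters, and as m grows it converges to the empirical frequencies.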
Slide 24: An example

Slide 25: What you need to know about parameter learning
- MLE: the score decomposes according to …
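The takeaway above (the preview cuts off mid-sentence) concerns decomposition of the likelihood over families (Xi, PaXi), which means each CPT can be fit independently from its own counts. A minimal sketch of that counting-based MLE follows; the function name and the sample data are hypothetical.

```python
from collections import Counter

def mle_cpt(samples, child, parents):
    """MLE for one CPT P(child | parents) from fully observed samples:
    theta_{x|u} = M[x, u] / M[u], where M[.] are empirical counts.
    Because the log likelihood decomposes over families, each CPT is
    estimated independently from its own counts."""
    joint = Counter()  # M[x, u]: child value together with parent assignment
    marg = Counter()   # M[u]: parent assignment alone
    for s in samples:
        u = tuple(s[p] for p in parents)
        joint[(s[child], u)] += 1
        marg[u] += 1
    return {key: n / marg[key[1]] for key, n in joint.items()}


# Made-up fully observed samples over a two-node BN X -> Y
data = [{"X": 1, "Y": 1}, {"X": 1, "Y": 0},
        {"X": 1, "Y": 1}, {"X": 0, "Y": 0}]
cpt = mle_cpt(data, child="Y", parents=["X"])
print(cpt)  # e.g. theta_{Y=1 | X=1} = 2/3
```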

