Contents
- EM for BNs
- Thus far, fully supervised learning
- The general learning problem with missing data
- E-step
- Jensen's inequality
- Applying Jensen's inequality
- The M-step maximizes lower bound on weighted data
- The M-step
- Convergence of EM
- Data likelihood for BNs
- Marginal likelihood
- Log likelihood for BNs with hidden data
- E-step for BNs
- The M-step for BNs
- M-step for each CPT
- Computing expected counts
- Data need not be hidden in the same way
- Poster printing
- EM for BNs & identifiability: a superficial discussion
- Learning structure with missing data [K&F 18.4]
- What you need to know about learning BNs with missing data
- MNs & CRFs with missing data
- Kalman Filters, Gaussian BNs
- Adventures of our BN hero
- The Kalman Filter
- Example of KF – SLAT (Simultaneous Localization and Tracking)
- Example of KF – SLAT (Simultaneous Localization and Tracking)
- Multivariate Gaussian
- Conditioning a Gaussian
- Gaussian is a "Linear Model"
- Slide 31
- Conditional Linear Gaussian (CLG) – general case
- Understanding a linear Gaussian – the 2d case
- Tracking with a Gaussian 1
- Tracking with Gaussians 2 – Making observations

EM for BNs
Graphical Models – 10708
Carlos Guestrin
Carnegie Mellon University
November 24th, 2008
Readings: 18.1, 18.2, 18.3

Thus far, fully supervised learning
- We have assumed fully supervised learning.
- Many real problems have missing data.

The general learning problem with missing data
- Marginal likelihood – x is observed, z is missing:

E-step
- x is observed, z is missing.
- Compute the probability of the missing data given the current parameter choice: Q^(t+1)(z | x^(j)) for each x^(j).
  - e.g., the probability computed during the classification step
  - corresponds to the "classification step" in K-means

Jensen's inequality
- Theorem: log Σ_z P(z) f(z) ≥ Σ_z P(z) log f(z)

Applying Jensen's inequality
- Use: log Σ_z P(z) f(z) ≥ Σ_z P(z) log f(z)

The M-step maximizes a lower bound on weighted data
- Lower bound from Jensen's:
- Corresponds to a weighted dataset:
  - <x^(1), z=1> with weight Q^(t+1)(z=1 | x^(1))
  - <x^(1), z=2> with weight Q^(t+1)(z=2 | x^(1))
  - <x^(1), z=3> with weight Q^(t+1)(z=3 | x^(1))
  - <x^(2), z=1> with weight Q^(t+1)(z=1 | x^(2))
  - <x^(2), z=2> with weight Q^(t+1)(z=2 | x^(2))
  - <x^(2), z=3> with weight Q^(t+1)(z=3 | x^(2))
  - …

The M-step
- Maximization step:
- Use expected counts instead of counts:
  - if learning requires Count(x, z),
  - use E_{Q^(t+1)}[Count(x, z)]

Convergence of EM
- Define a potential function F(θ, Q):
- EM corresponds to coordinate ascent on F.
- Thus, it maximizes a lower bound on the marginal log likelihood.
- As seen in the Machine Learning class last semester.

Data likelihood for BNs
(Running example on the following slides: a BN over Flu, Allergy, Sinus, Headache, Nose.)
- Given the structure, the log likelihood of fully observed data:

Marginal likelihood
- What if S (Sinus) is hidden?

Log likelihood for BNs with hidden data
- Marginal likelihood – O is observed, H is hidden:

E-step for BNs
- The E-step computes the probability of the hidden vars h given o.
- This corresponds to inference in the BN.

The M-step for BNs
- Maximization step:
- Use expected counts instead of counts:
  - if learning requires Count(h, o),
  - use E_{Q^(t+1)}[Count(h, o)]
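As a concrete illustration of the E-step/M-step loop on the preceding slides, here is a minimal sketch, assuming a two-node network Z → X with Z hidden and X observed. It is not code from the lecture; the names pi, theta, and Q are placeholders. The E-step runs inference to get Q^(t+1)(z | x^(j)), and the M-step plugs the expected counts E_Q[Count(z)] and E_Q[Count(z, x)] into the usual MLE formulas.

```python
import numpy as np

def em_two_node_bn(x, n_z, n_x, n_iters=50, seed=0):
    """EM for a two-node BN Z -> X with Z hidden and X observed.

    P(Z=z) = pi[z],  P(X=x | Z=z) = theta[z, x].
    x: array of observed X values (integers in [0, n_x)).
    """
    rng = np.random.default_rng(seed)
    m = len(x)

    # Random initialization of the CPTs (each distribution sums to 1).
    pi = rng.dirichlet(np.ones(n_z))
    theta = rng.dirichlet(np.ones(n_x), size=n_z)

    for _ in range(n_iters):
        # E-step: Q(z | x^(j)) ∝ P(z) P(x^(j) | z) -- inference in the BN.
        joint = pi[None, :] * theta[:, x].T           # shape (m, n_z)
        Q = joint / joint.sum(axis=1, keepdims=True)

        # M-step: plug expected counts into the usual MLE formulas.
        ex_count_z = Q.sum(axis=0)                    # E_Q[Count(z)]
        ex_count_zx = np.zeros((n_z, n_x))
        for j in range(m):                            # E_Q[Count(z, x)]
            ex_count_zx[:, x[j]] += Q[j]
        pi = ex_count_z / m
        theta = ex_count_zx / ex_count_z[:, None]

    return pi, theta

# Toy usage: X is observed, Z is never observed.
x = np.array([0, 0, 1, 2, 2, 2, 1, 0])
pi, theta = em_two_node_bn(x, n_z=2, n_x=3)
```

With more nodes, the E-step becomes general BN inference (e.g., variable elimination) instead of a one-line normalization, but the structure of the loop is unchanged.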
MLE:M-step uses expected counts:FluAllergySinusHeadacheNose10-708 – Carlos Guestrin 200616Computing expected countsM-step requires expected counts:Observe O=oFor a set of vars A, must compute ExCount(A=a)Some of A in example j will be observeddenote by AO = aO(j)Some of A will be hiddendenote by AHUse inference (E-step computes expected counts):ExCount(t+1)(AO = aO, AH = aH)FluAllergySinusHeadacheNose10-708 –  Carlos Guestrin 200617Data need not be hidden in the same wayWhen data is fully observedA data point is When data is partially observedA data point is But unobserved variables can be different for different data pointse.g.,Same framework, just change definition of expected countsObserved vars in point j, Consider set of vars AExCount(t+1)(A = a) FluAllergySinusHeadacheNosePoster printingPoster session:Friday Dec 1st, 3-6pm in the NSH Atrium.There will be a popular vote for best poster. Invite your friends!please be ready to set up your poster at 2:45pm sharp.We will provide posterboards, easels and pins. The posterboards are 30x40 inchesWe don't have a specific poster format for you to use. You can either bring a big poster or a print a set of regular sized pages and pin them together.Unfortunately, we don't have a budget to pay for printing. If you are an SCS student, SCS has a poster printer you can use which prints on a 36" wide roll of paper. If you are a student outside SCS, you will need to check with your department to see if there are printing facilities for big posters (I don't know what is offered outside SCS), or print a set of regular sized pages.We are looking forward to a great poster session!10-708 – Carlos Guestrin 2006-200818EM for BNs & identifiability: a superficial discussionWhat happens if a leaf is never observed?10-708 – Carlos Guestrin 2006-20081910-708 – Carlos Guestrin 200620Learning structure with missing data[K&F 18.4]Known BN structure: Use expected counts, learning algorithm doesn’t changeUnknown BN structure: Can use expected counts and score model as when we talked about structure learningBut, very slow...e.g., greedy algorithm would need to redo inference for every edge we test…(Much Faster) Structure-EM: Iterate:compute expected countsdo a some structure search (e.g., many greedy steps)repeatTheorem: Converges to local optima of marginal log-likelihood details in the bookFluAllergySinusHeadacheNose10-708 – Carlos Guestrin 200621What you need to know about learning BNs with missing dataEM for Bayes NetsE-step: inference computes expected countsOnly need expected counts over Xi and PaxiM-step:

