CMU CS 10708 - em4bns-gaussians

Slide 1: EM for BNs, Kalman Filters, Gaussian BNs
Graphical Models – 10-708
Carlos Guestrin, Carnegie Mellon University
November 17th, 2006
Readings: K&F 16.2; Jordan Chapter 20; K&F 4.5, 12.2, 12.3

Slide 2: Thus far, fully supervised learning
- We have assumed fully supervised learning.
- Many real problems have missing data.

Slide 3: The general learning problem with missing data
- Marginal likelihood: x is observed, z is missing.

Slide 4: E-step
- x is observed, z is missing.
- Compute the probability of the missing data given the current choice of θ: Q(z|x_j) for each x_j.
- e.g., this probability computation corresponds to the "classification step" in k-means.

Slide 5: Jensen's inequality
- Theorem: log Σ_z P(z) f(z) ≥ Σ_z P(z) log f(z).

Slide 6: Applying Jensen's inequality
- Use: log Σ_z P(z) f(z) ≥ Σ_z P(z) log f(z).

Slide 7: The M-step maximizes a lower bound on weighted data
- Lower bound from Jensen's inequality.
- Corresponds to a weighted dataset:
  <x1, z=1> with weight Q^(t+1)(z=1|x1)
  <x1, z=2> with weight Q^(t+1)(z=2|x1)
  <x1, z=3> with weight Q^(t+1)(z=3|x1)
  <x2, z=1> with weight Q^(t+1)(z=1|x2)
  <x2, z=2> with weight Q^(t+1)(z=2|x2)
  <x2, z=3> with weight Q^(t+1)(z=3|x2)
  ...

Slide 8: The M-step
- Maximization step: use expected counts instead of counts.
- If learning requires Count(x,z), use E_{Q^(t+1)}[Count(x,z)].

Slide 9: Convergence of EM
- Define a potential function F(θ, Q).
- EM corresponds to coordinate ascent on F.
- Thus, EM maximizes a lower bound on the marginal log-likelihood.
- As seen in the Machine Learning class last semester.

Slide 10: Announcements
- Lectures for the rest of the semester:
  - Special time: Monday Nov 20, 5:30-7pm, Wean 4615A: Gaussian GMs & Kalman Filters
  - Special time: Monday Nov 27, 5:30-7pm, Wean 4615A: Dynamic BNs
  - Wed. 11/30, regular class time: Causality (Richard Scheines)
  - Friday 12/1, regular class time: Finish Dynamic BNs & Overview of Advanced Topics
- Deadlines & presentations:
  - Project poster presentations: Dec. 1st, 3-6pm (NSH Atrium)
  - Project write-up: Dec. 8th by 2pm, by email; 8-page limit will be strictly enforced
  - Final: out Dec. 1st, due Dec. 15th by 2pm (strict deadline)
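The equations on slides 5-8 do not survive the text extraction. As a reconstruction of the standard argument (not copied from the slides), applying Jensen's inequality with any distribution Q(z|x_j) lower-bounds the marginal log-likelihood of an observed point x_j, and the E-step choice Q^(t+1)(z|x_j) = P(z|x_j, θ^(t)) makes the bound tight at θ^(t):

```latex
\log P(x_j \mid \theta)
  = \log \sum_{z} Q(z \mid x_j)\, \frac{P(x_j, z \mid \theta)}{Q(z \mid x_j)}
  \;\ge\; \sum_{z} Q(z \mid x_j)\, \log \frac{P(x_j, z \mid \theta)}{Q(z \mid x_j)}
```

The M-step maximizes the right-hand side over θ; since the entropy term -Σ_z Q log Q does not depend on θ, this is exactly maximum likelihood on the weighted dataset of slide 7, i.e., parameter estimation with expected counts.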
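To make the E-step/M-step loop of slides 4-8 concrete, here is a minimal sketch (not code from the lecture) of EM for a simple discrete latent-variable model, a mixture of independent Bernoulli features; the model, variable names, and smoothing constants are illustrative assumptions:

```python
import numpy as np

def em_bernoulli_mixture(X, K, n_iters=50, seed=0):
    """EM for a mixture of independent Bernoulli features.

    X: (n, d) binary data matrix; z in {0..K-1} is the hidden mixture label.
    Returns (pi, mu): mixing weights P(z) and feature probabilities P(x_i=1 | z).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                    # P(z)
    mu = rng.uniform(0.25, 0.75, size=(K, d))   # P(x_i = 1 | z)

    for _ in range(n_iters):
        # E-step: Q(z | x_j) proportional to P(z) * prod_i P(x_ji | z), in log space.
        log_lik = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T   # (n, K)
        log_post = np.log(pi) + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)
        Q = np.exp(log_post)
        Q /= Q.sum(axis=1, keepdims=True)       # responsibilities Q(z | x_j)

        # M-step: maximum likelihood with expected counts E_Q[Count(x, z)].
        Nk = Q.sum(axis=0)                      # expected count of each value of z
        pi = (Nk + 1e-6) / (n + K * 1e-6)
        mu = (Q.T @ X + 1e-6) / (Nk[:, None] + 2e-6)   # smoothed weighted MLE
    return pi, mu

# Usage on toy data: two clusters of binary feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.binomial(1, 0.9, (50, 5)), rng.binomial(1, 0.1, (50, 5))])
pi, mu = em_bernoulli_mixture(X, K=2)
print(pi.round(2), mu.round(2))
```

Each iteration first "classifies" every point softly (the E-step, slide 4), then re-fits the parameters to the resulting weighted dataset (the M-step, slides 7-8).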
Slide 11: Data likelihood for BNs
- Given the structure, the log-likelihood of fully observed data.
- (Figure: the running example BN over Flu, Allergy, Sinus, Headache, and Nose, used through slide 20.)

Slide 12: Marginal likelihood
- What if S (Sinus) is hidden?

Slide 13: Log-likelihood for BNs with hidden data
- Marginal likelihood: O is observed, H is hidden.

Slide 14: E-step for BNs
- The E-step computes the probability of the hidden variables h given the observations o.
- This corresponds to inference in the BN.

Slide 15: The M-step for BNs
- Maximization step: use expected counts instead of counts.
- If learning requires Count(h,o), use E_{Q^(t+1)}[Count(h,o)].

Slide 16: M-step for each CPT
- The M-step decomposes per CPT.
- Standard MLE, but the M-step uses expected counts.

Slide 17: Computing expected counts
- The M-step requires expected counts: for a set of variables A, we must compute ExCount(A = a).
- Some of A in example j will be observed; denote these by A_O = a_O(j).
- Some of A will be hidden; denote these by A_H.
- Use inference (the E-step computes the expected counts):
  ExCount^(t+1)(A_O = a_O(j), A_H = a_H) ← P(A_H = a_H | A_O = a_O(j), θ^(t))

Slide 18: Data need not be hidden in the same way
- When data is fully observed, a data point is ...
- When data is partially observed, a data point is ...
- But the unobserved variables can be different for different data points.
- Same framework; just change the definition of expected counts:
  ExCount^(t+1)(A_O = a_O(j), A_H = a_H) ← P(A_H = a_H | A_O = a_O(j), θ^(t))

Slide 19: Learning structure with missing data [K&F 16.6]
- Known BN structure: use expected counts; the learning algorithm doesn't change.
- Unknown BN structure:
  - Can use expected counts and score models as in structure learning, but this is very slow, e.g., a greedy algorithm would need to redo inference for every edge we test.
  - (Much faster) Structure-EM. Iterate: compute expected counts; do some structure search (e.g., many greedy steps); repeat.
  - Theorem: converges to a local optimum of the marginal log-likelihood (details in the book).

Slide 20: What you need to know about learning with missing data
- EM for Bayes nets:
  - E-step: inference computes expected counts; only expected counts over X_i and Pa_{X_i} are needed.
  - M-step: expected counts are used to estimate the parameters.
- Which variables are hidden can change per data point.
- Also, use labeled and unlabeled data → some data points are complete, some include hidden variables.
- Structure-EM: iterate between computing expected counts and many structure-search steps.
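To make slides 14-17 concrete, here is a minimal sketch of the EM loop for the running example (not code from the lecture). It assumes the usual structure Flu → Sinus ← Allergy, Sinus → Headache, Sinus → Nose, with all variables binary and only Sinus hidden; the brute-force per-record inference and the smoothing constant are also illustrative assumptions:

```python
import numpy as np

# Parameters: p_s[f, a] = P(S=1 | F=f, A=a), p_h[s] = P(H=1 | S=s), p_n[s] = P(N=1 | S=s).
# The CPTs for F and A are fit from ordinary counts, since F and A are always observed.

def e_step(data, p_s, p_h, p_n):
    """E-step: inference per record, q_j = P(S=1 | f, a, h, n, theta^(t))."""
    q = np.empty(len(data))
    for j, (f, a, h, n) in enumerate(data):
        like = np.empty(2)
        for s in (0, 1):
            ps = p_s[f, a] if s == 1 else 1.0 - p_s[f, a]
            ph = p_h[s] if h == 1 else 1.0 - p_h[s]
            pn = p_n[s] if n == 1 else 1.0 - p_n[s]
            like[s] = ps * ph * pn          # P(F), P(A) cancel in the posterior over S
        q[j] = like[1] / like.sum()
    return q

def m_step(data, q, eps=1e-3):
    """M-step: per-CPT MLE using expected counts ExCount(...) from the E-step."""
    f, a, h, n = np.asarray(data).T
    p_f, p_a = f.mean(), a.mean()            # fully observed nodes: ordinary counts
    p_s = np.empty((2, 2))
    for fv in (0, 1):
        for av in (0, 1):
            mask = (f == fv) & (a == av)
            # ExCount(S=1, F=fv, A=av) / Count(F=fv, A=av), lightly smoothed
            p_s[fv, av] = (q[mask].sum() + eps) / (mask.sum() + 2 * eps)
    # Children of S: expected counts weight each record by q_j (S=1) or 1-q_j (S=0).
    p_h = np.array([(((1 - q) * h).sum() + eps) / ((1 - q).sum() + 2 * eps),
                    ((q * h).sum() + eps) / (q.sum() + 2 * eps)])
    p_n = np.array([(((1 - q) * n).sum() + eps) / ((1 - q).sum() + 2 * eps),
                    ((q * n).sum() + eps) / (q.sum() + 2 * eps)])
    return p_f, p_a, p_s, p_h, p_n

# Toy usage: records are (F, A, H, N); S is never observed.
data = [(1, 0, 1, 1), (0, 0, 0, 0), (0, 1, 1, 1), (0, 0, 0, 1), (1, 1, 1, 1)]
p_s = np.full((2, 2), 0.5)
p_h, p_n = np.array([0.4, 0.6]), np.array([0.4, 0.6])
for _ in range(20):
    q = e_step(data, p_s, p_h, p_n)
    p_f, p_a, p_s, p_h, p_n = m_step(data, q)
print("P(S=1 | F, A) =\n", p_s.round(2))
```

This is the per-CPT decomposition of slide 16: each CPT is re-estimated independently from the expected counts over that node and its parents.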
Slide 21: Adventures of our BN hero
- Compact representation for probability distributions
- Fast inference
- Fast learning
- Approximate inference
- But... who are the most popular kids?
  1. Naïve Bayes
  2 & 3. Hidden Markov models (HMMs) and Kalman filters

Slide 22: The Kalman Filter
- An HMM with Gaussian distributions.
- Has been around for at least 50 years.
- Possibly the most used graphical model ever: it's what runs your cruise control, tracks missiles, controls robots, ...
- And it's so simple... possibly explaining why it's so widely used.
- Many interesting models build on it.
- An example of a Gaussian BN (more on this later).

Slide 23: Example of KF – SLAT: Simultaneous Localization and Tracking [Funiak, Guestrin, Paskin, Sukthankar '06]
- Place some cameras around an environment; we don't know where they are.
- We could measure all the camera locations, but that requires lots of grad-student (Stano) time.
- Intuition: a person walks around. If camera 1 sees the person and then camera 2 sees the person, we learn about the relative positions of the cameras.

Slide 24: Example of KF – SLAT: Simultaneous Localization and Tracking [Funiak, Guestrin, Paskin, Sukthankar '06] (continued)

Slide 25: Multivariate Gaussian
- Mean ...
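As a minimal sketch of the model named on slide 22 (not code from the lecture), here is the standard Kalman filter predict/update recursion for a linear-Gaussian state-space model x_t = A x_{t-1} + w, z_t = H x_t + v; the constant-velocity tracking setup in the usage example is an illustrative assumption, not the SLAT model:

```python
import numpy as np

def kalman_step(mu, Sigma, z, A, Q, H, R):
    """One predict/update cycle of the Kalman filter.

    State model:  x_t = A x_{t-1} + w,  w ~ N(0, Q)
    Observation:  z_t = H x_t     + v,  v ~ N(0, R)
    (mu, Sigma) is the Gaussian belief over x_{t-1}; returns the belief over x_t.
    """
    # Predict: push the Gaussian belief through the linear dynamics.
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q
    # Update: condition the predicted Gaussian on the new observation z.
    S = H @ Sigma_pred @ H.T + R                 # innovation covariance
    K = Sigma_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    mu_new = mu_pred + K @ (z - H @ mu_pred)
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma_pred
    return mu_new, Sigma_new

# Usage: 1-D constant-velocity tracking, state = [position, velocity],
# observing a noisy position measurement at each step.
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])   # dynamics
Q = 0.01 * np.eye(2)                    # process noise
H = np.array([[1.0, 0.0]])              # observe position only
R = np.array([[0.5]])                   # measurement noise
mu, Sigma = np.zeros(2), np.eye(2)
for z in [1.1, 1.9, 3.2, 4.0, 5.1]:
    mu, Sigma = kalman_step(mu, Sigma, np.array([z]), A, Q, H, R)
print("estimated position and velocity:", mu.round(2))
```

Because every factor is linear-Gaussian, the belief over the hidden state stays an exact Gaussian, which is why the filter is both simple and fast.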
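The preview cuts off at slide 25 ("Multivariate Gaussian"). For reference, the textbook density that slide presumably goes on to introduce, with mean vector μ and covariance matrix Σ, is (not recovered from the slide itself):

```latex
p(x) \;=\; \mathcal{N}(x \mid \mu, \Sigma)
     \;=\; \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
           \exp\!\Big( -\tfrac{1}{2}\,(x - \mu)^{\top} \Sigma^{-1} (x - \mu) \Big)
```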

