DOC PREVIEW
CMU CS 10701 - Expectation Maximization

This preview shows page 1-2-3-24-25-26-27-48-49-50 out of 50 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Expectation Maximization Machine Learning 10701 15781 Carlos Guestrin Carnegie Mellon University April 10th 2006 1 Announcements Reminder Project milestone due Wednesday beginning of class 2 Coordinate descent algorithms Want mina minb F a b Coordinate descent fix a minimize b fix b minimize a repeat Converges if F is bounded to a often good local optimum as we saw in applet play with it K means is a coordinate descent algorithm 3 Expectation Maximalization 4 Back to Unsupervised Learning of GMMs a simple case Remember We have unlabeled data x1 x2 xm We know there are k classes We know P y1 P y2 P y3 P yk We don t know 1 2 k We can write P data 1 k p x1 xm 1 k p x j 1 k m j 1 p x j i P y i m k j 1 i 1 2 1 exp 2 x j i P y i 2 j 1 i 1 m k 5 EM for simple case of GMMs The E step If we know 1 k easily compute prob point xj belongs to class y i 2 1 p y i x j 1 k exp 2 x j i P y i 2 6 EM for simple case of GMMs The M step If we know prob point xj belongs to class y i MLE for i is weighted average imagine k copies of each xj each with weight P y i xj P y i x x m i j 1 m j j P y i x j 1 j 7 E M for GMMs E step Compute expected classes of all datapoints for each class 2 1 p y i x j 1 k exp 2 x j i P y i 2 Just evaluate a Gaussian at xj M step Compute Max like given our data s class membership distributions P y i x x m i j 1 m j j P y i x j 1 j 8 E M for General GMMs pi t is shorthand for estimate of P y i on t th iteration Iterate On the t th iteration let our estimates be t 1 t 2 t k t 1 t 2 t k t p1 t p2 t pk t E step Compute expected classes of all datapoints for each class P y i x j t pi p x j i i t t t Just evaluate a Gaussian at xj M step Compute Max like given our data s class membership distributions i t 1 P y i x x P y i x j t j i j j j t 1 t 1 j t j i P y i x j j m j i t 1 T j j t pi P y i x x x P y i x t 1 t j t m records 9 Gaussian Mixture Example Start 10 After first iteration 11 After 2nd iteration 12 After 3rd iteration 13 After 4th iteration 14 After 5th iteration 15 After 6th iteration 16 After 20th iteration 17 Some Bio Assay data 18 GMM clustering of the assay data 19 Resulting Density Estimator 20 Three classes of assay each learned with it s own mixture model 21 Resulting Bayes Classifier 22 Resulting Bayes Classifier using posterior probabilities to alert about ambiguity and anomalousness Yellow means anomalous Cyan means ambiguous 23 The general learning problem with missing data Marginal likelihood x is observed z is missing 24 E step x is observed z is missing Compute probability of missing data given current choice of Q z xj for each xj e g probability computed during classification step corresponds to classification step in K means 25 Jensen s inequality Theorem log z P z f z z P z log f z 26 Applying Jensen s inequality Use log z P z f z z P z log f z 27 The M step maximizes lower bound on weighted data Lower bound from Jensen s Corresponds to weighted dataset x1 z 1 with weight Q t 1 z 1 x1 x1 z 2 with weight Q t 1 z 2 x1 x1 z 3 with weight Q t 1 z 3 x1 x2 z 1 with weight Q t 1 z 1 x2 x2 z 2 with weight Q t 1 z 2 x2 x2 z 3 with weight Q t 1 z 3 x2 28 The M step Maximization step Use expected counts instead of counts If learning requires Count x z Use EQ t 1 Count x z 29 Convergence of EM Define potential function F Q EM corresponds to coordinate ascent on F Thus maximizes lower bound on marginal log likelihood 30 M step is easy Using potential function 31 E step also doesn t decrease potential function 1 Fixing to t 32 KL divergence Measures distance between distributions KL zero if and only if Q P 33 E step also doesn t decrease potential function 2 Fixing to t 34 E step also doesn t decrease potential function 3 Fixing to t Maximizing F t Q over Q set Q to posterior probability Note that 35 EM is coordinate ascent M step Fix Q maximize F over a lower bound on E step Fix maximize F over Q Realigns F with likelihood 36 What you should know K means for clustering algorithm converges because it s coordinate ascent EM for mixture of Gaussians How to learn maximum likelihood parameters locally max like in the case of unlabeled data Be happy with this kind of probabilistic analysis Remember E M can get stuck in local minima and empirically it DOES EM is coordinate ascent General case for EM 37 Acknowledgements K means Gaussian mixture models presentation contains material from excellent tutorial by Andrew Moore http www autonlab org tutorials K means Applet http www elet polimi it upload matteucc Clustering tu torial html AppletKM html Gaussian mixture models Applet http www neurosci aist go jp 7Eakaho MixtureEM html 38 EM for HMMs a k a The Baum Welch Algorithm Machine Learning 10701 15781 Carlos Guestrin Carnegie Mellon University April 10th 2006 39 Learning HMMs from fully observable data is easy X1 a z X2 a z X3 a z X4 a z X5 a z O1 O2 O3 O4 O5 Learn 3 distributions 40 Learning HMMs from fully observable data is easy X1 a z X2 a z X3 a z X4 a z X5 a z O1 O2 O3 O4 O5 Learn 3 distributions What if O is observed but X is hidden 41 Log likelihood for HMMs with hidden X Marginal likelihood O is observed X is missing For simplicity of notation we ll consider training data consists of only one sequence If there were m sequences 42 E step X1 a z X2 a z X3 a z X4 a z X5 a z O1 O2 O3 O4 O5 E step computes probability of hidden vars x given o Will correspond to inference use forward backward algorithm 43 The M step X1 a z X2 a z X3 a …


View Full Document

CMU CS 10701 - Expectation Maximization

Documents in this Course
lecture

lecture

12 pages

lecture

lecture

17 pages

HMMs

HMMs

40 pages

lecture

lecture

15 pages

lecture

lecture

20 pages

Notes

Notes

10 pages

Notes

Notes

15 pages

Lecture

Lecture

22 pages

Lecture

Lecture

13 pages

Lecture

Lecture

24 pages

Lecture9

Lecture9

38 pages

lecture

lecture

26 pages

lecture

lecture

13 pages

Lecture

Lecture

5 pages

lecture

lecture

18 pages

lecture

lecture

22 pages

Boosting

Boosting

11 pages

lecture

lecture

16 pages

lecture

lecture

20 pages

Lecture

Lecture

20 pages

Lecture

Lecture

39 pages

Lecture

Lecture

14 pages

Lecture

Lecture

18 pages

Lecture

Lecture

13 pages

Exam

Exam

10 pages

Lecture

Lecture

27 pages

Lecture

Lecture

15 pages

Lecture

Lecture

24 pages

Lecture

Lecture

16 pages

Lecture

Lecture

23 pages

Lecture6

Lecture6

28 pages

Notes

Notes

34 pages

lecture

lecture

15 pages

Midterm

Midterm

11 pages

lecture

lecture

11 pages

lecture

lecture

23 pages

Boosting

Boosting

35 pages

Lecture

Lecture

49 pages

Lecture

Lecture

22 pages

Lecture

Lecture

16 pages

Lecture

Lecture

18 pages

Lecture

Lecture

35 pages

lecture

lecture

22 pages

lecture

lecture

24 pages

Midterm

Midterm

17 pages

exam

exam

15 pages

Lecture12

Lecture12

32 pages

lecture

lecture

19 pages

Lecture

Lecture

32 pages

boosting

boosting

11 pages

pca-mdps

pca-mdps

56 pages

bns

bns

45 pages

mdps

mdps

42 pages

svms

svms

10 pages

Notes

Notes

12 pages

lecture

lecture

42 pages

lecture

lecture

29 pages

lecture

lecture

15 pages

Lecture

Lecture

12 pages

Lecture

Lecture

24 pages

Lecture

Lecture

22 pages

Midterm

Midterm

5 pages

mdps-rl

mdps-rl

26 pages

Load more
Download Expectation Maximization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Expectation Maximization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Expectation Maximization and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?