DOC PREVIEW
CMU CS 10701 - EM for Bayes Nets

This preview shows page 1-2-16-17-18-34-35 out of 35 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

EM for Bayes Nets Machine Learning 10701 15781 Carlos Guestrin Carnegie Mellon University April 16th 2007 2005 2007 Carlos Guestrin 1 Learning HMMs from fully observable data is easy X1 a z X2 a z X3 a z X4 a z X5 a z O1 O2 O3 O4 O5 Learn 3 distributions What if O is observed but X is hidden 2 2005 2007 Carlos Guestrin Log likelihood for HMMs when X is hidden Marginal likelihood O is observed X is missing For simplicity of notation training data consists of only one sequence If there were m sequences 3 2005 2007 Carlos Guestrin Computing Log likelihood for HMMs when X is hidden X1 a z X2 a z X3 a z X4 a z X5 a z O1 O2 O3 O4 O5 4 2005 2007 Carlos Guestrin The M step X1 a z X2 a z X3 a z X4 a z X5 a z O1 O2 O3 O4 O5 Maximization step Use expected counts instead of counts If learning requires Count x o Use EQ t 1 Count x o 5 2005 2007 Carlos Guestrin E step revisited X1 a z X2 a z X3 a z X4 a z X5 a z O1 O2 O3 O4 O5 E step computes probability of hidden vars x given o Must compute Q xt a o marginal probability of each position Just forwards backwards Q xt 1 a xt b o joint distribution between pairs of positions 6 2005 2007 Carlos Guestrin Exploiting unlabeled data in clustering A few data points are labeled x o Most points are unlabeled o In the E step of EM If i th point is unlabeled If i th point is labeled compute Q X oi as usual set Q X x oi 1 and Q X x oi 0 M step as usual 7 2005 2007 Carlos Guestrin 20 Newsgroups data advantage of adding unlabeled data 8 2005 2007 Carlos Guestrin Data likelihood for BNs Flu Allergy Sinus Given structure log likelihood of fully observed data Nose Headache 9 2005 2007 Carlos Guestrin Marginal likelihood Flu Allergy Sinus What if S is hidden Nose Headache 10 2005 2007 Carlos Guestrin Log likelihood for BNs with hidden data Flu Marginal likelihood O is observed H is hidden Sinus Nose Headache 11 2005 2007 Carlos Guestrin Allergy E step for BNs Flu Allergy Sinus Nose Headache E step computes probability of hidden vars h given o Corresponds to inference in BN 12 2005 2007 Carlos Guestrin Flu The M step for BNs Allergy Sinus Nose Headache Maximization step Use expected counts instead of counts If learning requires Count h o Use EQ t 1 Count h o 13 2005 2007 Carlos Guestrin Flu M step for each CPT Allergy Sinus Nose Headache M step decomposes per CPT Standard M step MLE uses expected counts 14 2005 2007 Carlos Guestrin Flu Computing expected counts Allergy Sinus Nose Headache M step requires expected counts For a set of vars A must compute ExCount A a Some of A in example j will be observed denote by AO aO j Some of A will be hidden denote by AH Use inference E step computes expected counts ExCount t 1 AO aO j AH aH P AH aH AO aO j t 15 2005 2007 Carlos Guestrin Data need not be hidden in the same way Nose Headache A data point is A data point is But unobserved variables can be different for different data points Sinus When data is partially observed Allergy When data is fully observed Flu e g Same framework just change definition of expected counts ExCount t 1 AO aO j AH aH P AH aH AO aO j t 16 2005 2007 Carlos Guestrin What you need to know EM for Bayes Nets E step inference computes expected counts Only need expected counts over Xi and Paxi M step expected counts used to estimate parameters Hidden variables can change per datapoint Use labeled and unlabeled data some data points are complete some include hidden variables 17 2005 2007 Carlos Guestrin Co Training for Semisupervised learning Machine Learning 10701 15781 Carlos Guestrin Carnegie Mellon University April 16th 2007 2005 2007 Carlos Guestrin 18 Redundant information 19 2005 2007 Carlos Guestrin Redundant information webpage text 20 2005 2007 Carlos Guestrin Redundant information anchor text for hyperlinks 21 2005 2007 Carlos Guestrin Exploiting redundant information in semi supervised learning Want to predict Y from features X f X aY have some labeled data L lots of unlabeled data U Co training assumption X is very expressive X X1 X2 can learn g1 X1 a Y g2 X2 a Y 22 2005 2007 Carlos Guestrin Co Training Key idea Classifier1 and Classifier2 must Correctly classify labeled data Agree on unlabeled data 23 2005 2007 Carlos Guestrin Co Training Algorithm Blum Mitchell 99 24 2005 2007 Carlos Guestrin Co Training experimental results 25 2005 2007 Carlos Guestrin Co Training theory Want to predict Y from features X Co training assumption X is very expressive f X a Y X X1 X2 want to learn g1 X1 a Y and g2 X2 a Y Assumption g1 g2 x g1 x1 f x g2 x2 f x Questions Does unlabeled data always help How many labeled examples do I need How many unlabeled examples do I need 26 2005 2007 Carlos Guestrin Understanding Co Training A simple setting Suppose X1 and X2 are discrete X1 X2 N No label noise Without unlabeled data how hard is it to learn g1 or g2 27 2005 2007 Carlos Guestrin Co Training in simple setting Iteration 0 28 2005 2007 Carlos Guestrin Co Training in simple setting Iteration 1 29 2005 2007 Carlos Guestrin Co Training in simple setting after convergence 30 2005 2007 Carlos Guestrin Co Training in simple setting Connected components Suppose infinite unlabeled data Co training must have at least one labeled example in each connected component of L U graph What s probability of making an error For k Connected components how much labeled data 31 2005 2007 Carlos Guestrin How much unlabeled data 32 2005 2007 Carlos Guestrin Co Training theory Want to predict Y from features X Co training assumption X is very expressive f X a Y X X1 X2 want to learn g1 X1 a Y and g2 X2 a Y Assumption g1 g2 x g1 x1 f x g2 x2 f x One co training result Blum Mitchell 99 If X1 X2 Y g1 g2 are PAC learnable from noisy data and thus f Then f is PAC learnable from weak initial classifier plus unlabeled data 33 2005 2007 Carlos Guestrin What you need to know about cotraining Unlabeled data can help supervised learning a lot when there are mostly independent redundant features One theoretical result If X1 X2 Y and g1 g2 are PAC learnable from noisy data and thus f Then f is PAC learnable from weak initial classifier plus unlabeled data Disagreement between g1 and g2 provides bound on error of final classifier Applied in many real world settings Semantic lexicon generation Riloff Jones 99 Collins Singer 99 Jones 05 Web page classification …


View Full Document

CMU CS 10701 - EM for Bayes Nets

Documents in this Course
lecture

lecture

12 pages

lecture

lecture

17 pages

HMMs

HMMs

40 pages

lecture

lecture

15 pages

lecture

lecture

20 pages

Notes

Notes

10 pages

Notes

Notes

15 pages

Lecture

Lecture

22 pages

Lecture

Lecture

13 pages

Lecture

Lecture

24 pages

Lecture9

Lecture9

38 pages

lecture

lecture

26 pages

lecture

lecture

13 pages

Lecture

Lecture

5 pages

lecture

lecture

18 pages

lecture

lecture

22 pages

Boosting

Boosting

11 pages

lecture

lecture

16 pages

lecture

lecture

20 pages

Lecture

Lecture

20 pages

Lecture

Lecture

39 pages

Lecture

Lecture

14 pages

Lecture

Lecture

18 pages

Lecture

Lecture

13 pages

Exam

Exam

10 pages

Lecture

Lecture

27 pages

Lecture

Lecture

15 pages

Lecture

Lecture

24 pages

Lecture

Lecture

16 pages

Lecture

Lecture

23 pages

Lecture6

Lecture6

28 pages

Notes

Notes

34 pages

lecture

lecture

15 pages

Midterm

Midterm

11 pages

lecture

lecture

11 pages

lecture

lecture

23 pages

Boosting

Boosting

35 pages

Lecture

Lecture

49 pages

Lecture

Lecture

22 pages

Lecture

Lecture

16 pages

Lecture

Lecture

18 pages

Lecture

Lecture

35 pages

lecture

lecture

22 pages

lecture

lecture

24 pages

Midterm

Midterm

17 pages

exam

exam

15 pages

Lecture12

Lecture12

32 pages

lecture

lecture

19 pages

Lecture

Lecture

32 pages

boosting

boosting

11 pages

pca-mdps

pca-mdps

56 pages

bns

bns

45 pages

mdps

mdps

42 pages

svms

svms

10 pages

Notes

Notes

12 pages

lecture

lecture

42 pages

lecture

lecture

29 pages

lecture

lecture

15 pages

Lecture

Lecture

12 pages

Lecture

Lecture

24 pages

Lecture

Lecture

22 pages

Midterm

Midterm

5 pages

mdps-rl

mdps-rl

26 pages

Load more
Download EM for Bayes Nets
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view EM for Bayes Nets and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view EM for Bayes Nets and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?