CMU CS 10701 - EM for Bayes Nets

EM for Bayes Nets
Machine Learning – 10701/15781
Carlos Guestrin, Carnegie Mellon University
April 16th, 2007
©2005-2007 Carlos Guestrin

Slide 2: Learning HMMs from fully observable data is easy
[Figure: HMM with hidden states X1, …, X5, each over {a, …, z}, and observations O1, …, O5]
- Learn 3 distributions: the initial-state distribution P(X1), the transition distribution P(Xt+1 | Xt), and the observation distribution P(Ot | Xt).
- What if O is observed, but X is hidden?

Slide 3: Log likelihood for HMMs when X is hidden
- Marginal likelihood: O is observed, X is missing.
- For simplicity of notation, the training data consists of only one sequence; the marginal log likelihood is log P(o | θ) = log Σ_x P(x, o | θ).
- If there were m sequences, sum this quantity over all m sequences.

Slide 4: Computing the log likelihood for HMMs when X is hidden
(uses the same HMM structure as above)

Slide 5: The M-step
- Maximization step: use expected counts instead of counts.
  - If learning requires Count(x, o),
  - use E_{Q(t+1)}[Count(x, o)].

Slide 6: E-step revisited
- The E-step computes the probability of the hidden vars x given o. Must compute:
  - Q(xt = a | o), the marginal probability of each position – just forward-backward!
  - Q(xt+1 = a, xt = b | o), the joint distribution over pairs of adjacent positions.

Slide 7: Exploiting unlabeled data in clustering
- A few data points are labeled <x, o>; most points are unlabeled <?, o>.
- In the E-step of EM:
  - If the i'th point is unlabeled: compute Q(X | oi) as usual.
  - If the i'th point is labeled: set Q(X = x | oi) = 1 and Q(X ≠ x | oi) = 0.
- M-step as usual.

Slide 8: 20 Newsgroups data – advantage of adding unlabeled data

Slide 9: Data likelihood for BNs
[Figure: Bayes net with nodes Flu, Allergy, Sinus, Headache, Nose]
- Given the structure, the log likelihood of fully observed data decomposes into one term per CPT: log P(D | θ) = Σ_j Σ_i log P(xi(j) | Pa_Xi(j)).

Slide 10: Marginal likelihood
- What if S (Sinus) is hidden?

Slide 11: Log likelihood for BNs with hidden data
- Marginal likelihood: O is observed, H is hidden, so log P(o | θ) = log Σ_h P(h, o | θ).

Slide 12: E-step for BNs
- The E-step computes the probability of the hidden vars h given o.
- This corresponds to inference in the BN.

Slide 13: The M-step for BNs
- Maximization step: use expected counts instead of counts.
  - If learning requires Count(h, o),
  - use E_{Q(t+1)}[Count(h, o)].

Slide 14: M-step for each CPT
- The M-step decomposes per CPT.
- Standard MLE: P(xi | pa_Xi) = Count(xi, pa_Xi) / Count(pa_Xi).
- The M-step uses expected counts instead: P(xi | pa_Xi) = ExCount(xi, pa_Xi) / ExCount(pa_Xi).

Slide 15: Computing expected counts
- The M-step requires expected counts: for a set of vars A, must compute ExCount(A = a).
- Some of A in example j will be observed: denote these by AO = aO(j).
- Some of A will be hidden: denote these by AH.
- Use inference (the E-step computes the expected counts):
  ExCount(t+1)(AO = aO(j), AH = aH) ← P(AH = aH, AO = aO(j) | θ(t))

Slide 16: Data need not be hidden in the same way
- When data is fully observed, a data point assigns a value to every variable.
- When data is partially observed, a data point assigns values only to the observed variables.
- But the unobserved variables can differ from data point to data point.
- Same framework, just change the definition of expected counts:
  ExCount(t+1)(AO = aO(j), AH = aH) ← P(AH = aH, AO = aO(j) | θ(t))

Slide 17: What you need to know – EM for Bayes Nets
- E-step: inference computes expected counts.
  - Only need expected counts over Xi and Pa_Xi.
- M-step: expected counts are used to estimate the parameters.
- Hidden variables can change per data point.
- Use labeled and unlabeled data → some data points are complete, some include hidden variables.
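To make the E-step/M-step recipe above concrete, here is a minimal EM sketch for the Flu/Allergy/Sinus/Headache/Nose network from the slides, assuming binary variables, the structure Flu → Sinus ← Allergy with Sinus → Headache and Sinus → Nose, and that Sinus is the only hidden variable in every data point. The function names, the theta dictionary layout, and the toy data are illustrative, not from the lecture.

```python
# Sketch of EM for the Flu (F) / Allergy (A) / Sinus (S) / Headache (H) / Nose (N)
# network, with S hidden. Variable names, the theta layout, and the data format
# are illustrative assumptions, not taken from the slides.

def joint(theta, f, a, s, h, n):
    """P(F=f, A=a, S=s, H=h, N=n) under the current parameters theta."""
    def bern(p, x):  # P(X=x) for a binary variable with P(X=1) = p
        return p if x == 1 else 1.0 - p
    return (bern(theta["F"], f) *
            bern(theta["A"], a) *
            bern(theta["S|FA"][(f, a)], s) *
            bern(theta["H|S"][s], h) *
            bern(theta["N|S"][s], n))

def em(data, theta, iters=50):
    """data: list of (f, a, h, n) tuples; S is never observed."""
    for _ in range(iters):
        # E-step: Q(S=1 | o) for every data point. Since S is the only hidden
        # variable, exact inference is just a two-term normalization.
        q = []
        for (f, a, h, n) in data:
            p1 = joint(theta, f, a, 1, h, n)
            p0 = joint(theta, f, a, 0, h, n)
            q.append(p1 / (p1 + p0))

        # M-step: re-estimate every CPT from expected counts ExCount(Xi, Pa_Xi).
        m = float(len(data))
        theta["F"] = sum(f for (f, _, _, _) in data) / m      # F fully observed
        theta["A"] = sum(a for (_, a, _, _) in data) / m      # A fully observed
        for pa in [(0, 0), (0, 1), (1, 0), (1, 1)]:           # CPT P(S=1 | F, A)
            num = sum(qi for qi, (f, a, _, _) in zip(q, data) if (f, a) == pa)
            den = sum(1.0 for (f, a, _, _) in data if (f, a) == pa)
            if den > 0:
                theta["S|FA"][pa] = num / den
        for s in (0, 1):                                      # CPTs P(H|S), P(N|S)
            w = [qi if s == 1 else 1.0 - qi for qi in q]      # Q(S=s | o_j)
            den = sum(w)
            if den > 0:
                theta["H|S"][s] = sum(wi * h for wi, (_, _, h, _) in zip(w, data)) / den
                theta["N|S"][s] = sum(wi * n for wi, (_, _, _, n) in zip(w, data)) / den
    return theta

# Toy usage with made-up data and a rough initial guess for the parameters.
theta0 = {"F": 0.5, "A": 0.5,
          "S|FA": {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.6, (1, 1): 0.9},
          "H|S": {0: 0.3, 1: 0.8}, "N|S": {0: 0.2, 1: 0.7}}
data = [(1, 0, 1, 1), (0, 0, 0, 0), (0, 1, 1, 1), (1, 1, 1, 1), (0, 0, 0, 1)]
print(em(data, theta0))
```

Because Flu and Allergy are always observed, their parameters reduce to ordinary MLE counts, while the CPTs that touch Sinus are estimated from expected counts weighted by Q(S | o), which is exactly the per-CPT decomposition described on the "M-step for each CPT" slide.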
Slide 18: Announcements
- No recitation this week.
- On Wednesday, a special lecture on learning with text data by Prof. Noah Smith (LTI).

Co-Training for Semi-supervised Learning
Machine Learning – 10701/15781
Carlos Guestrin, Carnegie Mellon University
April 16th, 2007

Slide 20: Redundant information
Slide 21: Redundant information – webpage text
Slide 22: Redundant information – anchor text for hyperlinks

Slide 23: Exploiting redundant information in semi-supervised learning
- Want to predict Y from features X: f(X) → Y.
- Have some labeled data L and lots of unlabeled data U.
- Co-training assumption: X is very expressive.
  - X = (X1, X2)
  - Can learn g1(X1) → Y and g2(X2) → Y.

Slide 24: Co-Training
- Key idea: Classifier1 and Classifier2 must:
  - correctly classify the labeled data, and
  - agree on the unlabeled data.

Slide 25: Co-Training Algorithm [Blum & Mitchell '99]
(a code sketch of this loop appears after the outline below)

Slide 26: Co-Training experimental results

Slide 27: Co-Training theory
- Want to predict Y from features X: f(X) → Y.
- Co-training assumption: X is very expressive.
  - X = (X1, X2)
  - Want to learn g1(X1) → Y and g2(X2) → Y.
- Assumption: ∃ g1, g2 such that ∀ x: g1(x1) = f(x) and g2(x2) = f(x).
- Questions:
  - Does unlabeled data always help?
  - How many labeled examples do I need?
  - How many unlabeled examples do I need?

Slide 28: Understanding Co-Training – a simple setting
- Suppose X1 and X2 are discrete, with |X1| = |X2| = N.
- No label noise.
- Without unlabeled data, how hard is it to learn g1 (or g2)?

Slide 29: Co-Training in the simple setting – Iteration 0
Slide 30: Co-Training in the simple setting – Iteration 1
Slide 31: Co-Training in the simple setting – after convergence

Slide 32: Co-Training in the simple setting – connected components
- Suppose infinite unlabeled data.
- Co-training must have at least one labeled example in each connected component of the L+U graph.
- What's the probability of making an error?
- For k connected components, how much labeled data is needed?

Slide 33: How much unlabeled data?

Slide 34: Co-Training theory (recap of slide 27)
- Want to predict Y from features X: f(X) → Y.
- Co-training assumption: X is very expressive; X = (X1, X2); want to learn g1(X1) → Y and g2(X2) → Y.
- Assumption: ∃ g1, g2 such that ∀ x: g1(x1) = f(x) and g2(x2) = f(x).
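As a rough illustration of the Blum & Mitchell-style loop referenced on slide 25, here is a minimal co-training sketch. It assumes two precomputed feature views X1 and X2, scikit-learn-style classifiers (GaussianNB here), and a simple selection rule in which each view's classifier pseudo-labels its k most confident unlabeled points per round; the function name, its parameters, and the classifier choice are all illustrative, not from the lecture.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, k=5):
    """Co-training sketch: two feature views, one shared labeled pool.

    Each round, each view's classifier is fit on the labeled pool and then
    moves its k most confidently predicted unlabeled points, together with
    their pseudo-labels, into that pool, so the two classifiers teach each
    other through the unlabeled data.
    """
    X1_l, X2_l, y_l = list(X1_l), list(X2_l), list(y_l)
    X1_u, X2_u = list(X1_u), list(X2_u)
    g1, g2 = GaussianNB(), GaussianNB()
    for _ in range(rounds):
        g1.fit(np.array(X1_l), np.array(y_l))
        g2.fit(np.array(X2_l), np.array(y_l))
        if not X1_u:
            break
        for g, view in ((g1, X1_u), (g2, X2_u)):
            if not view:
                break
            proba = g.predict_proba(np.array(view))
            conf = proba.max(axis=1)
            # Indices of the k most confident unlabeled points, popped from
            # the back so that earlier indices stay valid.
            picks = sorted(np.argsort(-conf)[:k].tolist(), reverse=True)
            for i in picks:
                y_l.append(g.classes_[proba[i].argmax()])  # pseudo-label
                X1_l.append(X1_u.pop(i))                   # move the point out
                X2_l.append(X2_u.pop(i))                   # of the unlabeled pool
    return g1, g2
```

In the notation of the slides, g1 and g2 play the roles of g1(X1) and g2(X2); the pressure to agree on unlabeled data comes from both classifiers being refit on the same growing pool of pseudo-labeled examples.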

