DOC PREVIEW
CMU CS 10701 - EM for HMMs a.k.a. The Baum-Welch Algorithm

This preview shows page 1-2-3-26-27-28 out of 28 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

©2005-2007 Carlos Guestrin1EM for HMMsa.k.a. The Baum-Welch AlgorithmMachine Learning – 10701/15781Carlos GuestrinCarnegie Mellon UniversityApril 11th, 20072©2005-2007 Carlos GuestrinThe general learning problem with missing data Marginal likelihood – x is observed, z is missing:3©2005-2007 Carlos GuestrinEM is coordinate ascent M-step: Fix Q, maximize F over θ (a lower bound on ): E-step: Fix θ, maximize F over Q: “Realigns” F with likelihood:4©2005-2007 Carlos GuestrinWhat you should know about EM K-means for clustering: algorithm converges because it’s coordinate ascent EM for mixture of Gaussians: How to “learn” maximum likelihood parameters (locally max. like.) in the case of unlabeled data Be happy with this kind of probabilistic analysis Remember, E.M. can get stuck in local minima, and empirically it DOES EM is coordinate ascent General case for EM5©2005-2007 Carlos GuestrinLearning HMMs from fully observable data is easyX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5= Learn 3 distributions:6©2005-2007 Carlos GuestrinLearning HMMs from fully observable data is easyX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5= Learn 3 distributions:What if O is observed, but X is hidden7©2005-2007 Carlos GuestrinLog likelihood for HMMs when Xis hidden  Marginal likelihood – O is observed, X is missing For simplicity of notation, training data consists of only one sequence: If there were m sequences:8©2005-2007 Carlos GuestrinComputing Log likelihood for HMMs when X is hiddenX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=9©2005-2007 Carlos GuestrinComputing Log likelihood for HMMswhen X is hidden – variable elimination Can compute efficiently with variable elimination:X1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=10©2005-2007 Carlos GuestrinEM for HMMs when X is hidden E-step: Use inference (forwards-backwards algorithm) M-step: Recompute parameters with weighted dataX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=11©2005-2007 Carlos GuestrinE-step E-step computes probability of hidden vars x given o Will correspond to inference use forward-backward algorithm!X1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=12©2005-2007 Carlos GuestrinThe M-step Maximization step: Use expected counts instead of counts: If learning requires Count(x,o) Use EQ(t+1)[Count(x,o)]X1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=13©2005-2007 Carlos GuestrinDecomposition of likelihood revisited Likelihood optimization decomposes:X1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=14©2005-2007 Carlos GuestrinStarting state probability P(X1) Using expected counts P(X1=a) = θX1=a15©2005-2007 Carlos GuestrinTransition probability P(Xt|Xt-1) Using expected counts P(Xt=a|Xt-1=b) = θXt=a|Xt-1=b16©2005-2007 Carlos GuestrinObservation probability P(Ot|Xt) Using expected counts P(Ot=a|Xt=b) = θOt=a|Xt=b17©2005-2007 Carlos GuestrinE-step revisited  E-step computes probability of hidden vars x given o Must compute: Q(xt=a|o) – marginal probability of each position Q(xt+1=a,xt=b|o) – joint distribution between pairs of positionsX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=18©2005-2007 Carlos GuestrinThe forwards-backwards algorithmX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=  Initialization:  For i = 2 to n Generate a forwards factor by eliminating Xi-1 Initialization:  For i = n-1 to 1 Generate a backwards factor by eliminating Xi+1 ∀ i, probability is:19©2005-2007 Carlos GuestrinE-step revisited  E-step computes probability of hidden vars xgiven o Must compute: Q(xt=a|o) – marginal probability of each position Just forwards-backwards! Q(xt+1=a,xt=b|o) – joint distribution between pairs of positionsX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=20©2005-2007 Carlos GuestrinWhat can you do with EM for HMMs? 1 – Clustering sequencesX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5= Independent clustering: Sequence clustering:21©2005-2007 Carlos GuestrinWhat can you do with EM for HMMs? 2 – Exploiting unlabeled dataX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=  Labeling data is hard work → save (graduate student) time by using both labeled and unlabeled data Labeled data: <X=“brace”,O= > Unlabeled data: <X=?????,O= >22©2005-2007 Carlos GuestrinExploiting unlabeled data in clustering A few data points are labeled <x,o> Most points are unlabeled <?,o> In the E-step of EM: If i’th point is unlabeled: compute Q(X|oi) as usual If i’th point is labeled: set Q(X=x|oi)=1 and Q(X≠x|oi)=0 M-step as usual23©2005-2007 Carlos Guestrin24©2005-2007 Carlos Guestrin20 Newsgroups data – advantage of adding unlabeled data25©2005-2007 Carlos Guestrin20 Newsgroups data – Effect of additional unlabeled data26©2005-2007 Carlos GuestrinExploiting unlabeled data in HMMs A few data points are labeled <x,o> Most points are unlabeled <?,o> In the E-step of EM: If i’th point is unlabeled: compute Q(X|oi) as usual If i’th point is labeled: set Q(X=x|oi)=1 and Q(X≠x|oi)=0 M-step as usual Speed up by remembering counts for labeled dataX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=27©2005-2007 Carlos GuestrinWhat you need to know Baum-Welch = EM for HMMs E-step: Inference using forwards-backwards M-step: Use weighted counts Exploiting unlabeled data: Some unlabeled data can help


View Full Document

CMU CS 10701 - EM for HMMs a.k.a. The Baum-Welch Algorithm

Documents in this Course
lecture

lecture

12 pages

lecture

lecture

17 pages

HMMs

HMMs

40 pages

lecture

lecture

15 pages

lecture

lecture

20 pages

Notes

Notes

10 pages

Notes

Notes

15 pages

Lecture

Lecture

22 pages

Lecture

Lecture

13 pages

Lecture

Lecture

24 pages

Lecture9

Lecture9

38 pages

lecture

lecture

26 pages

lecture

lecture

13 pages

Lecture

Lecture

5 pages

lecture

lecture

18 pages

lecture

lecture

22 pages

Boosting

Boosting

11 pages

lecture

lecture

16 pages

lecture

lecture

20 pages

Lecture

Lecture

20 pages

Lecture

Lecture

39 pages

Lecture

Lecture

14 pages

Lecture

Lecture

18 pages

Lecture

Lecture

13 pages

Exam

Exam

10 pages

Lecture

Lecture

27 pages

Lecture

Lecture

15 pages

Lecture

Lecture

24 pages

Lecture

Lecture

16 pages

Lecture

Lecture

23 pages

Lecture6

Lecture6

28 pages

Notes

Notes

34 pages

lecture

lecture

15 pages

Midterm

Midterm

11 pages

lecture

lecture

11 pages

lecture

lecture

23 pages

Boosting

Boosting

35 pages

Lecture

Lecture

49 pages

Lecture

Lecture

22 pages

Lecture

Lecture

16 pages

Lecture

Lecture

18 pages

Lecture

Lecture

35 pages

lecture

lecture

22 pages

lecture

lecture

24 pages

Midterm

Midterm

17 pages

exam

exam

15 pages

Lecture12

Lecture12

32 pages

lecture

lecture

19 pages

Lecture

Lecture

32 pages

boosting

boosting

11 pages

pca-mdps

pca-mdps

56 pages

bns

bns

45 pages

mdps

mdps

42 pages

svms

svms

10 pages

Notes

Notes

12 pages

lecture

lecture

42 pages

lecture

lecture

29 pages

lecture

lecture

15 pages

Lecture

Lecture

12 pages

Lecture

Lecture

24 pages

Lecture

Lecture

22 pages

Midterm

Midterm

5 pages

mdps-rl

mdps-rl

26 pages

Load more
Download EM for HMMs a.k.a. The Baum-Welch Algorithm
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view EM for HMMs a.k.a. The Baum-Welch Algorithm and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view EM for HMMs a.k.a. The Baum-Welch Algorithm 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?