DOC PREVIEW
CMU CS 10701 - EM for HMMs a.k.a. The Baum-Welch Algorithm

This preview shows page 1-2-16-17-18-34-35 out of 35 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1EM for HMMsa.k.a. The Baum-Welch AlgorithmMachine Learning – 10701/15781Carlos GuestrinCarnegie Mellon UniversityApril 12th, 2006Recommended reading: “An Introduction to HMMs and Bayesian Networks,”Z. Ghahramani, Int. Journal of Pattern Recognition and AI, 15(1):9-42, (2001)Especially Section 42Learning HMMsfrom fully observable data is easyX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5= Learn 3 distributions:3Learning HMMsfrom fully observable data is easyX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5= Learn 3 distributions:What if O is observed, but X is hidden4Log likelihood for HMMswhen Xis hidden  Marginal likelihood – O is observed, X is missing For simplicity of notation, training data consists of only one sequence: If there were m sequences:5Computing Log likelihood for HMMs when X is hiddenX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=6Computing Log likelihood for HMMswhen X is hidden – variable elimination Can compute efficiently with variable elimination:X1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=7EM for HMMs when X is hidden E-step: Use inference (forwards-backwards algorithm) M-step: Recompute parameters with weighted dataX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=8E-step E-step computes probability of hidden vars x given o Will correspond to inference use forward-backward algorithm!X1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=9The M-step Maximization step: Use expected counts instead of counts: If learning requires Count(x,o) Use EQ(t+1)[Count(x,o)]X1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=10Decomposition of likelihood revisited Likelihood optimization decomposes:X1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=11Starting state probability P(X1) Using expected counts P(X1=a) = θX1=a12Transition probability P(Xt|Xt-1) Using expected counts P(Xt=a|Xt-1=b) = θXt=a|Xt-1=b13Observation probability P(Ot|Xt) Using expected counts P(Ot=a|Xt=b) = θOt=a|Xt=b14E-step revisited  E-step computes probability of hidden vars xgiven o Must compute: Q(xt=a|o) – marginal probability of each position Q(xt+1=a,xt=b|o) – joint distribution between pairs of positionsX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=15The forwards-backwards algorithmX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=  Initialization:  For i = 2 to n Generate a forwards factor by eliminating Xi-1 Initialization:  For i = n-1 to 1 Generate a backwards factor by eliminating Xi+1 ∀ i, probability is:16E-step revisited  E-step computes probability of hidden vars xgiven o Must compute: Q(xt=a|o) – marginal probability of each position Just forwards-backwards! Q(xt+1=a,xt=b|o) – joint distribution between pairs of positions Homework! ☺X1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=17What can you do with EM for HMMs? 1 – Clustering sequencesX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5= Independent clustering: Sequence clustering:18What can you do with EM for HMMs? 2 – Exploiting unlabeled dataX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=  Labeling data is hard work → save (graduate student) time by using both labeled and unlabeled data Labeled data: <X=“brace”,O= > Unlabeled data: <X=?????,O= >19Exploiting unlabeled data in clustering A few data points are labeled <x,o> Most points are unlabeled <?,o> In the E-step of EM: If i’th point is unlabeled: compute Q(X|oi) as usual If i’th point is labeled: set Q(X=x|oi)=1 and Q(X≠x|oi)=0 M-step as usual202120 Newsgroups data – advantage of adding unlabeled data2220 Newsgroups data – Effect of additional unlabeled data23Exploiting unlabeled data in HMMs A few data points are labeled <x,o> Most points are unlabeled <?,o> In the E-step of EM: If i’th point is unlabeled: compute Q(X|oi) as usual If i’th point is labeled: set Q(X=x|oi)=1 and Q(X≠x|oi)=0 M-step as usual Speed up by remembering counts for labeled dataX1= {a,…z}O1= X5= {a,…z}X3= {a,…z} X4= {a,…z}X2= {a,…z}O2= O3= O4= O5=24What you need to know Baum-Welch = EM for HMMs E-step: Inference using forwards-backwards M-step: Use weighted counts Exploiting unlabeled data: Some unlabeled data can help classification Small change to EM algorithm In E-step, only use inference for unlabeled data25Acknowledgements  Experiments combining labeled and unlabeled data provided by Tom Mitchell26EM for Bayes NetsMachine Learning – 10701/15781Carlos GuestrinCarnegie Mellon UniversityApril 12th, 200627Data likelihood for BNs Given structure, log likelihood of fully observed data:FluAllergySinusHeadacheNose28Marginal likelihood What if S is hidden?FluAllergySinusHeadacheNose29Log likelihood for BNswith hidden data Marginal likelihood – O is observed, H is hiddenFluAllergySinusHeadacheNose30E-step for BNs E-step computes probability of hidden vars h given o Corresponds to inference in BNFluAllergySinusHeadacheNose31The M-step for BNs Maximization step: Use expected counts instead of counts: If learning requires Count(h,o) Use EQ(t+1)[Count(h,o)]FluAllergySinusHeadacheNose32M-step for each CPT M-step decomposes per CPT Standard MLE: M-step uses expected counts:FluAllergySinusHeadacheNose33Computing expected counts M-step requires expected counts: For a set of vars A, must compute ExCount(A=a) Some of A in example j will be observed denote by AO=


View Full Document

CMU CS 10701 - EM for HMMs a.k.a. The Baum-Welch Algorithm

Documents in this Course
lecture

lecture

12 pages

lecture

lecture

17 pages

HMMs

HMMs

40 pages

lecture

lecture

15 pages

lecture

lecture

20 pages

Notes

Notes

10 pages

Notes

Notes

15 pages

Lecture

Lecture

22 pages

Lecture

Lecture

13 pages

Lecture

Lecture

24 pages

Lecture9

Lecture9

38 pages

lecture

lecture

26 pages

lecture

lecture

13 pages

Lecture

Lecture

5 pages

lecture

lecture

18 pages

lecture

lecture

22 pages

Boosting

Boosting

11 pages

lecture

lecture

16 pages

lecture

lecture

20 pages

Lecture

Lecture

20 pages

Lecture

Lecture

39 pages

Lecture

Lecture

14 pages

Lecture

Lecture

18 pages

Lecture

Lecture

13 pages

Exam

Exam

10 pages

Lecture

Lecture

27 pages

Lecture

Lecture

15 pages

Lecture

Lecture

24 pages

Lecture

Lecture

16 pages

Lecture

Lecture

23 pages

Lecture6

Lecture6

28 pages

Notes

Notes

34 pages

lecture

lecture

15 pages

Midterm

Midterm

11 pages

lecture

lecture

11 pages

lecture

lecture

23 pages

Boosting

Boosting

35 pages

Lecture

Lecture

49 pages

Lecture

Lecture

22 pages

Lecture

Lecture

16 pages

Lecture

Lecture

18 pages

Lecture

Lecture

35 pages

lecture

lecture

22 pages

lecture

lecture

24 pages

Midterm

Midterm

17 pages

exam

exam

15 pages

Lecture12

Lecture12

32 pages

lecture

lecture

19 pages

Lecture

Lecture

32 pages

boosting

boosting

11 pages

pca-mdps

pca-mdps

56 pages

bns

bns

45 pages

mdps

mdps

42 pages

svms

svms

10 pages

Notes

Notes

12 pages

lecture

lecture

42 pages

lecture

lecture

29 pages

lecture

lecture

15 pages

Lecture

Lecture

12 pages

Lecture

Lecture

24 pages

Lecture

Lecture

22 pages

Midterm

Midterm

5 pages

mdps-rl

mdps-rl

26 pages

Load more
Download EM for HMMs a.k.a. The Baum-Welch Algorithm
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view EM for HMMs a.k.a. The Baum-Welch Algorithm and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view EM for HMMs a.k.a. The Baum-Welch Algorithm 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?