Machine learning lecture 14 Tommi S Jaakkola MIT CSAIL tommi csail mit edu Topics Probabilistic modeling mixture models and the EM algorithm mixtures of experts examples hierarchical models Tommi Jaakkola MIT CSAIL 2 Review mixture density Data generation process 2 P y y 1 y 2 1 P x y 1 p x P x y 2 X pj p x j j mixture of Gaussians j 1 2 Any data point x could have been generated in two ways the component responsible for generating x needs to be inferred Tommi Jaakkola MIT CSAIL 3 Review the EM algorithm E step softly assign examples to mixture components p j i P yi j xi for all j 1 2 and i 1 n M step re estimate the parameters separately for the two Gaussians based on the soft assignments n p j X n j where n j p j i n i 1 j n 1 X p j i xi n j i 1 j n 1 X p j i xi j xi j T n j i 1 Tommi Jaakkola MIT CSAIL 4 Mixture demo Tommi Jaakkola MIT CSAIL 5 The EM algorithm Each iteration of the EM algorithm monotonically increases the log likelihood of the n training examples x1 xn p xi n X z l D log p1 p xi 1 1 p2 p xi 2 2 i 1 where p1 p2 1 2 1 2 contains all the parameters of the mixture model 200 100 0 100 200 300 400 500 0 Tommi Jaakkola MIT CSAIL 5 10 15 20 25 30 35 6 The EM algorithm why does it work To show that EM increases the log likelihood of the data after each iteration we resort to the following auxiliary objective X X pj p xi j j J q q j i log q j i i j 1 2 where q j i are soft assignments of examples to mixture components We view both q and as parameters to be optimized We need to specify how we will use this objective algorithm how the algorithm relates to EM how the objective relates to the log likelihood Tommi Jaakkola MIT CSAIL 7 The auxiliary objective algorithm J q pj p xi j j q j i log q j i j 1 2 X X i We maximize this objective in the following alternating steps Let 0 be the initial setting of the parameters then at iteration k q k argmax J q k E step q k 1 argmax J q k M step Tommi Jaakkola MIT CSAIL 8 The auxiliary objective algorithm J q pj p xi j j q j i log q j i j 1 2 X X i We maximize this objective in the following alternating steps Let 0 be the initial setting of the parameters then at iteration k q k argmax J q k E step q k 1 argmax J q k M step Clearly these steps lead to a monotonically increasing value of the objective J q J q k 1 k J q k k J q k k 1 Tommi Jaakkola MIT CSAIL 9 Relation to EM Suppose we set q j i to be the posterior assignments p j i based on the current setting of the parameters X X pj p xi j j J p p j i log p j i i j 1 2 Tommi Jaakkola MIT CSAIL 10 Relation to EM Suppose we set q j i to be the posterior assignments p j i based on the current setting of the parameters X X pj p xi j j J p p j i log p j i i j 1 2 X X p j i log pj p xi j j const i Tommi Jaakkola MIT CSAIL j 1 2 11 Relation to EM Suppose we set q j i to be the posterior assignments p j i based on the current setting of the parameters X X pj p xi j j J p p j i log p j i i j 1 2 X X p j i log pj p xi j j const i j 1 2 X j 1 2 Tommi Jaakkola MIT CSAIL X p j i log pj p xi j j const i 12 Relation to EM Suppose we set q j i to be the posterior assignments p j i based on the current setting of the parameters X X pj p xi j j J p p j i log p j i i j 1 2 X X p j i log pj p xi j j const i j 1 2 X j 1 2 X p j i log pj p xi j j const i z weighted log likelihood The estimation of reduces to finding ML Gaussians based on a weighted training set separately for each component M step of the EM algorithm Tommi Jaakkola MIT CSAIL 13 Relation to the log likelihood J q i pj p xi j j q j i log q j i j 1 2 X X This objective has two relevant properties for our purposes J q l D 1 J p l D 2 where p j i P j xi are the posterior assignments Tommi Jaakkola MIT CSAIL 14 Proof of property 1 For any setting of q j i J q i Tommi Jaakkola MIT CSAIL pj p xi j j q j i log q j i j 1 2 X X 15 Proof of property 1 For any setting of q j i pj p xi j j J q q j i log q j i i j 1 2 X X pj p xi j j log q j i q j i i j 1 2 X X Tommi Jaakkola MIT CSAIL 16 Proof of property 1 For any setting of q j i pj p xi j j J q q j i log q j i i j 1 2 X X pj p xi j j log q j i q j i i j 1 2 X X log pj p xi j j X X i Tommi Jaakkola MIT CSAIL j 1 2 17 Proof of property 1 For any setting of q j i pj p xi j j J q q j i log q j i i j 1 2 X X pj p xi j j log q j i q j i i j 1 2 X X log pj p xi j j X X i X j 1 2 log p xi l D i Tommi Jaakkola MIT CSAIL 18 Proof of property 2 By setting q j i equal to the posterior assignments pj p xi j j pj p xi j j p xi j 0 1 2 pj 0 p xi j 0 j 0 p j i P we get J p i Tommi Jaakkola MIT CSAIL pj p xi j j p j i log p j i j 1 2 X X …
View Full Document
Unlocking...