Expectation Maximization Algorithm
Rong Jin

Outline: A Mixture Model Problem; Gaussian Mixture Model (GMM); EM Algorithm for GMM; Start with A Random Guess; E-step; M-Step; At the 5-th/10-th/20-th/50-th/100-th Iteration; EM as A Bound Optimization; Logarithm Bound Algorithm; Log-Likelihood of EM Alg.; Maximize GMM Model; Identify Hidden Variables; EM Algorithm for A Translation Model; Compute Pr(e|c); Bound Optimization for A Translation Model; Iterative Scaling; Faster Iterative Scaling; Bad News; Comparing Improved Iterative Scaling to Newton's Method

A Mixture Model Problem
- Apparently, the dataset consists of two modes.
- How can we automatically identify the two modes?
[Figure: histogram of the binned data (x from 0 to 25, counts up to 20) with two visible modes]

Gaussian Mixture Model (GMM)
- Assume the dataset is generated by two mixed Gaussian distributions:
  Gaussian model 1: $\theta_1 = \{\mu_1, \sigma_1; p_1\}$
  Gaussian model 2: $\theta_2 = \{\mu_2, \sigma_2; p_2\}$
- If we knew the membership of each bin, estimating the two Gaussian models would be easy.
- How can we estimate the two Gaussian models without knowing the memberships of the bins?

EM Algorithm for GMM
- Let the memberships be hidden variables:
  $\{x_1, x_2, \ldots, x_n\} \rightarrow \{(x_1, m_1), (x_2, m_2), \ldots, (x_n, m_n)\}$
- Unknown memberships: $m_1, m_2, \ldots, m_n$
- Unknown Gaussian models: $\theta_1 = \{\mu_1, \sigma_1; p_1\}$ and $\theta_2 = \{\mu_2, \sigma_2; p_2\}$
- Learn these two sets of parameters iteratively.

Start with A Random Guess
- Randomly assign memberships to each bin.
- Estimate the mean and variance of each Gaussian model from that assignment.
[Figure: random memberships (values between 0 and 1) and the resulting Gaussian fits over the histogram]

E-step
- Fix the two Gaussian models.
- Estimate the posterior for each data point:

$$p(m=1 \mid x) = \frac{p(x, m=1)}{p(x)} = \frac{p(x, \theta_1)}{p(x, \theta_1) + p(x, \theta_2)} = \frac{p_1\, p(x \mid \mu_1, \sigma_1)}{p_1\, p(x \mid \mu_1, \sigma_1) + p_2\, p(x \mid \mu_2, \sigma_2)}$$

$$p(m=2 \mid x) = \frac{p(x, m=2)}{p(x)} = \frac{p_2\, p(x \mid \mu_2, \sigma_2)}{p_1\, p(x \mid \mu_1, \sigma_1) + p_2\, p(x \mid \mu_2, \sigma_2)}$$

where

$$p(x \mid \mu_k, \sigma_k) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\!\left( -\frac{(x - \mu_k)^2}{2\sigma_k^2} \right), \qquad k = 1, 2.$$

M-Step
- Fix the memberships (the posteriors from the E-step).
- Re-estimate the two Gaussian models, weighted by the posteriors, by maximizing

$$l = \sum_{i=1}^n \left\{ \hat{p}(m=1 \mid x_i) \log p(x_i, \theta_1) + \hat{p}(m=2 \mid x_i) \log p(x_i, \theta_2) \right\}
  = \sum_{i=1}^n \left\{ \hat{p}(m=1 \mid x_i) \left[ \log p_1 + \log p(x_i \mid \mu_1, \sigma_1) \right] + \hat{p}(m=2 \mid x_i) \left[ \log p_2 + \log p(x_i \mid \mu_2, \sigma_2) \right] \right\}$$

which gives the closed-form updates

$$p_1 = \frac{1}{n} \sum_{i=1}^n \hat{p}(m=1 \mid x_i), \qquad
  \mu_1 = \frac{\sum_{i=1}^n \hat{p}(m=1 \mid x_i)\, x_i}{\sum_{i=1}^n \hat{p}(m=1 \mid x_i)}, \qquad
  \sigma_1^2 = \frac{\sum_{i=1}^n \hat{p}(m=1 \mid x_i)\, x_i^2}{\sum_{i=1}^n \hat{p}(m=1 \mid x_i)} - \mu_1^2$$

and likewise for $p_2$, $\mu_2$, $\sigma_2^2$ with $\hat{p}(m=2 \mid x_i)$.

EM Algorithm for GMM
- Re-estimate the memberships for each bin (E-step).
- Re-estimate the models (M-step), and repeat.
[Figure: updated memberships and Gaussian fits after one iteration]
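The slides show this procedure only graphically. Below is a minimal NumPy sketch of the loop just described (random memberships, then alternating M- and E-steps); the function name `em_gmm`, the initialization details, and the synthetic two-mode data are illustrative assumptions, not from the deck.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Gaussian density p(x | mu, sigma)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def em_gmm(x, n_iter=100, seed=0):
    """EM for a two-component 1-D Gaussian mixture.

    Starts from random memberships (as in the slides), then alternates
    the M-step (re-estimate p, mu, sigma from the posteriors) and the
    E-step (re-estimate the posteriors from the models).
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    post = rng.random(n)                  # random guess for p(m=1 | x_i)
    post = np.stack([post, 1.0 - post])   # shape (2, n): one row per component
    for _ in range(n_iter):
        # M-step: posterior-weighted mixing weights, means, and variances.
        w = post.sum(axis=1)              # effective counts per component
        p = w / n
        mu = post @ x / w
        sigma = np.sqrt(post @ x ** 2 / w - mu ** 2)
        # E-step: posterior membership of each point under the current models.
        joint = p[:, None] * normal_pdf(x[None, :], mu[:, None], sigma[:, None])
        post = joint / joint.sum(axis=0)
    return p, mu, sigma

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data with two modes, loosely mimicking the slides' histogram.
    x = np.concatenate([rng.normal(7.0, 1.5, 300), rng.normal(17.0, 2.0, 200)])
    print(em_gmm(x))
```

The iteration snapshots that follow trace exactly this loop on the slides' dataset.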
At the 5-th Iteration
- The red Gaussian component slowly shifts toward the left end of the x axis.

At the 10-th Iteration
- The red Gaussian component is still slowly shifting toward the left end of the x axis.

At the 20-th Iteration
- The red Gaussian component makes a more noticeable shift toward the left end of the x axis.

At the 50-th Iteration
- The red Gaussian component is close to the desired location.

At the 100-th Iteration
- The results are almost identical to those at the 50-th iteration.
[Figures: at each of these iterations, the histogram with the two fitted Gaussians and the membership curves]

EM as A Bound Optimization
- The EM algorithm in fact maximizes the log-likelihood function of the training data.
- Likelihood of a data point $x$:

$$p(x) = p(x, \theta_1) + p(x, \theta_2) = p_1\, p(x \mid \mu_1, \sigma_1) + p_2\, p(x \mid \mu_2, \sigma_2)$$

with $p(x \mid \mu_k, \sigma_k)$ the Gaussian density defined above.
- Log-likelihood of the training data:

$$l = \sum_{i=1}^n \log p(x_i) = \sum_{i=1}^n \log \left\{ p_1\, p(x_i \mid \mu_1, \sigma_1) + p_2\, p(x_i \mid \mu_2, \sigma_2) \right\}$$

…
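The "Logarithm Bound Algorithm" slides listed in the outline are cut off in this extract, but the bound-optimization claim rests on a standard step worth stating: for any distribution $q(m \mid x_i)$ over the hidden membership, Jensen's inequality turns the log of a sum into a tractable lower bound. The derivation below is a sketch of that standard argument, not text recovered from the missing slides.

```latex
% Lower bound on the GMM log-likelihood via Jensen's inequality.
% q(m | x_i) is any distribution over the hidden membership m of point x_i.
\begin{align*}
l(\theta)
  &= \sum_{i=1}^n \log \sum_{m=1}^2 p_m\, p(x_i \mid \mu_m, \sigma_m) \\
  &= \sum_{i=1}^n \log \sum_{m=1}^2 q(m \mid x_i)\,
       \frac{p_m\, p(x_i \mid \mu_m, \sigma_m)}{q(m \mid x_i)} \\
  &\ge \sum_{i=1}^n \sum_{m=1}^2 q(m \mid x_i)\,
       \log \frac{p_m\, p(x_i \mid \mu_m, \sigma_m)}{q(m \mid x_i)}
  \qquad \text{(Jensen, since $\log$ is concave).}
\end{align*}
% The bound is tight when q(m | x_i) equals the posterior p(m | x_i)
% computed in the E-step; the M-step maximizes the bound over
% (p_m, mu_m, sigma_m), so no EM iteration can decrease l(theta).
```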
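As a quick numeric check of that monotonicity claim (my own illustration, reusing the synthetic data and update rules from the earlier sketch, not code from the deck):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(7.0, 1.5, 300), rng.normal(17.0, 2.0, 200)])

def log_likelihood(x, p, mu, sigma):
    """l = sum_i log { p1 N(x_i|mu1,s1) + p2 N(x_i|mu2,s2) }."""
    dens = np.exp(-(x - mu[:, None]) ** 2 / (2 * sigma[:, None] ** 2)) \
           / (np.sqrt(2 * np.pi) * sigma[:, None])
    return float(np.log((p[:, None] * dens).sum(axis=0)).sum())

post = rng.random(len(x))              # random initial memberships
post = np.stack([post, 1.0 - post])
prev = -np.inf
for it in range(1, 101):
    # M-step, then evaluate l; a full E+M cycle must not decrease l.
    w = post.sum(axis=1)
    p, mu = w / len(x), post @ x / w
    sigma = np.sqrt(post @ x ** 2 / w - mu ** 2)
    ll = log_likelihood(x, p, mu, sigma)
    assert ll >= prev - 1e-6, "EM decreased the log-likelihood"
    prev = ll
    # E-step: posteriors under the updated models.
    joint = p[:, None] * np.exp(-(x - mu[:, None]) ** 2 / (2 * sigma[:, None] ** 2)) \
            / (np.sqrt(2 * np.pi) * sigma[:, None])
    post = joint / joint.sum(axis=0)
    if it in (5, 10, 20, 50, 100):
        print(f"iteration {it:3d}: log-likelihood = {ll:.3f}")
```

The printed values should be non-decreasing and eventually plateau, echoing the slides' observation that the fit barely changes between the 50-th and 100-th iterations.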