UT Dallas CS 6375 - em (Gaussian Mixture Models and Expectation Maximization)

Slide 1: Machine Learning, CS 6375, Spring 2015
Gaussian Mixture Model (GMM) and Expectation Maximization (EM)
Acknowledgement: some slides adapted from Christopher Bishop and Vincent Ng.

Slide 2: K-means Algorithm
• Special case of EM
• Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype \mu_k
• Initialize the prototypes, then iterate between two phases:
  – E-step: assign each data point to its nearest prototype
  – M-step: update each prototype to be the mean of its cluster
• The simplest version is based on Euclidean distance

Slide 3: Probabilistic Clustering
• Represent the probability distribution of the data as a mixture model
  – captures uncertainty in cluster assignments
  – gives a model for the data distribution
• Consider mixtures of Gaussians

Slide 4: Maximum Likelihood Solution
• Maximizing the (single-Gaussian) likelihood w.r.t. the mean gives the sample mean:
  \mu_{ML} = \frac{1}{N} \sum_{i=1}^{N} x_i
• Maximizing w.r.t. the covariance gives the sample covariance:
  \Sigma_{ML} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_{ML})(x_i - \mu_{ML})^T

Slide 5: Gaussian Mixtures
• Linear superposition of Gaussians:
  p(x) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k)
• Normalization and positivity require
  0 \le \pi_k \le 1, \qquad \sum_{k=1}^{K} \pi_k = 1
• The mixing coefficients \pi_k can be interpreted as prior probabilities

Slide 6: Example: Mixture of 3 Gaussians
[Figure (a); axes from 0 to 1]

Slide 7: Contours of Probability Distribution
[Figure (b); axes from 0 to 1]

Slide 8: Sampling from the Gaussian
• To generate a data point:
  – first pick one of the components with probability \pi_k
  – then draw a sample from that component
• Repeat these two steps for each new data point

Slide 9: Synthetic Data Set
[Figure (a); axes from 0 to 1]

Slide 10: Fitting the Gaussian Mixture
• We wish to invert this process: given the data set, find the corresponding parameters:
  – mixing coefficients
  – means
  – covariances
• If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster
• Problem: the data set is unlabelled
• We shall refer to the labels as latent (= hidden) variables

Slide 11: Synthetic Data Set Without Labels
[Figure (b); axes from 0 to 1]

Slide 12: Posterior Probabilities
• We can think of the mixing coefficients as prior probabilities for the components
• For a given value of x we can evaluate the corresponding posterior probabilities
• These are given by Bayes' theorem:
  \gamma_k(x) = p(k \mid x) = \frac{\pi_k \, N(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(x \mid \mu_j, \Sigma_j)}

Slide 13: Posterior Probabilities (colour coded)
[Figure (a); axes from 0 to 1]

Slide 14: Maximum Likelihood for the GMM
• The log likelihood function takes the form
  \log P(D \mid \pi, \mu, \Sigma) = \sum_{i=1}^{N} \log \Big\{ \sum_{k=1}^{K} \pi_k \, N(x_i \mid \mu_k, \Sigma_k) \Big\}
• Note: the sum over components appears inside the log
• There is no closed-form solution for maximum likelihood
  – solved by the expectation-maximization (EM) algorithm
  – K is fixed

Slide 15: EM Algorithm – Informal Derivation
• Let us proceed by simply differentiating the log likelihood
  \log P(D \mid \pi, \mu, \Sigma) = \sum_{i=1}^{N} \log \Big\{ \sum_{k=1}^{K} \pi_k N_{ik} \Big\}, \qquad N_{ik} = N(x_i \mid \mu_k, \Sigma_k)
• Setting the derivative with respect to \mu_j to zero gives
  0 = \sum_{i=1}^{N} \frac{\pi_j N_{ij}}{\sum_{k} \pi_k N_{ik}} \, \Sigma_j^{-1} (x_i - \mu_j)
• Writing \gamma_j(x_i) = \frac{\pi_j N_{ij}}{\sum_{k} \pi_k N_{ik}} for the responsibilities, this yields
  \mu_j = \frac{\sum_i \gamma_j(x_i) \, x_i}{\sum_i \gamma_j(x_i)}

Slide 16: EM Algorithm – Informal Derivation
• Similarly for the covariances:
  \Sigma_j = \frac{\sum_i \gamma_j(x_i) (x_i - \mu_j)(x_i - \mu_j)^T}{\sum_i \gamma_j(x_i)}
• For the mixing coefficients, use a Lagrange multiplier (constraint: they sum to 1):
  \pi_j = \frac{1}{N} \sum_i \gamma_j(x_i)

Slide 17: EM Algorithm – Informal Derivation
• The solutions are not closed form since they are coupled
• This suggests an iterative scheme for solving them:
  – make initial guesses for the parameters
  – alternate between the following two stages:
    • E-step: evaluate the responsibilities
    • M-step: update the parameters using the ML results
• Each EM cycle is guaranteed not to decrease the likelihood

Slides 18 to 23: [figure-only slides; no text survives in this preview]

Slide 24: Relation to K-means
• Consider a GMM whose components share a common covariance \Sigma_k = \epsilon I
• Take the limit \epsilon \to 0
• The responsibilities become binary
• The EM algorithm is then precisely equivalent to K-means
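The coupled updates on slides 15 to 17 are exactly what the iterative scheme alternates between. As a concrete illustration, here is a minimal NumPy/SciPy sketch of one EM cycle for a GMM; it follows the responsibility and update formulas above, but the names (em_step, gamma, Nk) and the use of scipy.stats.multivariate_normal are illustrative choices, not taken from the slides.

import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pi, mu, Sigma):
    """One EM cycle for a Gaussian mixture model.
    X: (N, d) data; pi: (K,) mixing coefficients;
    mu: (K, d) means; Sigma: (K, d, d) covariances."""
    N, _ = X.shape
    K = len(pi)

    # E-step: gamma[i, k] = pi_k N(x_i | mu_k, Sigma_k) / sum_j pi_j N(x_i | mu_j, Sigma_j)
    gamma = np.zeros((N, K))
    for k in range(K):
        gamma[:, k] = pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
    gamma /= gamma.sum(axis=1, keepdims=True)

    # M-step: re-estimate the parameters from the responsibilities
    Nk = gamma.sum(axis=0)                    # effective number of points in each component
    mu_new = (gamma.T @ X) / Nk[:, None]      # responsibility-weighted sample means
    Sigma_new = np.empty_like(Sigma)
    for k in range(K):
        diff = X - mu_new[k]
        Sigma_new[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]   # weighted sample covariances
    pi_new = Nk / N                           # mixing coefficients (sum to 1)
    return pi_new, mu_new, Sigma_new, gamma

Alternating this single step until the log likelihood stops changing is the whole algorithm; the vectorized NumPy form is used only for brevity.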
Slide 25: EM for GMM
• Iterate. On the t-th iteration:
  – E-step: compute the "expected" classes of all data points for each class, e.g.
    P(k \mid x_i, \theta_t) \propto \pi_k^{(t)} \, N(x_i \mid \mu_k^{(t)}, \Sigma_k^{(t)})
  – M-step: compute the maximum likelihood \mu given our data's class membership distributions, e.g.
    \mu_k^{(t+1)} = \frac{\sum_i P(k \mid x_i, \theta_t) \, x_i}{\sum_i P(k \mid x_i, \theta_t)}

Slide 26: The EM Algorithm in General
• Given an observed variable X and an unobserved variable Z
• E-step: compute the expected log likelihood
  Q(\theta' \mid \theta) = E_{Z \mid X, \theta}\big[\log P(X, Z \mid \theta')\big]
  – for a GMM with known covariance, the parameters \theta are the component means
• M-step: maximize Q over \theta' to find the new parameters

Slide 27: The EM Algorithm
• Identify the sufficient statistics for estimating the \theta's
• Initialize the \theta's to some arbitrary (non-zero) values \theta_0
• Iterate the E-step and the M-step. During step k:
  – compute the expected values of the sufficient statistics based on the current parameter estimates \theta_k (E-step)
  – derive \theta_{k+1} as an ML estimate using the values of the sufficient statistics computed in the E-step (M-step)
• Terminate when L(\text{data} \mid \theta_{k+1}) \approx L(\text{data} \mid \theta_k)

Slide 28: Is the Incomplete Log Likelihood Maximized?
• Theorem: let x be our incomplete data, h our hidden data, and \theta a parametric model that generates x and h. If we choose \theta' such that
  E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta'}(x, h)\big] > E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta_i}(x, h)\big]   (increasing the expected log likelihood)
  then
  P_{\theta'}(x) > P_{\theta_i}(x)   (increasing the likelihood)
• Lemma: for any distributions p and q,
  -\sum_x p(x) \log p(x) \le -\sum_x p(x) \log q(x), \qquad \text{that is,} \qquad -E_p[\log p(x)] \le -E_p[\log q(x)]

Slide 29: Proof of EM
• Start from P_{\theta_i}(x, h) = P_{\theta_i}(x) \, P_{\theta_i}(h \mid x), so that
  \log P_{\theta_i}(x, h) = \log\big( P_{\theta_i}(x) \, P_{\theta_i}(h \mid x) \big)
  \log P_{\theta_i}(x) = \log P_{\theta_i}(x, h) - \log P_{\theta_i}(h \mid x)
• Taking expectation on both sides w.r.t. P_{\theta_i}(h \mid x), we have
  E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta_i}(x)\big] = E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta_i}(x, h)\big] - E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta_i}(h \mid x)\big]
• Since \log P_{\theta_i}(x) does not depend on h,
  \log P_{\theta_i}(x) = E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta_i}(x, h)\big] - E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta_i}(h \mid x)\big]   (*)
• Question: is \log P_{\theta'}(x) > \log P_{\theta_i}(x)?

Slide 30: Proof of EM (Cont'd)
• Substituting \theta' for \theta_i in (*) (the expectation is still taken w.r.t. P_{\theta_i}(h \mid x)):
  \log P_{\theta'}(x) = E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta'}(x, h)\big] - E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta'}(h \mid x)\big]
• Now, by assumption, we have
  E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta'}(x, h)\big] > E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta_i}(x, h)\big]
• By the lemma, we have
  -E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta'}(h \mid x)\big] \ge -E_{P_{\theta_i}(h \mid x)}\big[\log P_{\theta_i}(h \mid x)\big]
• Adding the two gives \log P_{\theta'}(x) > \log P_{\theta_i}(x)

Slide 31: EM Summary
• For learning from partly unobserved data
• MLE estimate:
  \hat\theta = \arg\max_{\theta} P(\text{data} \mid \theta)
• EM estimate:
  \hat\theta = \arg\max_{\theta} E_{Z \mid X, \theta}\big[\log P(X, Z \mid \theta)\big]
  where X is the observed part of the data and Z is the unobserved part

Slide 32: Using EM in Practice
• EM may not work well in practice. Potential problems:
  – getting stuck at a local maximum
    • Solutions: try different starting points, search by simulated annealing
  – overfitting the training data
    • Solutions: use held-out data, add regularization
  – the underlying generative model is incorrect
    • Solutions: fix the model

Slide 33: Over-fitting in Gaussian Mixture Models
• Singularities arise in the likelihood function when a component "collapses" onto a data point: as that component's covariance shrinks toward zero, the likelihood of the point it sits on grows without bound
• The likelihood function also gets larger as we add more components (and hence parameters) to the model
  – it is not clear how to choose the number K of components

Slide 34: Can EM Really Improve the Underlying Classifier?
• It depends on whether
  – the data is generated by a mixture
  – there is a 1-to-1 mapping between the mixture components and ...
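Putting the later slides together, below is a minimal driver sketch that runs EM to convergence from several random starting points (the standard remedy for local maxima mentioned on the "Using EM in Practice" slide) and checks along the way that no cycle decreases the log likelihood, which is what the "Proof of EM" slides guarantee. It assumes the em_step function from the earlier sketch; fit_gmm, log_likelihood, n_restarts, and tol are illustrative names, not from the slides.

import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, pi, mu, Sigma):
    # log P(X) = sum_i log sum_k pi_k N(x_i | mu_k, Sigma_k)
    dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
                            for k in range(len(pi))])
    return float(np.sum(np.log(dens.sum(axis=1))))

def fit_gmm(X, K, n_restarts=5, max_iter=200, tol=1e-6, seed=0):
    """Run EM from several random initializations and keep the best solution."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    best = None
    for _ in range(n_restarts):
        # Random initialization: means drawn from the data, shared data covariance plus jitter
        pi = np.full(K, 1.0 / K)
        mu = X[rng.choice(N, size=K, replace=False)]
        Sigma = np.array([np.atleast_2d(np.cov(X.T)) + 1e-6 * np.eye(d) for _ in range(K)])
        ll_old = -np.inf
        for _ in range(max_iter):
            pi, mu, Sigma, _ = em_step(X, pi, mu, Sigma)
            ll = log_likelihood(X, pi, mu, Sigma)
            assert ll >= ll_old - 1e-8     # each EM cycle must not decrease the likelihood
            if ll - ll_old < tol:          # terminate when L(data | theta_{k+1}) ~ L(data | theta_k)
                break
            ll_old = ll
        if best is None or ll > best[0]:
            best = (ll, pi, mu, Sigma)
    return best

Keeping only the highest-likelihood run does not remove the other caveats on the slides: it does nothing about components that collapse onto single points, and it cannot fix a misspecified generative model.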

