UB CSE 574 - Mixture Models and EM

Mixture Models and EM
Sargur Srihari
[email protected]

Plan of discussion
• Examples of mixture models
• K-means algorithm for finding clusters in a data set
• Latent variable view of mixture distributions
• Assigning data points to specific components of the mixture
• General technique for finding m.l. estimators in latent variable models
• Expectation Maximization (EM) algorithm
• EM algorithm
• Gaussian mixture models motivate EM
• Latent variable viewpoint
• K-means seen as a non-probabilistic limit of EM applied to a mixture of Gaussians
• EM in generality
  – Infinite mixture models

Definition
• A probabilistic model representing sub-populations within a population
  – Without requiring that the sub-population of each data item be identified
• Constructing such models is called unsupervised learning or clustering

Example of a Mixture Model
• Three-component model for a single variable, with estimated densities

More Examples
• Also called Finite Mixture Models (FMMs)

Bernoulli Mixture Model
• Handwritten digit data
• A mixture model for digits 0-4 identifies 12 components

Role of Mixture Models
• Mixture models provide:
  – A framework for building complex probability distributions
  – A method for clustering data
• Viewed statistically as follows:
  – A complex distribution is expressed in terms of a more tractable joint distribution of observed and latent variables
  – The distribution of the observed variables alone is obtained by marginalization

K-means Clustering
• Given a data set {x_1,..,x_N} in D-dimensional Euclidean space
• Partition it into K clusters, where the value of K is given
• One-of-K coding: indicator variables r_nk ∈ {0,1}, where k = 1,..,K
  – r_nk describes which of the K clusters data point x_n is assigned to

Distortion Measure
• Sum of squared errors
    J = ∑_{n=1}^{N} ∑_{k=1}^{K} r_nk ||x_n − μ_k||²
• Goal is to find values for the {r_nk} and the {μ_k} that minimize J
  – Can be done by an iterative procedure
  – Each iteration has two steps
  – Successive optimization w.r.t. the r_nk and the μ_k

Two Updating Stages
• First choose initial values for the μ_k
• First phase: minimize J w.r.t. the r_nk, keeping the μ_k fixed
• Second phase: minimize J w.r.t. the μ_k, keeping the r_nk fixed
• The two stages correspond to the E (expectation) and M (maximization) steps of the EM algorithm
  – Expectation: what is the expected class?
  – Maximization: what is the maximum likelihood estimate of the mean?

E Step: Determination of the Indicators r_nk
• Because J = ∑_{n=1}^{N} ∑_{k=1}^{K} r_nk ||x_n − μ_k||² is a linear function of the r_nk, this optimization is performed easily (closed-form solution)
• Terms involving different n are independent
  – Therefore we can optimize for each n separately
  – Choose r_nk to be 1 for whichever value of k gives the minimum value of ||x_n − μ_k||²
• Thus
    r_nk = 1 if k = argmin_j ||x_n − μ_j||², and 0 otherwise
• Interpretation: assign x_n to the cluster whose mean is closest

M Step: Optimization of the μ_k
• Hold the r_nk fixed
• The objective function J is a quadratic function of μ_k
• It is minimized by setting its derivative w.r.t. μ_k to zero:
    2 ∑_{n=1}^{N} r_nk (x_n − μ_k) = 0
• Which is solved to give
    μ_k = ( ∑_n r_nk x_n ) / ( ∑_n r_nk )
  – The denominator ∑_n r_nk equals the number of points assigned to cluster k
• Interpretation: set μ_k to the mean of all data points x_n assigned to cluster k
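The two updates above (the E-step assignment of r_nk and the M-step mean update) translate directly into code. Below is a minimal NumPy sketch of K-means following these slides: it initializes the means to a random subset of K data points, alternates the two phases, records the distortion J, and stops when the assignments no longer change. Names such as kmeans and J_history are illustrative, not from the slides.

```python
import numpy as np


def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means on an (N, D) data matrix X with K clusters.

    E step: set r_nk = 1 for the nearest mean (hard assignment).
    M step: set mu_k to the mean of the points assigned to cluster k.
    Returns the means, the final assignments, and J after each E step.
    """
    rng = np.random.default_rng(seed)
    N, _ = X.shape
    # Initialize the means to a random subset of K data points,
    # the initialization recommended on the slides.
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)

    assign = np.full(N, -1)          # previous assignments (none yet)
    J_history = []
    for _ in range(n_iters):
        # E step: squared distances ||x_n - mu_k||^2 for all n, k -> (N, K)
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_assign = d2.argmin(axis=1)

        # Distortion J = sum_n sum_k r_nk ||x_n - mu_k||^2
        J_history.append(d2[np.arange(N), new_assign].sum())

        # M step: mu_k = (sum_n r_nk x_n) / (sum_n r_nk)
        for k in range(K):
            members = X[new_assign == k]
            if len(members) > 0:     # guard against an empty cluster
                mu[k] = members.mean(axis=0)

        # Terminate when no assignment changes (as on the slides).
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign

    return mu, assign, J_history
```

Run on a standardized two-variable data set such as Old Faithful with K = 2, J_history is non-increasing, which is the convergence argument made on the termination slide that follows.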
Termination of K-Means
• Two phases:
  – Re-assigning data points to clusters
  – Re-computing the means of the clusters
• Done repeatedly until there is no further change in the assignments
• Since each phase reduces J, convergence is assured
• May converge to a local minimum of J

Illustration of K-means
• Old Faithful dataset
• A single Gaussian is a poor fit
• We choose K = 2
• The data set is standardized so that each variable has zero mean and unit standard deviation
• Assigning each data point to the nearest cluster center is equivalent to deciding
  – which side of the perpendicular bisector of the line joining the cluster centers it lies on

K-means Iterations
[Figure: successive E and M steps on the Old Faithful data, from the initial choice of means (parameters) to the final clusters and means.]
• E step: parameters are fixed, distributions (assignments) are optimized
• M step: distributions are fixed, parameters are optimized

Cost Function after Each Iteration
• J for the Old Faithful data
• A poor initial value was chosen for the cluster centers
  – Several steps are needed for convergence
  – A better choice is to assign the μ_k to a random subset of K data points
• K-means is itself used to initialize the parameters of a Gaussian mixture model before applying EM

Implementation of K-means
• Direct implementation can be slow
  – In the E step, Euclidean distances are computed between every mean and every data point
  – ||x_n − μ_k||² is computed for n = 1,..,N and k = 1,..,K
• Faster implementations exist
  – Precomputing trees in which nearby points are on the same sub-tree
  – Using the triangle inequality to avoid unnecessary distance calculations

On-line Stochastic Version
• Instead of batch-processing the entire data set
• Apply the Robbins-Monro procedure
  – To find the roots of the regression function given by
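The preview ends before the slide states the regression function and the resulting update rule. As a hedged illustration only: in Bishop's PRML treatment, which these slides follow, the Robbins-Monro view leads to a sequential update that nudges the nearest mean towards each newly arriving point, μ_k_new = μ_k_old + η_n (x_n − μ_k_old). The sketch below assumes that form, with η_n = 1 / (number of points seen so far by cluster k) as one common choice of decaying step size; the name online_kmeans is illustrative.

```python
import numpy as np


def online_kmeans(stream, mu):
    """Sequential (on-line) K-means sketch.

    For each arriving point x_n, the nearest mean is moved towards it:
        mu_k_new = mu_k_old + eta_n * (x_n - mu_k_old)
    Here eta_n = 1 / (points seen so far by cluster k), a common
    Robbins-Monro-style decaying step size (an assumption, since the
    slide is truncated in the preview).
    """
    mu = np.asarray(mu, dtype=float).copy()
    counts = np.zeros(len(mu))
    for x in stream:
        x = np.asarray(x, dtype=float)
        k = ((mu - x) ** 2).sum(axis=1).argmin()   # nearest mean
        counts[k] += 1
        eta = 1.0 / counts[k]                      # decaying learning rate
        mu[k] += eta * (x - mu[k])                 # sequential update
    return mu, counts
```

Unlike the batch version, this processes one data point at a time, so it never needs the full N x K distance matrix of the E step.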

