UB CSE 574 - Latent Variable View of EM

Latent Variable View of EM
Sargur Srihari
[email protected]

Examples of latent variables

1. Mixture Model
   – Joint distribution is p(x, z)
   – We don't have values for z

2. Hidden Markov Model
   – An extension of the mixture model
   – A single time slice is a mixture with components p(x|z)
     • The choice of mixture component depends on the choice of mixture component for the previous distribution
   – Latent variables are multinomial variables z_n
     • They describe the component responsible for generating x_n

Another example of latent variables

3. Topic Models (Latent Dirichlet Allocation)
   – In NLP, unobserved groups explain why some observed data are similar
   – Each document is a mixture of various topics (latent variables)
   – Topics generate words
     • CAT-related: milk, meow, kitten
     • DOG-related: puppy, bark, bone
   – Multinomial distributions over words with Dirichlet priors

ML as a subfield of AI
(figure only in the original slides)

Main Idea of EM
• The goal of EM is to find maximum likelihood models for distributions p(x) that have latent (or missing) data
  – E.g., GMMs, HMMs
• In the case of Gaussian mixture models
  – We have a complex distribution of observed variables x
  – We wish to estimate its parameters
• Introduce latent variables z so that
  – the joint distribution p(x, z) of observed and latent variables is more tractable (since we know the forms of the components)
  – the complicated distribution is formed from simpler components
• The original distribution is obtained by marginalizing the joint distribution

Alternative View of EM
• This view recognizes the key role of latent variables
• Observed data and latent variables are collected into matrices
    X = [x_1^T; x_2^T; … ; x_N^T],    Z = [z_1^T; z_2^T; … ; z_N^T]
  – the nth row of X is x_n^T = [x_n1 x_n2 … x_nD]
  – the corresponding row of Z is z_n^T = [z_n1 z_n2 … z_nK]
• The goal of the EM algorithm is to find the maximum likelihood solution for p(X) given some X, when we do not have Z

Likelihood Function involving Latent Variables
• The joint likelihood function is p(X, Z | θ), where θ is the set of all model parameters
  – E.g., means, covariances, mixing coefficients
• The marginal likelihood of the observed data follows from the sum rule:
    p(X | θ) = Σ_Z p(X, Z | θ)
• The log likelihood function is
    ln p(X | θ) = ln { Σ_Z p(X, Z | θ) }

Latent Variables in EM
• The log likelihood function is
    ln p(X | θ) = ln { Σ_Z p(X, Z | θ) }
  – the summation inside the braces is due to marginalization, not to taking the log-likelihood
• Key observation: the summation over latent variables appears inside the logarithm
  – Even if the joint distribution belongs to the exponential family, the marginal distribution does not
  – Taking the log of a sum of Gaussians does not give a simple quadratic
  – This results in complicated expressions for the maximum likelihood solution, i.e., for the value of θ that maximizes the likelihood

Complete and Incomplete Data Sets

Complete data {X, Z}
• For each observation in X we know the corresponding value of the latent variable in Z
• The log-likelihood has the form ln p(X, Z | θ)
  – maximization is straightforward

Incomplete data {X}
• The actual data set
• The log likelihood function is
    ln p(X | θ) = ln { Σ_Z p(X, Z | θ) }
• Maximization is difficult
  – the summation sits inside the logarithm
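The contrast between these two log-likelihoods can be made concrete with a small numerical sketch. The Python snippet below is not part of the original slides; the 1-D Gaussian mixture parameters (pi, mu, sigma) and the sampled data are made-up illustrations. It evaluates the incomplete-data log-likelihood ln p(X|θ), where the sum over components sits inside the logarithm, and the complete-data log-likelihood ln p(X, Z|θ), where the log acts directly on each component density.

```python
# Minimal sketch (not from the slides): incomplete-data vs. complete-data
# log-likelihood of a 1-D Gaussian mixture with known parameters.
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(0)
pi = np.array([0.3, 0.7])        # mixing coefficients (illustrative values)
mu = np.array([-2.0, 3.0])       # component means
sigma = np.array([1.0, 1.5])     # component standard deviations

# Sample a toy data set, keeping the latent labels z for illustration.
N = 500
z = rng.choice(2, size=N, p=pi)              # latent component labels
X = rng.normal(mu[z], sigma[z])              # observed data

# log N(x_n | mu_k, sigma_k) for every n, k
log_comp = norm.logpdf(X[:, None], mu[None, :], sigma[None, :])

# Incomplete-data: ln p(X) = sum_n ln sum_k pi_k N(x_n | mu_k, sigma_k)
incomplete_ll = logsumexp(np.log(pi)[None, :] + log_comp, axis=1).sum()

# Complete-data: ln p(X, Z) = sum_n [ ln pi_{z_n} + ln N(x_n | mu_{z_n}, sigma_{z_n}) ]
complete_ll = (np.log(pi)[z] + log_comp[np.arange(N), z]).sum()

print(f"incomplete-data log-likelihood: {incomplete_ll:.2f}")
print(f"complete-data   log-likelihood: {complete_ll:.2f}")
```

The incomplete-data expression requires a log-sum-exp over components for every observation, which is exactly the summation-inside-the-logarithm that makes direct maximization awkward; the complete-data expression decomposes into per-observation terms whose log acts on a single Gaussian.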
Expectation of the log-likelihood
• Since we don't have the complete data set {X, Z}, we evaluate the expected complete-data log-likelihood E[ln p(X, Z | θ)]
• Since we are given X, our state of knowledge of Z is given only by the posterior distribution of the latent variables, p(Z | X, θ)
• Thus the expected log-likelihood of the complete data is
    E[ln p(X, Z | θ)] = Σ_Z p(Z | X, θ) ln p(X, Z | θ)
  – the summation is due to the expectation, not the sum rule
  – this is the quantity we maximize; note that the logarithm now acts on the joint distribution, which is tractable

E and M Steps
• E Step: use the current parameter value θ^old to find the posterior distribution of the latent variables, p(Z | X, θ^old)
• Use this posterior to find the expectation of the complete-data log-likelihood for a general parameter value θ:
    Q(θ, θ^old) = Σ_Z p(Z | X, θ^old) ln p(X, Z | θ)
  – the summation is due to the expectation
• M Step: determine the revised parameter estimate θ^new by maximizing this expectation:
    θ^new = argmax_θ Q(θ, θ^old)

General EM Algorithm
• Given a joint distribution p(X, Z | θ) over observed variables X and latent variables Z, governed by parameters θ, the goal is to maximize the likelihood function p(X | θ)
• Step 1: choose an initial setting for the parameters θ^old
• Step 2 (E Step): evaluate p(Z | X, θ^old)
• Step 3 (M Step): evaluate θ^new given by
    θ^new = argmax_θ Q(θ, θ^old),  where  Q(θ, θ^old) = Σ_Z p(Z | X, θ^old) ln p(X, Z | θ)
• Step 4: check for convergence of either the log-likelihood or the parameter values
  – if not satisfied, let θ^old ← θ^new and return to Step 2
  (a code sketch of this loop for a Gaussian mixture follows the next slide)

Missing Variables
• EM has been described for maximizing the likelihood function when there are discrete latent variables
• It can also be applied when there are unobserved variables corresponding to missing values in the data set
  – take the joint distribution of all variables and then marginalize over the missing ones
  – EM is then used to maximize the corresponding likelihood function
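As a rough illustration of the four steps of the general EM algorithm above, here is a minimal Python sketch for a one-dimensional Gaussian mixture. The function name em_gmm and the settings n_iter, tol, and seed are illustrative choices, not part of the slides; the closed-form M-step updates are the standard maximizers of Q(θ, θ^old) for a GMM.

```python
# Minimal sketch (not from the slides) of the general EM loop for a 1-D GMM.
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N = X.shape[0]

    # Step 1: initial parameter setting theta_old
    pi = np.full(K, 1.0 / K)
    mu = rng.choice(X, size=K, replace=False)
    sigma = np.full(K, X.std())

    prev_ll = -np.inf
    for _ in range(n_iter):
        # Step 2 (E step): posterior of the latent variables, p(z_n = k | x_n, theta_old)
        log_joint = np.log(pi)[None, :] + norm.logpdf(X[:, None], mu[None, :], sigma[None, :])
        log_norm = logsumexp(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)          # N x K responsibilities

        # Step 3 (M step): theta_new = argmax_theta Q(theta, theta_old);
        # for a GMM the maximizer has the closed form below.
        Nk = resp.sum(axis=0)
        pi = Nk / N
        mu = (resp * X[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((resp * (X[:, None] - mu[None, :]) ** 2).sum(axis=0) / Nk)

        # Step 4: convergence check on the incomplete-data log-likelihood ln p(X | theta)
        ll = log_norm.sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll                                  # theta_old <- theta_new
    return pi, mu, sigma, ll

# Example usage on synthetic data drawn from two Gaussians
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])
pi, mu, sigma, ll = em_gmm(X, K=2)
print(pi, mu, sigma, ll)
```

Each pass alternates the E step (responsibilities computed under θ^old) with the M step (reweighted means, variances, and mixing coefficients), and stops when the incomplete-data log-likelihood stops improving.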

