CMU CS 10708 - Lecture

School of Computer Science
Learning Partially Observed GM: the Expectation-Maximization Algorithm
Probabilistic Graphical Models (10-708), Lecture 11, Oct 24, 2007
Eric Xing
Reading: J-Chap. 10, 11; KF-Chap. 17
[Figure: a cell-signaling network with nodes Receptor A, Receptor B, Kinase C, Kinase D, Kinase E, TF F, Gene G, Gene H, corresponding to variables X1-X8.]

Partially observed GMs
- Speech recognition
  [Figure: an HMM with hidden states Y1, Y2, Y3, ..., YT and observations X1, X2, X3, ..., XT.]

Partially observed GM
- Biological evolution
  [Figure: an evolutionary tree relating observed sequences (A, C, AG, AGAC) to an unobserved ancestor; branch quantities Qh, Qm, "T years?".]

Mixture Models

Mixture Models, cont'd
- A density model p(x) may be multi-modal.
- We may be able to model it as a mixture of uni-modal distributions (e.g., Gaussians).
- Each mode may correspond to a different sub-population (e.g., male and female).

Unobserved Variables
- A variable can be unobserved (latent) because:
  - it is an imaginary quantity meant to provide a simplified and abstract view of the data-generation process (e.g., speech recognition models, mixture models, ...);
  - it is a real-world object and/or phenomenon that is difficult or impossible to measure (e.g., the temperature of a star, causes of a disease, evolutionary ancestors, ...);
  - it is a real-world object and/or phenomenon that sometimes wasn't measured, because of faulty sensors, etc.
- Discrete latent variables can be used to partition/cluster data into sub-groups.
- Continuous latent variables (factors) can be used for dimensionality reduction (factor analysis, etc.).

Gaussian Mixture Models (GMMs)
- Consider a mixture of K Gaussian components:
  p(x_n \mid \mu, \Sigma) = \sum_k \pi_k \, N(x_n \mid \mu_k, \Sigma_k),
  where \pi_k is the mixture proportion and N(x_n \mid \mu_k, \Sigma_k) is the mixture component.
- This model can be used for unsupervised clustering.
- This model (fit by AutoClass) has been used to discover new kinds of stars in astronomical data, etc.

Gaussian Mixture Models (GMMs), cont'd
- Consider a mixture of K Gaussian components. [Graphical model: Z -> X.]
- Z is a latent class indicator vector:
  p(z_n) = \mathrm{multi}(z_n : \pi) = \prod_k (\pi_k)^{z_n^k}.
- X is a conditional Gaussian variable with a class-specific mean/covariance:
  p(x_n \mid z_n^k = 1, \mu, \Sigma) = (2\pi)^{-m/2} |\Sigma_k|^{-1/2} \exp\!\left\{ -\tfrac{1}{2} (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) \right\}.
- The likelihood of a sample (mixture proportion times mixture component):
  p(x_n \mid \mu, \Sigma) = \sum_k p(z^k = 1 \mid \pi) \, p(x \mid z^k = 1, \mu, \Sigma)
    = \sum_{z_n} \prod_k \left( \pi_k \, N(x_n \mid \mu_k, \Sigma_k) \right)^{z_n^k}
    = \sum_k \pi_k \, N(x_n \mid \mu_k, \Sigma_k).

Why is Learning Harder?
- In fully observed iid settings, the log likelihood decomposes into a sum of local terms (at least for directed models):
  \ell_c(\theta; D) = \log p(x, z \mid \theta) = \log p(z \mid \theta_z) + \log p(x \mid z, \theta_x).
- With latent variables, all the parameters become coupled together via marginalization:
  \ell(\theta; D) = \log \sum_z p(x, z \mid \theta) = \log \sum_z p(z \mid \theta_z) \, p(x \mid z, \theta_x).

Toward the EM Algorithm
- Recall MLE for completely observed data.
- Data log-likelihood:
  \ell(\theta; D) = \log \prod_n p(z_n, x_n) = \log \prod_n p(z_n \mid \pi) \, p(x_n \mid z_n, \mu, \sigma)
    = \sum_n \log \prod_k \pi_k^{z_n^k} + \sum_n \log \prod_k N(x_n; \mu_k, \sigma)^{z_n^k}
    = \sum_n \sum_k z_n^k \log \pi_k - \sum_n \sum_k z_n^k \frac{(x_n - \mu_k)^2}{2\sigma^2} + C.
- MLE:
  \hat{\pi}_{k,\mathrm{MLE}} = \arg\max_\pi \ell(\theta; D), \quad
  \hat{\mu}_{k,\mathrm{MLE}} = \arg\max_\mu \ell(\theta; D), \quad
  \hat{\sigma}_{k,\mathrm{MLE}} = \arg\max_\sigma \ell(\theta; D);
  for example, \hat{\mu}_{k,\mathrm{MLE}} = \frac{\sum_n z_n^k x_n}{\sum_n z_n^k}.
- What if we do not know z_n?

Question
- "... We solve problem X using Expectation-Maximization ..." What does it mean?
- E: What do we take the expectation with? What do we take the expectation over?
- M: What do we maximize? What do we maximize with respect to?

Recall: K-means
- Hard assignment of each point to the nearest centroid:
  z_n^{(t)} = \arg\min_k \, (x_n - \mu_k^{(t)})^T \Sigma^{-1} (x_n - \mu_k^{(t)}).
- Centroid update:
  \mu_k^{(t+1)} = \frac{\sum_n \delta(z_n^{(t)}, k) \, x_n}{\sum_n \delta(z_n^{(t)}, k)}.
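For concreteness, a minimal NumPy sketch of these two K-means updates. The function and variable names are illustrative, not from the lecture, and plain Euclidean distance is used in place of the Mahalanobis distance above (i.e., Sigma = I):

```python
# Minimal K-means sketch of the updates above; all names are illustrative.
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    X = np.asarray(X, dtype=float)                       # X: (N, m) data matrix
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]    # initial centroids
    for _ in range(n_iters):
        # Hard "E-step": assign each point to its nearest centroid.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # (N, K) squared distances
        z = d2.argmin(axis=1)
        # "M-step": recompute each centroid as the mean of its assigned points.
        for k in range(K):
            if np.any(z == k):
                mu[k] = X[z == k].mean(axis=0)
    return mu, z
```

The "soft" EM counterpart of these hard assignments is developed on the next slides.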
Expectation-Maximization
- Start: "guess" the centroid \mu_k and covariance \Sigma_k of each of the K clusters.
- Loop over the Expectation and Maximization steps described on the following slides.

Example: Gaussian Mixture Model
- A mixture of K Gaussians. [Graphical model: Z_n -> X_n, in a plate over n = 1, ..., N.]
- Z is a latent class indicator vector:
  p(z_n) = \mathrm{multi}(z_n : \pi) = \prod_k (\pi_k)^{z_n^k}.
- X is a conditional Gaussian variable with class-specific mean/covariance:
  p(x_n \mid z_n^k = 1, \mu, \Sigma) = (2\pi)^{-m/2} |\Sigma_k|^{-1/2} \exp\!\left\{ -\tfrac{1}{2}(x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) \right\}.
- The likelihood of a sample:
  p(x_n \mid \mu, \Sigma) = \sum_k \pi_k \, N(x_n \mid \mu_k, \Sigma_k).
- The expected complete log likelihood:
  \langle \ell_c(\theta; x, z) \rangle = \sum_n \langle \log p(z_n \mid \pi) \rangle_{p(z \mid x)} + \sum_n \langle \log p(x_n \mid z_n, \mu, \Sigma) \rangle_{p(z \mid x)}
    = \sum_n \sum_k \langle z_n^k \rangle \log \pi_k - \frac{1}{2} \sum_n \sum_k \langle z_n^k \rangle \left( (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) + \log |\Sigma_k| + C \right).

E-step
- We maximize \langle \ell_c(\theta) \rangle iteratively using the following procedure.
- Expectation step: compute the expected values of the sufficient statistics of the hidden variables (i.e., z) given the current estimates of the parameters (i.e., \pi and \mu):
  \tau_n^{k(t)} = \langle z_n^k \rangle_{q^{(t)}} = p(z_n^k = 1 \mid x, \mu^{(t)}, \Sigma^{(t)})
    = \frac{\pi_k^{(t)} N(x_n \mid \mu_k^{(t)}, \Sigma_k^{(t)})}{\sum_i \pi_i^{(t)} N(x_n \mid \mu_i^{(t)}, \Sigma_i^{(t)})}.
- Here we are essentially doing inference.

Recall Our Objective
- The expected complete log likelihood:
  \langle \ell_c(\theta; x, z) \rangle = \sum_n \langle \log p(z_n \mid \pi) \rangle_{p(z \mid x)} + \sum_n \langle \log p(x_n \mid z_n, \mu, \Sigma) \rangle_{p(z \mid x)}
    = \sum_n \sum_k \langle z_n^k \rangle \log \pi_k - \frac{1}{2} \sum_n \sum_k \langle z_n^k \rangle \left( (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) + \log |\Sigma_k| + C \right).

M-step
- We maximize \langle \ell_c(\theta) \rangle iteratively using the following procedure.
- Maximization step: compute the parameters under the current values of the expected hidden variables:
  \pi_k^* = \arg\max \langle \ell_c(\theta) \rangle:\ \frac{\partial}{\partial \pi_k} \langle \ell_c(\theta) \rangle = 0 \ \text{s.t.}\ \sum_k \pi_k = 1
    \;\Rightarrow\; \pi_k^* = \frac{\sum_n \langle z_n^k \rangle_{q^{(t)}}}{N} = \frac{\sum_n \tau_n^{k(t)}}{N} = \frac{\langle n_k \rangle}{N};
  \mu_k^* = \arg\max \langle \ell_c(\theta) \rangle \;\Rightarrow\; \mu_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} x_n}{\sum_n \tau_n^{k(t)}};
  \Sigma_k^* = \arg\max \langle \ell_c(\theta) \rangle \;\Rightarrow\; \Sigma_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} (x_n - \mu_k^{(t+1)})(x_n - \mu_k^{(t+1)})^T}{\sum_n \tau_n^{k(t)}}.
  (Fact: \frac{\partial \log |A^{-1}|}{\partial A^{-1}} = A^T and \frac{\partial\, x^T A^{-1} x}{\partial A^{-1}} = x x^T.)
- This is isomorphic to MLE, except that the hidden variables are replaced by their expectations (in general they will be replaced by their corresponding "sufficient statistics").

Compare: K-means and EM
- K-means:
  - In the K-means "E-step" we do hard assignment:
    z_n^{(t)} = \arg\min_k \, (x_n - \mu_k^{(t)})^T \Sigma^{-1} (x_n - \mu_k^{(t)}).
  - In the K-means "M-step" we update the means as the weighted sum of the data, but now the weights are 0 or 1:
    \mu_k^{(t+1)} = \frac{\sum_n \delta(z_n^{(t)}, k) \, x_n}{\sum_n \delta(z_n^{(t)}, k)}.
- EM:
  - E-step:
    \tau_n^{k(t)} = p(z_n^k = 1 \mid x, \mu^{(t)}, \Sigma^{(t)}) = \frac{\pi_k^{(t)} N(x_n \mid \mu_k^{(t)}, \Sigma_k^{(t)})}{\sum_i \pi_i^{(t)} N(x_n \mid \mu_i^{(t)}, \Sigma_i^{(t)})}.
  - M-step:
    \mu_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} x_n}{\sum_n \tau_n^{k(t)}}.
- The EM algorithm for mixtures of Gaussians is like a "soft version" of the K-means algorithm (a code sketch of the soft updates appears below).

Theory Underlying EM
- What are we doing?
- Recall that according to MLE, we intend to learn the model parameters that would have maximized the likelihood of the data.
- But we do not observe z, so computing
  \ell(\theta; D) = \log \sum_z p(x, z \mid \theta) = \log \sum_z p(z \mid \theta_z) \, p(x \mid z, \theta_x)
  is difficult!
- What shall we do?

Complete & Incomplete Log Likelihoods
- Complete log likelihood: let X denote the observable variable(s) and Z denote the latent variable(s). If Z could be observed, then ...
- Usually, ...
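To make the E-step and M-step updates above concrete, here is a minimal NumPy/SciPy sketch of EM for a Gaussian mixture. This is a sketch under stated assumptions, not the lecture's code: the function name em_gmm, the random initialization, and the small ridge added to the covariances for numerical stability are illustrative choices.

```python
# Minimal EM-for-GMM sketch of the updates above; all names are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=50, seed=0):
    X = np.asarray(X, dtype=float)                    # X: (N, m) data matrix
    N, m = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                          # mixture proportions pi_k
    mu = X[rng.choice(N, size=K, replace=False)]      # means mu_k
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(m) for _ in range(K)])
    for _ in range(n_iters):
        # E-step: responsibilities tau[n, k] = p(z_n^k = 1 | x_n, theta^(t)).
        tau = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                        for k in range(K)], axis=1)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi, mu, Sigma from the soft counts.
        Nk = tau.sum(axis=0)                          # expected counts <n_k>
        pi = Nk / N
        mu = (tau.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (tau[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(m)
    return pi, mu, Sigma, tau
```

For example, pi, mu, Sigma, tau = em_gmm(X, K=3) returns soft responsibilities tau[n, k] corresponding to \tau_n^{k} above; replacing them with one-hot assignments recovers the K-means behavior described earlier.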

