CMU CS 10601 - topic-models-mar-19

Generative Topic Models for Community Analysis
Pilfered from: Ramesh Nallapati
http://www.cs.cmu.edu/~wcohen/10-802/lda-sep-18.ppt

Slide titles:
• Generative Topic Models for Community Analysis
• Objectives
• Outline
• Introduction to Topic Models
• Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS, 2001]
• Hyperlink modeling using LDA [Erosheva, Fienberg, Lafferty, PNAS, 2004]
• Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI, 2004]
• Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI'05]
• Modeling Citation Influences [Dietz, Bickel, Scheffer, ICML 2007]
• Link-PLSA-LDA: Topic Influence in Blogs (ICWSM 2008)

2 / 57  Objectives
• Cultural literacy for ML:
  – Q: What are "topic models"?
  – A1: popular indoor sport for machine learning researchers
  – A2: a particular way of applying unsupervised learning of Bayes nets to text
• Quick historical survey of some sample papers in the area

3 / 57  Outline
• Part I: Introduction to Topic Models
  – Naive Bayes model
  – Mixture Models
    • Expectation Maximization
  – PLSA
  – LDA
    • Variational EM
    • Gibbs Sampling
• Part II: Topic Models for Community Analysis
  – Citation modeling with PLSA
  – Citation Modeling with LDA
  – Author Topic Model
  – Author Topic Recipient Model
  – Modeling influence of Citations
  – Mixed membership Stochastic Block Model

4 / 57  Introduction to Topic Models
• Multinomial Naïve Bayes
[Plate diagram labels: C, W1, W2, W3, ..., WN, M]
• For each document d = 1, ..., M
  • Generate C_d ~ Mult(· | π)
  • For each position n = 1, ..., N_d
    • Generate w_n ~ Mult(· | β, C_d)

5 / 57  Introduction to Topic Models
• Naïve Bayes Model: Compact representation
[Plate diagrams: unrolled form (C, W1, W2, W3, ..., WN, M) and compact form (C, W, N, M)]

6 / 57  Introduction to Topic Models
• Mixture model: unsupervised naïve Bayes model
[Plate diagram labels: C, W, N, M, Z]
• Joint probability of words and classes: [equation not preserved]
• But classes are not visible: [equation not preserved]

7 / 57  Introduction to Topic Models

8 / 57  Introduction to Topic Models
• Probabilistic Latent Semantic Analysis Model
[Plate diagram labels: d, z, w, M, N, θ_d ("Topic distribution")]
• Select document d ~ Mult(· | π)
• For each position n = 1, ..., N_d
  • generate z_n ~ Mult(· | θ_d)
  • generate w_n ~ Mult(· | z_n)

9 / 57  Introduction to Topic Models
• Probabilistic Latent Semantic Analysis Model
  – Learning using EM
  – Not a complete generative model
    • Has a distribution π over the training set of documents: no new document can be generated!
  – Nevertheless, more realistic than mixture model
    • Documents can discuss multiple topics!

10 / 57  Introduction to Topic Models
• PLSA topics (TDT-1 corpus)

11 / 57  Introduction to Topic Models

12 / 57  Introduction to Topic Models
• Latent Dirichlet Allocation
[Plate diagram labels: z, w, M, N]
• For each document d = 1, ..., M
  • Generate θ_d ~ Dir(· | α)
  • For each position n = 1, ..., N_d
    • generate z_n ~ Mult(· | θ_d)
    • generate w_n ~ Mult(· | z_n)

13 / 57  Introduction to Topic Models
• Latent Dirichlet Allocation
  – Overcomes the issues with PLSA
    • Can generate any random document
  – Parameter learning:
    • Variational EM
      – Numerical approximation using lower-bounds
      – Results in biased solutions
      – Convergence has numerical guarantees
    • Gibbs Sampling
      – Stochastic simulation
      – Unbiased solutions
      – Stochastic convergence

14 / 57  Introduction to Topic Models
• Variational EM for LDA
  – Approximate the posterior by a simpler distribution
    • A convex function in each parameter!

15 / 57  Introduction to Topic Models
• Gibbs sampling
  – Applicable when the joint distribution is hard to evaluate but the conditional distribution is known
  – The sequence of samples forms a Markov chain
  – The stationary distribution of the chain is the joint distribution

16 / 57  Introduction to Topic Models
• LDA topics

17 / 57  Introduction to Topic Models
• LDA's view of a document

18 / 57  Introduction to Topic Models
• Perplexity comparison of various models
[Figure: perplexity of Unigram, Mixture model, PLSA, and LDA; lower is better]

19 / 57  Outline
(same outline as slide 3)

20 / 57  Hyperlink modeling using PLSA

21 / 57  Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS, 2001]
[Plate diagram labels: d, z, w, M, N, z, c, L]
• Select document d ~ Mult(· | π)
• For each position n = 1, ..., N_d
  • generate z_n ~ Mult(· | θ_d)
  • generate w_n ~ Mult(· | z_n)
• For each citation j = 1, ..., L_d
  • generate z_j ~ Mult(· | θ_d)
  • generate c_j ~ Mult(· | z_j)

22 / 57  Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS, 2001]
[Plate diagram labels: d, z, w, M, N, z, c, L]
• PLSA likelihood: [equation not preserved]
• New likelihood: [equation not preserved]
• Learning using EM

23 / 57  Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS, 2001]
• Heuristic: 0 ≤ α ≤ 1 determines the relative importance of content (α) and hyperlinks (1-α)

24 / 57  Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS, 2001]
• Classification performance
[Figure: classification performance; labels: hyperlink, content]

25 / 57  Hyperlink modeling using LDA

26 / 57  Hyperlink modeling using LDA [Erosheva, Fienberg, Lafferty, PNAS, 2004]
[Plate diagram labels: z, w, M, N, z, c, L]
• For each document d = 1, ..., M
  • Generate θ_d ~ Dir(· | α)
  • For each position n = 1, ..., N_d
    • generate z_n ~ Mult(· | θ_d)
    • generate w_n ~ Mult(· | z_n)
  • For each citation j = 1, ..., L_d
    • generate z_j ~ Mult(· | θ_d)
    • generate c_j ~ Mult(· | z_j)
• Learning using variational EM

27 / 57  Hyperlink modeling using LDA [Erosheva, Fienberg, Lafferty, PNAS, 2004]

28 / 57  Author-Topic Model for Scientific Literature

29 / 57  Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI, 2004]
[Plate diagram labels: z, w, M, N, x, a, A, P, K]
• For each author a = 1, ..., A
  • Generate θ_a ~ Dir(· | α)
• For each topic k = 1, ..., K
  • Generate φ_k ~ Dir(· | β)
• For each document d = 1, ..., M
  • For each position n = 1, ..., N_d
    • Generate author x ~ Unif(· | a_d)
    • generate z_n ~ Mult(· | θ_x)
    • generate w_n ~ Mult(· | φ_{z_n})

30 / 57  Author-Topic Model for Scientific

