Topic Models (Generative Clustering Models)
Roman Stanchak and Prithviraj Sen
CMSC828G, Instructor: Prof. Lise Getoor
24th April, 2008

Outline
1. Introduction
   - Motivating Applications
   - Connections to other Surveys
2. Topic Models
   - Plate Notation
   - Earlier Topic Models
   - Latent Dirichlet Allocation
3. Extensions and Applications
   - Modeling multiple influences
   - Hierarchical Topic Models
   - Beyond Bag of Words
   - Application: Object Recognition in Images

Motivating Applications
- Mixed membership clustering of document corpora:
  - e.g., document → words
- Modeling consumer behaviour for marketing data:
  - e.g., households → trips → products
- Fraud detection in telecommunications:
  - e.g., users → call features
- Protein function prediction:
  - e.g., mixed membership of proteins to functional modules
- Object detection/recognition in images:
  - e.g., images → feature patches

Connections to other Surveys
- Collective classification:
  - discriminative vs. generative
  - Edo's talk, missing link model [Cohn and Hofmann, 2001]
- Entity resolution:
  - LDA-ER
- Group Detection Surveys:
  - Stochastic Block Models
  - Clustering in Relational Data/Community Detection

Plate Notation: A Slacker's Day Planner
[Figure: plate diagram for the day-planner example. Variables: mood (upbeat, bored, sad) and activities (go to sleep, watch TV, go to pub, go to beach, go bowling). Example probabilities shown: 0.3, 0.1, 0.1, 0.1, 0.4.]
- nodes: random variables
- edges: dependencies
- plates: repetitions

Unigram Model and Mixture of Unigrams
[Figure: plate diagrams. Unigram Model: node w, plate N. Mixture of Unigrams: nodes z and w, plates N and M.]
Disadvantages:
- Does not model documents dealing with a mixture of topics.
Mixture of Unigrams:
- Also known as the naive Bayes model [McCallum and Nigam, 1998]
- Generative single-class classification model

PLSI: Mixture Model for Text [Hofmann, 1999]
[Figure: plate diagram with nodes d, z, w and plates N, M.]
Advantage:
- First mixture model for documents
Disadvantage:
- Mixture parameters for each document, too many parameters
- Poor generalization properties

Problems with PLSI
2-D simplex showing the space of document mixtures for 3 topics
[Figure: two simplex plots of document mixtures, one labelled PLSI and one labelled LDA.]

Latent Dirichlet Allocation [Blei et al, 2003]
[Figure: plate diagram with nodes α, θ, z, w, φ, β and plates N, M, T.]
Generative process:
- Choose θ ∼ Dir(α)
- For each word in doc:
  - Choose topic z ∼ mult(θ)
  - Choose word w ∼ mult(φ_z)
Notation:
  M     # of Documents
  N     # of Words
  T     # of Topics
  w     Generated word
  z     Topic of word w
  θ     Distribution of topics
  φ_z   Distribution of words given topic z
  α     Dirichlet parameter
  β     Dirichlet parameter

Discriminative vs. Generative
Word topics:
  arts      budget     education
  new       million    school
  film      tax        students
  show      program    schools
  music     budget     education
  movie     billion    teachers
  play      federal    high
  musical   year       public
  best      spending   teacher
  ...       ...        ...
Document mixtures:
- θ_29795: ..... wanted to play jazz ....
- θ_1883: .... play ... performed ... stage ....
- θ_21359: ..... don and jim play the game ....
- The θ's estimated for each document can be used as a low-dimensional representation of the document and can be used to classify the documents.

Gibbs Sampling for LDA [Griffiths and Steyvers, 2004]
\[
P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
\underbrace{\frac{n^{(w_i)}_{-i,j} + \beta}{\sum_{w} n^{(w)}_{-i,j} + W\beta}}_{\text{prob. of } w_i \text{ under topic } j}
\;\cdot\;
\overbrace{\frac{n^{(d_i)}_{-i,j} + \alpha}{\sum_{j'} n^{(d_i)}_{-i,j'} + T\alpha}}^{\text{prob. of } z_i \text{ in doc containing } w_i}
\]
Here n^{(w)}_{-i,j} counts assignments of word w to topic j and n^{(d_i)}_{-i,j} counts tokens of document d_i assigned to topic j, both excluding the current token i; W is the vocabulary size.
- Perform burn-in
- Run iterations of the Gibbs sampler, collecting samples at regular intervals
- For each iteration:
  - For each word w_i in the corpus, sample z_i from P(z_i = j | z_{-i}, w)
- Straightforward to recover the θ's and φ's after the Gibbs sampler has converged

About LDA and Gibbs Sampling
Why Dirichlet?
- Conjugate prior of multinomial. Lets you
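
The sampling update and loop from the "Gibbs Sampling for LDA" slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' or Griffiths and Steyvers' implementation: the function name `gibbs_lda`, its parameters, the symmetric scalar α and β, and the toy corpus are all hypothetical. For brevity it estimates θ and φ from the final sample only, instead of discarding a burn-in and averaging samples taken at regular intervals.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=200, seed=0):
    """Collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (theta, phi) estimated from the final sample only.
    """
    rng = random.Random(seed)
    T, W = n_topics, vocab_size
    n_wt = [[0] * T for _ in range(W)]  # n_wt[w][j]: word w assigned to topic j
    n_dt = [[0] * T for _ in docs]      # n_dt[d][j]: tokens of doc d in topic j
    n_t = [0] * T                       # n_t[j]: all tokens assigned to topic j
    z = []                              # z[d][i]: topic of token i in doc d

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            j = rng.randrange(T)
            z[d].append(j)
            n_wt[w][j] += 1
            n_dt[d][j] += 1
            n_t[j] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove token i from the counts (the "-i" in the formula).
                j = z[d][i]
                n_wt[w][j] -= 1
                n_dt[d][j] -= 1
                n_t[j] -= 1
                # Unnormalised P(z_i = k | z_-i, w). The document-side
                # denominator does not depend on k, so it is dropped.
                weights = [
                    (n_wt[w][k] + beta) / (n_t[k] + W * beta)
                    * (n_dt[d][k] + alpha)
                    for k in range(T)
                ]
                j = rng.choices(range(T), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = j
                n_wt[w][j] += 1
                n_dt[d][j] += 1
                n_t[j] += 1

    # Recover theta (topics per document) and phi (words per topic).
    theta = [[(n_dt[d][j] + alpha) / (len(doc) + T * alpha) for j in range(T)]
             for d, doc in enumerate(docs)]
    phi = [[(n_wt[w][j] + beta) / (n_t[j] + W * beta) for w in range(W)]
           for j in range(T)]
    return theta, phi


if __name__ == "__main__":
    # Hypothetical toy corpus: word ids 0-1 and 2-3 tend to co-occur.
    toy_docs = [[0, 1, 0, 1, 0, 1], [2, 3, 2, 3, 3, 2], [0, 1, 1, 0, 2, 3]]
    theta, phi = gibbs_lda(toy_docs, n_topics=2, vocab_size=4)
    for d, row in enumerate(theta):
        print(d, [round(p, 2) for p in row])
```

Each row of `theta` and `phi` is a normalised probability vector, so the rows of `theta` can serve directly as the low-dimensional document representation mentioned on the "Discriminative vs. Generative" slide.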



UMD CMSC 828G - Topic Models
