Topic Models (Generative Clustering Models)
Roman Stanchak and Prithviraj Sen
CMSC828G, Instructor: Prof. Lise Getoor
24th April, 2008

Outline
1. Introduction
   - Motivating Applications
   - Connections to other Surveys
2. Topic Models
   - Plate Notation
   - Earlier Topic Models
   - Latent Dirichlet Allocation
3. Extensions and Applications
   - Modeling multiple influences
   - Hierarchical Topic Models
   - Beyond Bag of Words
   - Application: Object Recognition in Images

Motivating Applications
- Mixed membership clustering of document corpora:
  e.g., document → words
- Modeling consumer behaviour for marketing data:
  e.g., households → trips → products
- Fraud detection in telecommunications:
  e.g., users → call features
- Protein function prediction:
  e.g., mixed membership of proteins to functional modules
- Object detection/recognition in images:
  e.g., images → feature patches

Connections to other Surveys
- Collective classification:
  - discriminative vs. generative
  - Edo's talk, missing link model [Cohn and Hofmann, 2001]
- Entity resolution:
  - LDA-ER
- Group Detection Surveys:
  - Stochastic Block Models
  - Clustering in Relational Data/Community Detection

Plate Notation: A Slacker's Day Planner
[Figure: plate diagram for the day-planner example; node labels m and a, plate labels 3 and D.]
- mood: upbeat, bored, sad
- activities: go to sleep, watch TV, go to pub, go to beach, go bowling

  nodes    random variables
  edges    dependencies
  plates   repetitions

Unigram Model and Mixture of Unigrams
[Figure: plate diagrams of the Unigram Model (node w, plate N) and the Mixture of Unigrams (nodes z and w, plates N and M).]
Disadvantages:
- Does not model documents dealing with a mixture of topics.
Mixture of Unigrams:
- Also known as the naive Bayes model [McCallum and Nigam, 1998]
- Generative single-class classification model

PLSI: Mixture Model for Text [Hofmann, 1999]
[Figure: plate diagram with nodes d, z, w and plates N and M.]
Advantage:
- First mixture model for documents
Disadvantage:
- Mixture parameters for each document, so too many parameters
- Poor generalization properties

Problems with PLSI
[Figure: 2-D simplex showing the space of document mixtures for 3 topics, with panels labeled PLSI and LDA.]

Latent Dirichlet Allocation [Blei et al, 2003]
[Figure: plate diagram with nodes α, θ, z, w, φ, β and plates T, N, M.]
Generative process:
- Choose θ ∼ Dir(α)
- For each word in doc:
  - Choose topic z ∼ mult(θ)
  - Choose word w ∼ mult(φ_z)

  M     # of documents
  N     # of words
  T     # of topics
  w     generated word
  z     topic of word w
  θ     distribution of topics
  φ_z   distribution of words given topic z
  α     Dirichlet parameter
  β     Dirichlet parameter

Discriminative vs. Generative
Word topics:

  arts      budget     education
  new       million    school
  film      tax        students
  show      program    schools
  music     budget     education
  movie     billion    teachers
  play      federal    high
  musical   year       public
  best      spending   teacher
  ...       ...        ...

Document mixtures:
- θ_29795: ..... wanted to play jazz ....
- θ_1883: .... play ... performed ... stage ....
- θ_21359: ..... don and jim play the game ....
- The θ's estimated for each document can be used as a low-dimensional representation of the document, e.g., as features for classifying documents.

Gibbs Sampling for LDA [Griffiths and Steyvers, 2004]

\[
P(z_i = j \mid z_{-i}, w) \;\propto\;
\underbrace{\frac{n^{(w_i)}_{-i,j} + \beta}{\sum_{w'} n^{(w')}_{-i,j} + W\beta}}_{\text{prob. of } w_i \text{ under topic } j}
\;\cdot\;
\overbrace{\frac{n^{(d_i)}_{-i,j} + \alpha}{\sum_{j'} n^{(d_i)}_{-i,j'} + T\alpha}}^{\text{prob. of } z_i = j \text{ in doc containing } w_i}
\]

Here n^{(w_i)}_{-i,j} counts the assignments of word w_i to topic j and n^{(d_i)}_{-i,j} counts the words of document d_i assigned to topic j, both excluding position i; W is the vocabulary size.
- Perform burn-in
- Run iterations of the Gibbs sampler, collecting samples at regular intervals
- For each iteration:
  - For each word w_i in the corpus, sample z_i from P(z_i = j | z_{-i}, w)
- Straightforward to recover the θ's and φ's after the Gibbs sampler has converged

About LDA and Gibbs Sampling
Why Dirichlet?
- Conjugate prior of multinomial. Lets you
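
The sampling loop from the "Gibbs Sampling for LDA" slide can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' code: the function name `gibbs_lda`, the hyperparameter values, the iteration count, and the toy corpus are all assumptions made for the example. For brevity it reads θ and φ off the final sample instead of discarding a burn-in and averaging samples collected at intervals.

```python
import random

def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=200, seed=0):
    """Collapsed Gibbs sampler for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    n_wt = [[0] * num_topics for _ in range(vocab_size)]  # word-topic counts
    n_dt = [[0] * num_topics for _ in docs]               # doc-topic counts
    n_t = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of each token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            t = rng.randrange(num_topics)
            z_d.append(t)
            n_wt[w][t] += 1
            n_dt[d][t] += 1
            n_t[t] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove token i from the counts (the "-i" subscript).
                t = z[d][i]
                n_wt[w][t] -= 1
                n_dt[d][t] -= 1
                n_t[t] -= 1
                # Unnormalised P(z_i = j | z_-i, w). The document-side
                # denominator does not depend on j, so it is dropped.
                weights = [
                    (n_wt[w][j] + beta) / (n_t[j] + vocab_size * beta)
                    * (n_dt[d][j] + alpha)
                    for j in range(num_topics)
                ]
                t = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = t
                n_wt[w][t] += 1
                n_dt[d][t] += 1
                n_t[t] += 1

    # Point estimates of theta (per document) and phi (per topic).
    theta = [
        [(n_dt[d][j] + alpha) / (len(doc) + num_topics * alpha)
         for j in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_wt[w][j] + beta) / (n_t[j] + vocab_size * beta)
         for w in range(vocab_size)]
        for j in range(num_topics)
    ]
    return theta, phi


# Hypothetical toy corpus: word ids 0-2 and 3-5 never share a document.
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
theta, phi = gibbs_lda(docs, num_topics=2, vocab_size=6)
```

Each row of `theta` is one document's topic mixture and each row of `phi` is one topic's word distribution, so both normalise to 1. A fuller implementation would average estimates over several post-burn-in samples, as the slide suggests.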