Hierarchical Topic Models and the Nested Chinese Restaurant Process

Unformatted text preview:

Hierarchical Topic Models and the Nested Chinese Restaurant Process David M Blei blei cs berkeley edu Thomas L Griffiths gruffydd mit edu Michael I Jordan jordan cs berkeley edu Joshua B Tenenbaum jbt mit edu University of California Berkeley Berkeley CA 94720 Massachusetts Institute of Technology Cambridge MA 02139 Abstract We address the problem of learning topic hierarchies from data The model selection problem in this domain is daunting which of the large collection of possible trees to use We take a Bayesian approach generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts 1 Introduction Complex probabilistic models are increasingly prevalent in domains such as bioinformatics information retrieval and vision These domains create fundamental modeling challenges due to their open ended nature data sets often grow over time and as they grow they bring new entities and new structures to the fore Current statistical modeling tools often seem too rigid in this regard in particular classical model selection techniques based on hypothesis testing are poorly matched to problems in which data can continue to accrue and unbounded sets of often incommensurate structures must be considered at each step An important instance of such modeling challenges is provided by the problem of learning a topic hierarchy from data Given a collection of documents each of which contains a set of words we wish to discover common usage patterns or topics in the documents and to organize these topics into a hierarchy Note that while we use the terminology of document modeling throughout this paper the methods that we describe are general In this paper we develop efficient statistical methods for constructing such a hierarchy which allow it to grow and change as the data accumulate We approach this model selection problem by specifying a generative probabilistic model for hierarchical structures and taking a Bayesian perspective on the problem of learning these structures from data Thus our hierarchies are random variables moreover these random variables are specified procedurally according to an algorithm that constructs the hierarchy as data are made available The probabilistic object that underlies this approach is a distribution on partitions of integers known as the Chinese restaurant process 1 We show how to extend the Chinese restaurant process to a hierarchy of partitions and show how to use this new process as a representation of prior and posterior distributions for topic hierarchies There are several possible approaches to the modeling of topic hierarchies In our approach each node in the hierarchy is associated with a topic where a topic is a distribution across words A document is generated by choosing a path from the root to a leaf repeatedly sampling topics along that path and sampling the words from the selected topics Thus the organization of topics into a hierarchy aims to capture the breadth of usage of topics across the corpus reflecting underlying syntactic and semantic notions of generality and specificity This approach differs from models of topic hierarchies which are built on the premise that the distributions associated with parents and children are similar 2 We assume no such constraint for example the root node may place all of its probability mass on function words with none of its descendants placing any probability mass on function words Our model more closely resembles the hierarchical topic model considered in 3 though that work does not address the model selection problem which is our primary focus 2 Chinese restaurant processes We begin with a brief description of the Chinese restaurant process and subsequently show how this process can be extended to hierarchies 2 1 The Chinese restaurant process The Chinese restaurant process CRP is a distribution on partitions of integers obtained by imagining a process by which M customers sit down in a Chinese restaurant with an infinite number of tables 1 The basic process is specified as follows The first customer sits at the first table The mth subsequent customer sits at a table drawn from the following distribution p occupied table i previous customers p next unoccupied table previous customers mi m 1 m 1 1 where mi is the number of previous customers at table i and is a parameter After M customers sit down the seating plan gives a partition of M items This distribution gives the same partition structure as draws from a Dirichlet process 4 However the CRP also allows several variations on the basic rule in Eq 1 including a data dependent choice of and a more general functional dependence on the current partition 5 This flexibility will prove useful in our setting The CRP has been used to represent uncertainty over the number of components in a mixture model In a species sampling mixture 6 each table in the Chinese restaurant is associated with a draw from p where is a mixture component parameter Each data point is generated by choosing a table i from Eq 1 and then sampling a value from the distribution parameterized by i the parameter associated with that table Given a data set the posterior under this model has two components First it is a distribution over seating plans the number of mixture components is determined by the number of tables which the data occupy Second given a seating plan the particular data which are sitting at each table induce a distribution on the associated parameter for that table The posterior can be estimated using Markov chain Monte Carlo 7 Applications to various kinds of mixture models have begun to appear in recent years examples include Gaussian mixture models 8 hidden Markov models 9 and mixtures of experts 10 1 The terminology was inspired by the Chinese restaurants in San Francisco which seem to have an infinite seating capacity It was coined by Jim Pitman and Lester Dubins in the early eighties 1 2 2 Extending the CRP to hierarchies The CRP is amenable to mixture modeling because we can establish a one to one relationship between tables and mixture components and a


Hierarchical Topic Models and the Nested Chinese Restaurant Process

Loading Unlocking...
Login

Join to view Hierarchical Topic Models and the Nested Chinese Restaurant Process and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hierarchical Topic Models and the Nested Chinese Restaurant Process and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?