Hierarchical Topic Models and the Nested Chinese Restaurant Process

David M. Blei                        Thomas L. Griffiths
[email protected]                    [email protected]
Michael I. Jordan                    Joshua B. Tenenbaum
[email protected]                    [email protected]
University of California, Berkeley   Massachusetts Institute of Technology
Berkeley, CA 94720                   Cambridge, MA 02139

Abstract

We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts.

1 Introduction

Complex probabilistic models are increasingly prevalent in domains such as bioinformatics, information retrieval, and vision. These domains create fundamental modeling challenges due to their open-ended nature—data sets often grow over time, and as they grow they bring new entities and new structures to the fore. Current statistical modeling tools often seem too rigid in this regard; in particular, classical model selection techniques based on hypothesis testing are poorly matched to problems in which data can continue to accrue and unbounded sets of often incommensurate structures must be considered at each step.

An important instance of such modeling challenges is provided by the problem of learning a topic hierarchy from data. Given a collection of "documents," each of which contains a set of "words," we wish to discover common usage patterns or "topics" in the documents, and to organize these topics into a hierarchy. (Note that while we use the terminology of document modeling throughout this paper, the methods that we describe are general.) In this paper, we develop efficient statistical methods for constructing such a hierarchy which allow it to grow and change as the data accumulate.

We approach this model selection problem by specifying a generative probabilistic model for hierarchical structures and taking a Bayesian perspective on the problem of learning these structures from data. Thus our hierarchies are random variables; moreover, these random variables are specified procedurally, according to an algorithm that constructs the hierarchy as data are made available. The probabilistic object that underlies this approach is a distribution on partitions of integers known as the Chinese restaurant process [1]. We show how to extend the Chinese restaurant process to a hierarchy of partitions, and show how to use this new process as a representation of prior and posterior distributions for topic hierarchies.

There are several possible approaches to the modeling of topic hierarchies. In our approach, each node in the hierarchy is associated with a topic, where a topic is a distribution across words. A document is generated by choosing a path from the root to a leaf, repeatedly sampling topics along that path, and sampling the words from the selected topics. Thus the organization of topics into a hierarchy aims to capture the breadth of usage of topics across the corpus, reflecting underlying syntactic and semantic notions of generality and specificity.
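To make this generative story concrete, here is a minimal sketch of document generation given an already-fixed tree of topics. It is illustrative only, not the paper's full specification: the uniform choice of path, the Dirichlet draw of per-document level weights, and all function names (generate_document, tree_paths) are assumptions introduced here; the paper's full model instead places a nested Chinese restaurant process prior on the paths themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_document(tree_paths, topics, n_words, alpha=1.0):
    """Illustrative sketch: generate one document from a fixed topic tree.

    tree_paths : list of root-to-leaf paths, each a list of L node ids
    topics     : dict mapping node id -> distribution over the vocabulary
    n_words    : number of words to generate

    ASSUMPTIONS (not from the paper): the path is chosen uniformly at
    random, and per-word levels are drawn from a Dirichlet; the full
    model places a nested CRP prior on paths instead.
    """
    path = tree_paths[rng.integers(len(tree_paths))]   # root-to-leaf path
    L = len(path)
    theta = rng.dirichlet(alpha * np.ones(L))          # weights over the L levels
    words = []
    for _ in range(n_words):
        level = rng.choice(L, p=theta)                 # pick a node on the path
        beta = topics[path[level]]                     # that node's topic
        words.append(rng.choice(len(beta), p=beta))    # sample a word id from it
    return words

# Toy example: a depth-3 tree with root 0, children 1 and 2, leaves 3..6.
V = 10
paths = [[0, 1, 3], [0, 1, 4], [0, 2, 5], [0, 2, 6]]
topics = {k: rng.dirichlet(np.ones(V)) for k in range(7)}
print(generate_document(paths, topics, n_words=8))
```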
This approach differs from models of topic hierarchies which are built on the premise that the distributions associated with parents and children are similar [2]. We assume no such constraint—for example, the root node may place all of its probability mass on function words, with none of its descendants placing any probability mass on function words. Our model more closely resembles the hierarchical topic model considered in [3], though that work does not address the model selection problem which is our primary focus.

2 Chinese restaurant processes

We begin with a brief description of the Chinese restaurant process and subsequently show how this process can be extended to hierarchies.

2.1 The Chinese restaurant process

The Chinese restaurant process (CRP) is a distribution on partitions of integers obtained by imagining a process by which M customers sit down in a Chinese restaurant with an infinite number of tables.¹ The basic process is specified as follows. The first customer sits at the first table. The mth subsequent customer sits at a table drawn from the following distribution:

\[
\begin{aligned}
p(\text{occupied table } i \mid \text{previous customers}) &= \frac{m_i}{\gamma + m - 1} \\
p(\text{next unoccupied table} \mid \text{previous customers}) &= \frac{\gamma}{\gamma + m - 1}
\end{aligned}
\tag{1}
\]

where m_i is the number of previous customers at table i, and γ is a parameter. After M customers sit down, the seating plan gives a partition of M items. This distribution gives the same partition structure as draws from a Dirichlet process [4]. However, the CRP also allows several variations on the basic rule in Eq. (1), including a data-dependent choice of γ and a more general functional dependence on the current partition [5]. This flexibility will prove useful in our setting.

The CRP has been used to represent uncertainty over the number of components in a mixture model. In a species sampling mixture [6], each table in the Chinese restaurant is associated with a draw from p(β | η), where β is a mixture component parameter. Each data point is generated by choosing a table i from Eq. (1) and then sampling a value from the distribution parameterized by β_i (the parameter associated with that table). Given a data set, the posterior under this model has two components. First, it is a distribution over seating plans; the number of mixture components is determined by the number of tables which the data occupy. Second, given a seating plan, the particular data which are sitting at each table induce a distribution on the associated parameter β for that table. The posterior can be estimated using Markov chain Monte Carlo [7]. Applications to various kinds of mixture models have begun to appear in recent years; examples include Gaussian mixture models [8], hidden Markov models [9] and mixtures of experts [10]. A short simulation sketch of this seating process appears after this excerpt.

¹ The terminology was inspired by the Chinese restaurants in San Francisco which seem to have an infinite seating capacity. It was coined by Jim Pitman and Lester Dubins in the early eighties [1].

2.2 Extending the CRP to hierarchies

The CRP is amenable to
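As promised above, here is a minimal simulation sketch of the seating rule in Eq. (1), extended in the spirit of the species sampling mixture: each table also receives a parameter draw, and each customer emits one data point from its table's component. The Gaussian choice for p(β | η) and all names (crp_seating, eta) are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def crp_seating(M, gamma, rng):
    """Simulate CRP seating per Eq. (1): customer m joins occupied table i
    with probability m_i / (gamma + m - 1), or starts a new table with
    probability gamma / (gamma + m - 1)."""
    counts = [1]                       # first customer sits at the first table
    assignments = [0]
    for m in range(2, M + 1):          # the mth customer (1-indexed)
        probs = np.array(counts + [gamma]) / (gamma + m - 1)
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):       # chose the next unoccupied table
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts

rng = np.random.default_rng(0)
tables, counts = crp_seating(M=20, gamma=1.5, rng=rng)
print("partition sizes:", counts)      # number of tables = number of components

# Species-sampling flavor (illustrative assumption): give each table a
# parameter beta_i drawn from p(beta | eta), here N(0, eta^2), and generate
# one data point per customer from N(beta_i, 1).
eta = 3.0
betas = rng.normal(0.0, eta, size=len(counts))
data = rng.normal(betas[np.array(tables)], 1.0)
print("data shape:", data.shape)
```

Note how the rich-get-richer dynamics of Eq. (1) emerge directly from the counts: tables with more customers attract new ones in proportion to their size, while γ controls how readily new tables (new mixture components) are created.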

