Dirichlet Process Mixtures: A Gentle Tutorial
Graphical Models - 10-708
Khalid El-Arini, Carnegie Mellon University, November 6, 2006

Contents: Motivation; What to do?; Review: Dirichlet Distribution; Dirichlet Process; Sampling from a DP; Chinese Restaurant Process; Dirichlet Process Mixture; CRP Mixture; Stick Breaking; Formal Definition; Inference in a DPM; Gibbs Sampling [Neal 2000]; Conclusion; Acknowledgments; References

Motivation (slides 2-3)
We are given a data set and are told that it was generated from a mixture of Gaussians. Unfortunately, no one has any idea how many Gaussians produced the data.

What to do? (slide 4)
We could guess the number of clusters, run EM for a Gaussian mixture model, look at the results, and try again... We could do hierarchical agglomerative clustering and cut the tree at a visually appealing level... What we really want is to cluster the data in a statistically principled manner, without resorting to such hacks.

Review: Dirichlet Distribution (slide 5)
Let $Q = (q_1, \dots, q_m)$ with $q_i \ge 0$ and $\sum_{i=1}^m q_i = 1$. We write $Q \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_m)$, meaning

$$P(q_1, \dots, q_m) = \frac{\Gamma\left(\sum_{i=1}^m \alpha_i\right)}{\prod_{i=1}^m \Gamma(\alpha_i)} \prod_{i=1}^m q_i^{\alpha_i - 1}.$$

The Dirichlet is a distribution over possible parameter vectors of a multinomial distribution, and is in fact the conjugate prior for the multinomial. The Beta distribution is the special case of the Dirichlet in two dimensions. Samples from the distribution lie in the (m-1)-dimensional simplex; thus, it is in fact a "distribution over distributions." (A short numerical sketch of both properties follows.)
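The following is a minimal sketch of these two properties using NumPy; the concentration parameters and observed counts are made-up illustrative values, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each draw from a Dirichlet is itself a multinomial parameter vector,
# i.e. a point on the (m-1)-dimensional simplex.
alpha = np.array([2.0, 3.0, 5.0])              # illustrative parameters
samples = rng.dirichlet(alpha, size=1000)
assert np.allclose(samples.sum(axis=1), 1.0)   # all draws lie on the simplex

# Conjugacy: after observing multinomial counts, the posterior is again
# a Dirichlet, with the counts simply added to the prior parameters.
counts = np.array([10, 2, 4])                  # made-up observed counts
posterior_alpha = alpha + counts               # posterior is Dir(alpha + counts)
print("posterior mean:", posterior_alpha / posterior_alpha.sum())
```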
Dirichlet Process (slides 6-9)
A Dirichlet process is also a distribution over distributions. We write

$$G \sim \mathrm{DP}(\alpha, G_0),$$

where $G_0$ is a base distribution and $\alpha$ is a positive scaling parameter. $G$ has the same support as $G_0$.

Consider a Gaussian base distribution $G_0$ and $G \sim \mathrm{DP}(\alpha, G_0)$. [Slide 7: figure showing $G_0$ and a draw $G$.] Since $G_0$ is continuous, the probability that any two samples from it are equal is precisely zero. However, $G$ is a discrete distribution, made up of a countably infinite number of point masses [Blackwell and MacQueen 1973]; therefore, there is always a non-zero probability of two samples from $G$ colliding. [Slide 9: figure comparing draws $G \sim \mathrm{DP}(\alpha_1, G_0)$ and $G \sim \mathrm{DP}(\alpha_2, G_0)$.] The value of $\alpha$ determines how close $G$ is to $G_0$.

Sampling from a DP (slides 10-12)
Let $G \sim \mathrm{DP}(\alpha, G_0)$ and $X_n \mid G \sim G$ for $n = 1, \dots, N$, i.i.d. [Graphical model: $G \to X_n$ inside a plate of size $N$.] Marginalizing out $G$ introduces dependencies between the $X_n$ variables.

Assume we view these variables in a specific order and are interested in the behavior of $X_n$ given the previous $n-1$ observations. Let there be $K$ unique values $x_1^*, \dots, x_K^*$ among those observations, with $n_k$ of them equal to $x_k^*$. Then

$$P(X_n = x \mid X_1, \dots, X_{n-1}) = \frac{1}{n-1+\alpha} \sum_{k=1}^{K} n_k\, \delta(x = x_k^*) + \frac{\alpha}{n-1+\alpha}\, G_0(x).$$

Multiplying these conditionals via the chain rule gives the joint distribution over $X_1, \dots, X_N$, which factors as P(partition) x P(draws). Notice that this formulation of the joint does not depend on the order in which we consider the variables: the $X_n$ are exchangeable, and we can arrive at a mixture model by applying de Finetti's theorem.

Chinese Restaurant Process (slides 13-14)
With $K$ unique values among the first $n-1$ variables, we can rewrite the conditional as

$$P(X_n = x \mid X_1, \dots, X_{n-1}) =
\begin{cases}
\dfrac{n_k}{n-1+\alpha} & \text{if } x = x_k^* \text{ for some existing } k, \\[4pt]
\dfrac{\alpha}{n-1+\alpha} & \text{for a new draw from } G_0.
\end{cases}$$

Consider a restaurant with infinitely many tables, where the $X_n$'s represent the patrons of the restaurant. From the above conditional distribution we can see that a customer is more likely to sit at a table if many people are already sitting there; however, with probability proportional to $\alpha$, the customer will sit at a new table. This is also known as the "clustering effect," and can be seen in the setting of social clubs [Aldous]. (A small simulation of the process follows.)
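A minimal simulation of the Chinese restaurant process under the conditional above; the function name and the sample sizes are our illustrative choices, not from the slides:

```python
import numpy as np

def sample_crp(n_customers: int, alpha: float, seed: int = 0) -> np.ndarray:
    """Seat customers at tables according to the Chinese restaurant process."""
    rng = np.random.default_rng(seed)
    assignments = np.zeros(n_customers, dtype=int)
    table_counts = [1]                        # customer 0 opens the first table
    for n in range(1, n_customers):
        # Existing table k gets probability n_k / (n + alpha);
        # a brand-new table gets probability alpha / (n + alpha).
        probs = np.array(table_counts + [alpha]) / (n + alpha)
        choice = rng.choice(len(probs), p=probs)
        if choice == len(table_counts):       # sit at a new table
            table_counts.append(1)
        else:
            table_counts[choice] += 1
        assignments[n] = choice
    return assignments

# Larger alpha tends to open more tables (i.e., more clusters).
print(len(set(sample_crp(500, alpha=1.0))), len(set(sample_crp(500, alpha=10.0))))
```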
Dirichlet Process Mixture (slides 15-16)
[Graphical model: $\alpha$ and $G_0$ generate $G$; $\eta_n \sim G$; each $y_n$ is drawn from a component with parameter $\eta_n$, inside a plate of size $N$.] $G$ is made up of a countably infinite number of point masses, and we draw from it $N$ times to get the parameters of the different mixture components. If the $\eta_n$ were drawn from, e.g., a Gaussian, no two values would be the same; but since they are drawn from a distribution that was itself drawn from a Dirichlet process, we expect a clustering of the $\eta_n$. The number of unique values for $\eta_n$ equals the number of mixture components. [Slide 16: figure illustrating the equivalent CRP mixture.] (A generative sketch follows.)
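To make the generative story concrete, here is a sketch that produces data from a CRP mixture with Gaussian components; the base distribution N(0, 25), the unit component variance, and all names are our illustrative assumptions:

```python
import numpy as np

def sample_dpm_data(n: int, alpha: float, seed: int = 0):
    """Generate data from a CRP mixture: each table is a cluster with its
    own parameter eta ~ G0, and each observation is drawn from N(eta, 1)."""
    rng = np.random.default_rng(seed)
    assignments, table_counts, etas = [], [], []
    for i in range(n):
        probs = np.array(table_counts + [alpha]) / (i + alpha)
        c = int(rng.choice(len(probs), p=probs))
        if c == len(table_counts):             # new cluster: draw eta from G0
            table_counts.append(1)
            etas.append(rng.normal(0.0, 5.0))  # G0 = N(0, 25), illustrative
        else:
            table_counts[c] += 1
        assignments.append(c)
    y = np.array([rng.normal(etas[c], 1.0) for c in assignments])
    return y, np.array(assignments)

y, z = sample_dpm_data(200, alpha=1.0)
print("mixture components used:", z.max() + 1)
```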
Stick Breaking (slides 17-18)
So far we have only mentioned properties of a distribution $G$ drawn from a Dirichlet process. In 1994, Sethuraman developed a constructive way of forming $G$, known as "stick breaking":
1. Draw $\eta_1^*$ from $G_0$.
2. Draw $v_1$ from $\mathrm{Beta}(1, \alpha)$.
3. Set $\pi_1 = v_1$.
4. Draw $\eta_2^*$ from $G_0$.
5. Draw $v_2$ from $\mathrm{Beta}(1, \alpha)$.
6. Set $\pi_2 = v_2(1 - v_1)$.
Continuing in this way, $\pi_k = v_k \prod_{j<k}(1 - v_j)$ and $G = \sum_{k=1}^{\infty} \pi_k\, \delta_{\eta_k^*}$. (A truncated implementation is sketched below.)
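A truncated stick-breaking sketch; the truncation level K and the choice of a standard Gaussian for G0 are our illustrative assumptions:

```python
import numpy as np

def stick_breaking(alpha: float, K: int = 100, seed: int = 0):
    """Approximate G ~ DP(alpha, G0) by truncating after K broken sticks."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=K)             # v_k ~ Beta(1, alpha)
    # remaining[k] = prod_{j<k} (1 - v_j): stick length left before break k.
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    weights = v * remaining                      # pi_k = v_k * prod_{j<k} (1 - v_j)
    atoms = rng.normal(0.0, 1.0, size=K)         # eta_k* ~ G0 (standard Gaussian)
    return weights, atoms

weights, atoms = stick_breaking(alpha=2.0)
print("stick mass left unassigned:", 1.0 - weights.sum())  # small for large K
```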
Formal Definition (slide 19)
Let $\alpha$ be a positive, real-valued scalar, and let $G_0$ be a non-atomic probability distribution over a support set $A$. We say $G \sim \mathrm{DP}(\alpha, G_0)$ if, for all natural numbers $k$ and all $k$-partitions $\{A_1, \dots, A_k\}$ of $A$,

$$(G(A_1), \dots, G(A_k)) \sim \mathrm{Dir}(\alpha G_0(A_1), \dots, \alpha G_0(A_k)).$$

Inference in a DPM (slide 20)
EM is generally used for inference in a mixture model, but $G$ is nonparametric, making EM difficult. The main alternatives are Markov chain Monte Carlo techniques [Neal 2000] and variational inference [Blei and Jordan 2006].

Gibbs Sampling [Neal 2000] (slides 21-24)
Algorithm 1: define $H_i$ to be the single-observation posterior of $\eta$ given $y_i$. We marginalize $G$ out of the model and sample each $\eta_n$ given everything else. This sampler is slow to converge, because it updates the $\eta_n$ one observation at a time. [Slide 22: figure only.]

Algorithm 2: introduce explicit cluster assignments $c_n$ and per-cluster parameters $\eta_c$, with countably infinitely many clusters [Grenager 2005]. [Graphical model: the $\eta_n$ are replaced by indicators $c_n$ and an infinite collection of $\eta_c$.] We sample from the distribution over an individual cluster assignment $c_n$ given $y_n$ and all the other cluster assignments:

1. Initialize cluster assignments $c_1, \dots, c_N$.
2. For $i = 1, \dots, N$, draw $c_i$ from

$$P(c_i = c \mid c_{-i}, y_i) \propto
\begin{cases}
\dfrac{n_{-i,c}}{N-1+\alpha}\, F(y_i \mid \eta_c) & \text{if } c = c_j \text{ for some } j \neq i, \\[4pt]
\dfrac{\alpha}{N-1+\alpha} \displaystyle\int F(y_i \mid \eta)\, dG_0(\eta) & \text{otherwise},
\end{cases}$$

where $n_{-i,c}$ is the number of indices $j \neq i$ with $c_j = c$.
3. For all occupied clusters $c$, draw $\eta_c$ given $\{y_i : c_i = c\}$.

(A compact sketch of one sweep is given below.)
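A compact sketch of one sweep of Algorithm 2 for the conjugate case F(y|eta) = N(eta, 1) with G0 = N(0, 1); the model choice, function names, and demo data are our illustrative assumptions, not from the slides:

```python
import numpy as np
from scipy.stats import norm

def gibbs_sweep(y, c, etas, alpha, rng):
    """One sweep of Neal's (2000) Algorithm 2 for F = N(eta, 1), G0 = N(0, 1)."""
    for i in range(len(y)):
        c_rest = np.delete(c, i)
        labels = sorted(set(c_rest.tolist()))
        # Existing cluster: weight proportional to n_{-i,c} * F(y_i | eta_c).
        w = [np.sum(c_rest == k) * norm.pdf(y[i], etas[k], 1.0) for k in labels]
        # New cluster: alpha * integral of F(y_i | eta) dG0(eta) = alpha * N(y_i; 0, 2).
        w.append(alpha * norm.pdf(y[i], 0.0, np.sqrt(2.0)))
        w = np.asarray(w) / np.sum(w)
        pick = rng.choice(len(w), p=w)
        if pick == len(labels):                    # open a new cluster
            new = max(etas) + 1
            etas[new] = rng.normal(y[i] / 2.0, np.sqrt(0.5))  # posterior given y_i alone
            c[i] = new
        else:
            c[i] = labels[pick]
    # Step 3: resample each occupied cluster's parameter given its members.
    for k in set(c.tolist()):
        yk = y[c == k]
        var = 1.0 / (1.0 + len(yk))                # N(0,1) prior, unit-variance likelihood
        etas[k] = rng.normal(var * yk.sum(), np.sqrt(var))
    return c, etas

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-3, 1, 50), rng.normal(3, 1, 50)])  # toy data
c, etas = np.zeros(100, dtype=int), {0: 0.0}
for _ in range(20):
    c, etas = gibbs_sweep(y, c, etas, alpha=1.0, rng=rng)
print("occupied clusters after 20 sweeps:", len(set(c.tolist())))
```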
Conclusion (slide 25)
We now have a statistically principled mechanism for solving our original problem. This was intended as a general and fairly shallow overview of Dirichlet processes.

Acknowledgments (slide 26)
Much thanks goes to David Blei. Some material for this presentation was inspired by slides from Teg Grenager and Zoubin Ghahramani.

References (slide 27)
Blei, David M. and Michael I. Jordan. "Variational Inference for Dirichlet Process Mixtures." Bayesian Analysis 1(1), 2006.
Neal, Radford M. "Markov Chain Sampling Methods for Dirichlet Process Mixture Models." Journal of Computational and Graphical Statistics 9, 2000, 249-265.
Ghahramani, Zoubin. "Non-parametric Bayesian Methods." UAI Tutorial, July 2005.
Grenager, Teg. "Chinese Restaurants and Stick Breaking: An Introduction to the Dirichlet Process."
Blackwell, David and James B. MacQueen. "Ferguson Distributions via Polya Urn Schemes." The Annals of Statistics 1(2), 1973, 353-355.
Ferguson, Thomas S. "A Bayesian Analysis of Some Nonparametric Problems." The Annals of Statistics 1(2), 1973, 209-230.