CMU CS 10701 - Unsupervised Learning: Clustering, K-means

Unsupervised Learning: Clustering, K-means
Machine Learning 10701/15781
Carlos Guestrin, Carnegie Mellon University
November 18th, 2009
(c) 2005-2009 Carlos Guestrin

Some Data

[Figure: scatterplot of unlabeled 2-D data points.]

K-means

1. Ask the user how many clusters they'd like (e.g., k = 5).
2. Randomly guess k cluster center locations.
3. Each datapoint finds out which center it's closest to. (Thus each center "owns" a set of datapoints.)
4. Each center finds the centroid of the points it owns...
5. ...and jumps there.
6. Repeat until terminated!

K-means, formally

Randomly initialize k centers: μ^(0) = μ_1^(0), ..., μ_k^(0).

Classify: assign each point j ∈ {1, ..., m} to the nearest center:

    C(j) ← argmin_i ||x_j − μ_i||²

Recenter: μ_i becomes the centroid of its points:

    μ_i ← argmin_μ Σ_{j : C(j) = i} ||x_j − μ||²

Equivalent to μ_i ← average of its points!

What is K-means optimizing?

Potential function F(μ, C) of centers μ and point allocations C:

    F(μ, C) = Σ_{j=1}^{m} ||x_j − μ_{C(j)}||²

Optimal K-means: min_μ min_C F(μ, C).

Does K-means converge?

Part 1: fix μ, optimize C. Reassigning each point to its nearest center can only decrease the potential F.
Part 2: fix C, optimize μ. Moving each center to the centroid of the points it owns can only decrease F.

Coordinate descent algorithms

Want: min_a min_b F(a, b).
Coordinate descent: fix a, minimize over b; fix b, minimize over a; repeat.
Converges (if F is bounded) to a (often good) local optimum, as we saw in the applet (play with it!).
K-means is a coordinate descent algorithm! (A code sketch of this loop follows at the end of this section.)

EM

Machine Learning 10701/15781, Carlos Guestrin, Carnegie Mellon University, November 18th, 2009

(One) bad case for K-means

- Clusters may overlap.
- Some clusters may be "wider" than others.

Gaussian Bayes Classifier: reminder

    P(y = i | x_j) = p(x_j | y = i) P(y = i) / p(x_j)

    P(y = i | x_j) ∝ 1 / ((2π)^{m/2} |Σ_i|^{1/2}) · exp(−(1/2) (x_j − μ_i)^T Σ_i^{-1} (x_j − μ_i)) · P(y = i)

Predicting wealth from age

[Figure: scatterplots of wealth vs. age, fit with Gaussians of different covariance structure.]

Learning model: year, mpg → maker

General covariance: O(m²) parameters.

    Σ = [ σ²_1  σ_12  ...  σ_1m
          σ_12  σ²_2  ...  σ_2m
           ...   ...  ...   ...
          σ_1m  σ_2m  ...  σ²_m ]

Aligned (axis-aligned, diagonal): O(m) parameters.

    Σ = diag(σ²_1, σ²_2, ..., σ²_m)    (all off-diagonal entries are 0)

Spherical: O(1) covariance parameters.

    Σ = σ² I    (a single shared variance σ² on the diagonal)
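
To make the classify/recenter loop concrete, here is a minimal NumPy sketch of K-means as coordinate descent on F(μ, C). It is not from the slides; the function name `kmeans`, the initialization scheme, and the convergence test are illustrative choices.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means: coordinate descent on F(mu, C) = sum_j ||x_j - mu_C(j)||^2.

    X is an (m, d) data matrix; returns centers mu (k, d) and assignments C (m,).
    """
    rng = np.random.default_rng(seed)
    # Step 2 of the recipe: randomly guess k centers (here: k random data points).
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Classify (fix mu, optimize C): assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (m, k)
        C = dists.argmin(axis=1)
        # Recenter (fix C, optimize mu): each center jumps to the centroid of
        # the points it owns; an empty cluster keeps its old center.
        new_mu = np.array([X[C == i].mean(axis=0) if np.any(C == i) else mu[i]
                           for i in range(k)])
        if np.allclose(new_mu, mu):  # neither step can decrease F further
            break
        mu = new_mu
    return mu, C
```

Each step can only decrease F, and F ≥ 0, so the loop must terminate, at a local (not necessarily global) optimum: exactly the coordinate-descent argument above.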
Next... back to density estimation

What if we want to do density estimation with multimodal or "clumpy" data?

But we don't see class labels!!!

MLE: argmax ∏_j P(y_j, x_j). But we don't know the y_j's!
Maximize the marginal likelihood instead (sketched numerically at the end of these notes):

    argmax ∏_j P(x_j) = argmax ∏_j Σ_{i=1}^{k} P(y_j = i, x_j)

Special case: spherical Gaussians and hard assignments

If P(X | Y = i) is spherical, with the same σ for all classes:

    P(x_j | y = i) ∝ exp(−(1 / 2σ²) ||x_j − μ_i||²)

If each x_j belongs to exactly one class C(j) (hard assignment), the marginal likelihood becomes:

    ∏_{j=1}^{m} Σ_{i=1}^{k} P(x_j, y = i) ∝ ∏_{j=1}^{m} exp(−(1 / 2σ²) ||x_j − μ_{C(j)}||²)

Maximizing this is the same as K-means!!!

The GMM assumption

- There are k components.
- Component i has an associated mean vector μ_i.
- Each component generates data from a Gaussian with mean μ_i and covariance matrix σ²I.

Each data point is generated according to the following recipe (a sampler sketch follows below):
1. Pick a component at random: choose component i with probability P(y = i).
2. Draw the datapoint: x ~ N(μ_i, σ²I).

The general GMM assumption

Same as above, except each component i has its own covariance matrix Σ_i: pick component i with probability P(y = i), then draw x ~ N(μ_i, Σ_i).

Unsupervised learning: not as hard as it looks

Sometimes ...
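
The two-step GMM recipe above maps directly onto a sampler. A minimal sketch, assuming mixing weights `pi` with pi[i] = P(y = i); the helper name `sample_gmm` is illustrative, not from the slides.

```python
import numpy as np

def sample_gmm(n, pi, mus, sigma, seed=0):
    """Draw n points from a spherical GMM via the two-step recipe.

    pi:    (k,) mixing weights, pi[i] = P(y = i)
    mus:   (k, d) component means
    sigma: scalar; every component has covariance sigma^2 * I
    """
    rng = np.random.default_rng(seed)
    # Step 1: pick a component at random with probability P(y = i).
    y = rng.choice(len(pi), size=n, p=pi)
    # Step 2: datapoint ~ N(mu_i, sigma^2 I).
    x = mus[y] + sigma * rng.standard_normal((n, mus.shape[1]))
    return x, y
```

For the general GMM, step 2 would instead draw from N(μ_i, Σ_i), e.g. with rng.multivariate_normal(mus[i], Sigmas[i]) per point, using a per-component covariance.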

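Tying back to the "maximize marginal likelihood" objective: once labels are summed out, each point contributes P(x_j) = Σ_i P(y = i) N(x_j; μ_i, σ²I). Below is a sketch of the log marginal likelihood for the spherical case, using SciPy's multivariate normal density; the function name is illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_marginal_likelihood(X, pi, mus, sigma):
    """log prod_j sum_i P(y = i) * N(x_j; mu_i, sigma^2 I) for a spherical GMM."""
    m, d = X.shape
    # per_point[j, i] = P(y = i) * N(x_j; mu_i, sigma^2 I)
    per_point = np.column_stack([
        pi[i] * multivariate_normal.pdf(X, mean=mus[i], cov=sigma**2 * np.eye(d))
        for i in range(len(pi))
    ])
    return np.log(per_point.sum(axis=1)).sum()
```

This is the quantity EM maximizes; with hard assignments and a shared spherical σ, maximizing it reduces to minimizing the K-means potential, as the "Special case" slide above shows.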
