Unsupervised Learning
Clustering
K-means

Machine Learning – 10701/15781
Carlos Guestrin
Carnegie Mellon University
November 18th, 2009
©2005-2009 Carlos Guestrin

Some Data
[Figure: scatterplot of unlabeled data points.]

K-means
1. Ask user how many clusters they'd like. (e.g., k = 5)
2. Randomly guess k cluster center locations.
3. Each datapoint finds out which center it's closest to. (Thus each center "owns" a set of datapoints.)
4. Each center finds the centroid of the points it owns...
5. ...and jumps there.
6. ...Repeat until terminated!

K-means
Randomly initialize k centers: $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$

Classify: assign each point $j \in \{1, \ldots, m\}$ to the nearest center:
$$C(j) \leftarrow \arg\min_i \; \lVert \mu_i - x_j \rVert_2^2$$

Recenter: $\mu_i$ becomes the centroid of its points:
$$\mu_i \leftarrow \arg\min_{\mu} \sum_{j : C(j) = i} \lVert \mu - x_j \rVert_2^2$$
Equivalent to $\mu_i \leftarrow$ average of its points!

What is K-means optimizing?
Potential function $F(\mu, C)$ of centers $\mu$ and point allocations $C$:
$$F(\mu, C) = \sum_{j=1}^{m} \lVert \mu_{C(j)} - x_j \rVert_2^2$$
Optimal K-means: $\min_{\mu} \min_{C} F(\mu, C)$

Does K-means converge??? Part 1
Optimize the potential function: fix $\mu$, optimize $C$.
With the centers fixed, assigning each point to its closest center minimizes each term of $F$ separately, so the Classify step can only decrease $F(\mu, C)$.

Does K-means converge??? Part 2
Optimize the potential function: fix $C$, optimize $\mu$.
With the assignments fixed, $F$ is a sum of quadratics in each $\mu_i$, minimized by the centroid of the points center $i$ owns, so the Recenter step can only decrease $F(\mu, C)$.

Coordinate descent algorithms
Want: $\min_a \min_b F(a, b)$
Coordinate descent:
  fix a, minimize over b
  fix b, minimize over a
  repeat
Converges!!!
  if F is bounded
  to a (often good) local optimum
  as we saw in the applet (play with it!)
K-means is a coordinate descent algorithm!
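The lecture leaves the two steps as pseudocode and handwritten derivations; as an illustration only, here is a minimal NumPy sketch of the Classify/Recenter loop (the function name `kmeans` and its arguments are my own, not from the slides):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Minimal K-means sketch: alternate Classify and Recenter until the
    assignments stop changing (coordinate descent on the potential F)."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    # Randomly initialize k centers by picking k distinct data points.
    centers = X[rng.choice(m, size=k, replace=False)].astype(float)
    assignments = np.full(m, -1)

    for _ in range(max_iters):
        # Classify: assign each point j to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        if np.array_equal(new_assignments, assignments):
            break  # neither step can decrease F further: local optimum
        assignments = new_assignments
        # Recenter: each center jumps to the centroid of the points it owns.
        for i in range(k):
            owned = X[assignments == i]
            if len(owned) > 0:
                centers[i] = owned.mean(axis=0)

    # Potential F(mu, C): sum of squared distances to the assigned centers.
    F = ((X - centers[assignments]) ** 2).sum()
    return centers, assignments, F
```

Each pass can only lower F, and F ≥ 0, so the loop terminates at a local optimum of the potential, which is exactly the coordinate-descent argument above.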
EM
Machine Learning – 10701/15781
Carlos Guestrin
Carnegie Mellon University
November 18th, 2009
©2005-2009 Carlos Guestrin

(One) bad case for k-means
  Clusters may overlap
  Some clusters may be "wider" than others

Gaussian Bayes Classifier Reminder
$$P(y = i \mid x_j) = \frac{p(x_j \mid y = i)\, P(y = i)}{p(x_j)}$$
$$P(y = i \mid x_j) \propto \frac{1}{(2\pi)^{m/2}\,|\Sigma_i|^{1/2}} \exp\!\left( -\tfrac{1}{2} (x_j - \mu_i)^T \Sigma_i^{-1} (x_j - \mu_i) \right) P(y = i)$$

Predicting wealth from age
[Figure: joint density over age and wealth.]

Learning modelyear, mpg ---> maker
$$\Sigma = \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\
\sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2
\end{pmatrix}$$

General: O(m²) parameters
A full covariance matrix (as above): every pairwise covariance is free.

Aligned: O(m) parameters
A diagonal covariance gives axis-aligned Gaussians:
$$\Sigma = \begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 & 0 \\
0 & \sigma_2^2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \sigma_{m-1}^2 & 0 \\
0 & 0 & \cdots & 0 & \sigma_m^2
\end{pmatrix}$$

Spherical: O(1) cov parameters
A single variance shared by all dimensions:
$$\Sigma = \sigma^2 I = \begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}$$

Next… back to Density Estimation
What if we want to do density estimation with multimodal or "clumpy" data?

But we don't see class labels!!!
MLE: $\arg\max \prod_j P(y_j, x_j)$
But we don't know the $y_j$'s!!!
Maximize the marginal likelihood:
$$\arg\max \prod_j P(x_j) \;=\; \arg\max \prod_j \sum_{i=1}^{k} P(y_j = i, x_j)$$

Special case: spherical Gaussians and hard assignments
If $P(X \mid Y = i)$ is spherical, with the same $\sigma$ for all classes:
$$P(x_j \mid y = i) \propto \exp\!\left( -\frac{1}{2\sigma^2} \lVert x_j - \mu_i \rVert^2 \right)$$
If each $x_j$ belongs to exactly one class $C(j)$ (hard assignment), the marginal likelihood is:
$$\prod_{j=1}^{m} \sum_{i=1}^{k} P(x_j, y = i) \;\propto\; \prod_{j=1}^{m} \exp\!\left( -\frac{1}{2\sigma^2} \lVert x_j - \mu_{C(j)} \rVert^2 \right)$$
Same as K-means!!!

The GMM assumption
• There are k components.
• Component i has an associated mean vector µ_i.
• Each component generates data from a Gaussian with mean µ_i and covariance matrix σ²I.
Each data point is generated according to the following recipe:
1. Pick a component at random: choose component i with probability P(y = i).
2. Datapoint ~ N(µ_i, σ²I)

The General GMM assumption
• There are k components.
• Component i has an associated mean vector µ_i.
• Each component generates data from a Gaussian with mean µ_i and covariance matrix Σ_i.
Each data point is generated according to the following recipe:
1. Pick a component at random: choose component i with probability P(y = i).
2. Datapoint ~ N(µ_i, Σ_i)
(A sketch of this recipe in code follows at the end of these notes.)

Unsupervised Learning: not as hard as it looks
Sometimes…
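To make the generative recipe from the GMM slides concrete, here is a small sketch, assuming NumPy and SciPy, that samples points exactly as described (pick component i with probability P(y=i), then draw from N(µ_i, Σ_i)) and evaluates the marginal likelihood $\prod_j \sum_i P(y_j = i, x_j)$ that must be maximized when the labels are unobserved. The helper names and the 3-component example parameters are mine, not the lecture's.

```python
import numpy as np
from scipy.stats import multivariate_normal

def sample_gmm(weights, means, covs, n, seed=0):
    """General GMM recipe: 1) pick component i with probability P(y=i),
    2) draw the datapoint from N(mu_i, Sigma_i)."""
    rng = np.random.default_rng(seed)
    ys = rng.choice(len(weights), size=n, p=weights)
    xs = np.array([rng.multivariate_normal(means[i], covs[i]) for i in ys])
    return xs, ys

def log_marginal_likelihood(X, weights, means, covs):
    """log prod_j sum_i P(y=i) N(x_j; mu_i, Sigma_i): the objective to
    maximize when the labels y_j are not observed."""
    per_component = np.column_stack([
        w * multivariate_normal.pdf(X, mean=mu, cov=cov)
        for w, mu, cov in zip(weights, means, covs)
    ])
    return np.log(per_component.sum(axis=1)).sum()

# A hypothetical 3-component mixture in 2-D (parameters made up for illustration).
weights = [0.5, 0.3, 0.2]
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0]), np.array([0.0, 4.0])]
covs = [np.eye(2), 0.5 * np.eye(2), np.array([[1.0, 0.8], [0.8, 1.0]])]

X, y = sample_gmm(weights, means, covs, n=500)
print(log_marginal_likelihood(X, weights, means, covs))
```

With the general per-component covariances Σ_i, each cluster can be wider or tilted differently, which is exactly the situation the "(One) bad case for k-means" slide points at.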