Unsupervised Learning – Clustering: K-means
Machine Learning 10-701/15-781
Carlos Guestrin, Carnegie Mellon University
November 18th, 2009
© 2005-2009 Carlos Guestrin

Some Data
[Figure: a scatter plot of unlabeled 2-d data points.]

K-means
1. Ask the user how many clusters they'd like (e.g., k = 5).
2. Randomly guess k cluster center locations.
3. Each datapoint finds out which center it's closest to. (Thus each center "owns" a set of datapoints.)
4. Each center finds the centroid of the points it owns...
5. ...and jumps there.
6. ...Repeat until terminated!

K-means
Randomly initialize k centers: \mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}.
Classify: assign each point j \in \{1, \ldots, m\} to the nearest center:
    C(j) \leftarrow \arg\min_i \|\mu_i - x_j\|^2
Recenter: \mu_i becomes the centroid of the points it owns:
    \mu_i^{(t+1)} \leftarrow \arg\min_\mu \sum_{j : C(j) = i} \|\mu - x_j\|^2
Equivalent to: \mu_i \leftarrow average of its points.

What is K-means optimizing?
A potential function F(\mu, C) of the centers \mu and the point allocations C:
    F(\mu, C) = \sum_{j=1}^m \|\mu_{C(j)} - x_j\|^2
Optimal K-means: \min_\mu \min_C F(\mu, C).

Does K-means converge? Part 1
Optimize the potential function: fix \mu, optimize C.
Assigning each point to its nearest center can only decrease F.

Does K-means converge? Part 2
Optimize the potential function: fix C, optimize \mu.
Moving each center to the mean of the points it owns can only decrease F.

Coordinate descent algorithms
Want: \min_a \min_b F(a, b).
Coordinate descent: fix a, minimize over b; fix b, minimize over a; repeat.
Converges (if F is bounded) to a (often good) local optimum, as we saw in the applet (play with it!).
K-means is a coordinate descent algorithm.
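To make the coordinate-descent view concrete, here is a minimal NumPy sketch of the K-means loop described above. It is an illustration under assumed conventions (names such as `kmeans`, `X`, and `potential` are mine, not from the slides): the classify step fixes the centers and optimizes the assignments C, the recenter step fixes C and optimizes the centers, so F(\mu, C) never increases.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means (coordinate descent) on an (m, d) data matrix X."""
    rng = np.random.default_rng(seed)
    # Randomly initialize k centers by picking k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Classify: assign each point to its nearest center (fix mu, optimize C).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        C = dists.argmin(axis=1)
        # Recenter: each center jumps to the centroid of the points it owns (fix C, optimize mu).
        new_centers = np.array([
            X[C == i].mean(axis=0) if np.any(C == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):  # centers stopped moving: converged
            break
        centers = new_centers
    potential = ((X - centers[C]) ** 2).sum()  # F(mu, C) = sum_j ||mu_C(j) - x_j||^2
    return centers, C, potential
```

Since each step can only lower a bounded potential, the loop stops at a (possibly local) optimum that depends on the random initialization.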
EM
Machine Learning 10-701/15-781
Carlos Guestrin, Carnegie Mellon University
November 18th, 2009

One bad case for K-means
Clusters may overlap.
Some clusters may be "wider" than others.

Gaussian Bayes Classifier Reminder
    P(y = i \mid x_j) = \frac{p(x_j \mid y = i)\, P(y = i)}{p(x_j)}
    p(x_j \mid y = i)\, P(y = i) = \frac{1}{(2\pi)^{m/2} \|\Sigma_i\|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x_j - \mu_i)^T \Sigma_i^{-1} (x_j - \mu_i)\right) P(y = i)

Predicting wealth from age
[Figures: wealth plotted against age.]

Learning: modelyear, mpg, maker

General: O(m^2) parameters
Full covariance matrix:
    \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix}

Aligned: O(m) parameters
Diagonal covariance matrix:
    \Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_m^2)

Spherical: O(1) covariance parameters
    \Sigma = \sigma^2 I

Next... back to Density Estimation
What if we want to do density estimation with multimodal or "clumpy" data?

But we don't see class labels!
MLE: \arg\max \prod_j P(y_j, x_j)
But we don't know the y_j's.
Maximize the marginal likelihood instead:
    \arg\max \prod_j P(x_j) = \arg\max \prod_j \sum_{i=1}^k P(y_j = i, x_j)

Special case: spherical Gaussians and hard assignments
If P(X \mid Y = i) is spherical, with the same \sigma for all classes:
    P(x_j \mid y = i) \propto \exp\!\left(-\tfrac{1}{2\sigma^2}\|x_j - \mu_i\|^2\right)
If each x_j belongs to exactly one class C(j) (hard assignment), the marginal likelihood becomes:
    \prod_{j=1}^m \sum_{i=1}^k P(x_j, y = i) \propto \prod_{j=1}^m \exp\!\left(-\tfrac{1}{2\sigma^2}\|x_j - \mu_{C(j)}\|^2\right)
Maximizing this is the same as K-means!

The GMM assumption
There are k components.
Component i has an associated mean vector \mu_i.
Each component generates data from a Gaussian with mean \mu_i and covariance matrix \sigma^2 I.
Each data point is generated according to the following recipe:
1. Pick a component at random: choose component i with probability P(y = i).
2. Datapoint \sim N(\mu_i, \sigma^2 I).

The General GMM assumption
There are k components; component i has an associated mean vector \mu_i.
Each component generates data from a Gaussian with mean \mu_i and covariance matrix \Sigma_i.
Each data point is generated according to the following recipe:
1. Pick a component at random: choose component i with probability P(y = i).
2. Datapoint \sim N(\mu_i, \Sigma_i).

Unsupervised Learning: not as hard as it looks
Sometimes easy. Sometimes impossible.
(In case you're wondering what these diagrams are, they show 2-d …)
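As an illustration of the two-step generative recipe in the GMM slides above, here is a small sketch that samples data from a general GMM. The mixture weights, means, and covariances in the example are assumed values chosen only for demonstration, not parameters from the lecture.

```python
import numpy as np

def sample_gmm(n, weights, means, covs, seed=0):
    """Draw n points from a GMM following the two-step recipe above."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights)
    # Step 1: pick a component at random, component i with probability P(y = i).
    y = rng.choice(len(weights), size=n, p=weights)
    # Step 2: draw each datapoint from N(mu_i, Sigma_i) for its chosen component.
    X = np.array([rng.multivariate_normal(means[i], covs[i]) for i in y])
    return X, y

# Example with assumed parameters: three 2-d components.
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0]), np.array([0.0, 5.0])]
covs = [np.eye(2), 0.5 * np.eye(2), np.diag([2.0, 0.3])]
X, y = sample_gmm(500, weights=[0.5, 0.3, 0.2], means=means, covs=covs)
```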
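A companion sketch (again with assumed helper names, and reusing X, means, and covs from the sampling example above) evaluates the marginal log-likelihood \log \prod_j \sum_{i=1}^k P(y_j = i, x_j) that the density-estimation slides say we should maximize when the labels y_j are unobserved.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, weights, means, covs):
    """Marginal log-likelihood: sum_j log sum_i P(y=i) * N(x_j; mu_i, Sigma_i)."""
    # Per-point, per-component joint densities P(y=i) * p(x_j | y=i), shape (m, k).
    joint = np.column_stack([
        w * multivariate_normal.pdf(X, mean=mu, cov=cov)
        for w, mu, cov in zip(weights, means, covs)
    ])
    return np.log(joint.sum(axis=1)).sum()

# Evaluate on the sample drawn in the previous sketch (assumed parameters).
print(gmm_log_likelihood(X, [0.5, 0.3, 0.2], means, covs))
```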