Unsupervised learning or Clustering – K-means Gaussian mixture models - D682883

Home> Academic Documents> Unsupervised learning or Clustering – K-means Gaussian mixture models

DOC PREVIEW

Unsupervised learning or Clustering – K-means Gaussian mixture models

This preview shows page 1-2-3-4-31-32-33-34-35-64-65-66-67 out of 67 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 67 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Unsupervised learning or Clustering K means Gaussian mixture models Machine Learning 10701 15781 Carlos Guestrin Carnegie Mellon University April 4th 2007 2005 2007 Carlos Guestrin Some Data 2005 2007 Carlos Guestrin K means 1 Ask user how many clusters they d like e g k 5 2005 2007 Carlos Guestrin K means 1 Ask user how many clusters they d like e g k 5 2 Randomly guess k cluster Center locations 2005 2007 Carlos Guestrin K means 1 Ask user how many clusters they d like e g k 5 2 Randomly guess k cluster Center locations 3 Each datapoint finds out which Center it s closest to Thus each Center owns a set of datapoints 2005 2007 Carlos Guestrin K means 1 Ask user how many clusters they d like e g k 5 2 Randomly guess k cluster Center locations 3 Each datapoint finds out which Center it s closest to 4 Each Center finds the centroid of the points it owns 2005 2007 Carlos Guestrin K means 1 Ask user how many clusters they d like e g k 5 2 Randomly guess k cluster Center locations 3 Each datapoint finds out which Center it s closest to 4 Each Center finds the centroid of the points it owns 5 and jumps there 6 Repeat until terminated 2005 2007 Carlos Guestrin K means Randomly initialize k centers 0 1 0 k 0 Classify Assign each point j 1 m to nearest center Recenter i becomes centroid of its point Equivalent to i average of its points 2005 2007 Carlos Guestrin What is K means optimizing Potential function F C of centers and point allocations C Optimal K means min minC F C 2005 2007 Carlos Guestrin Does K means converge Part 1 Optimize potential function Fix optimize C 2005 2007 Carlos Guestrin Does K means converge Part 2 Optimize potential function Fix C optimize 2005 2007 Carlos Guestrin Coordinate descent algorithms Want mina minb F a b Coordinate descent fix a minimize b fix b minimize a repeat Converges if F is bounded to a often good local optimum as we saw in applet play with it K means is a coordinate descent algorithm 2005 2007 Carlos Guestrin One bad case for k means Clusters may overlap Some clusters may be wider than others 2005 2007 Carlos Guestrin Gaussian Bayes Classifier Reminder P y i x j p x j y i P y i p x j T 1 1 1 P y i x j exp x j i i x j i P y i m 2 1 2 2 2 i 2005 2007 Carlos Guestrin Predicting wealth from age 2005 2007 Carlos Guestrin Predicting wealth from age 2005 2007 Carlos Guestrin Learning modelyear mpg maker 2005 2007 Carlos Guestrin 21 12 M 1m 12 L 1m 2 2 L 2m M 2m M 2 L m O General parameters O m2 2005 2007 Carlos Guestrin 21 12 M 1m 12 L 1m 2 2 L 2m M 2m M 2 L m O Aligned O m parameters 21 0 2 0 2 0 0 M M 0 0 0 0 2005 2007 Carlos Guestrin 0 L 0 0 L 0 O 0 M 23 L M 0 0 L 2 m 1 L 0 0 0 0 M 0 2m Aligned O m parameters 21 0 2 0 2 0 0 M M 0 0 0 0 2005 2007 Carlos Guestrin 0 L 0 0 L 0 O 0 M 23 L M 0 0 L 2 m 1 L 0 0 0 0 M 0 2m Spherical O 1 cov parameters 2005 2007 Carlos Guestrin 2 0 0 M 0 0 0 2 0 M 0 0 0 0 2 M 0 0 L 0 L 0 L 0 O M L 2 L 0 2 0 0 0 M 0 Spherical O 1 cov parameters 2005 2007 Carlos Guestrin 2 0 0 M 0 0 0 2 0 M 0 0 0 0 2 M 0 0 L 0 L 0 L 0 O M L 2 L 0 2 0 0 0 M 0 Next back to Density Estimation What if we want to do density estimation with multimodal or clumpy data 2005 2007 Carlos Guestrin But we don t see class labels MLE argmax j P yj xj But we don t know yj s Maximize marginal likelihood argmax j P xj argmax j i 1k P yj i xj 2005 2007 Carlos Guestrin Special case spherical Gaussians and hard assignments P y i x j T 1 1 1 exp x x j i i j i P y i m 2 1 2 2 2 i If P X Y i is spherical with same for all classes 2 1 P x j y i exp 2 x j i 2 If each xj belongs to one class C j hard assignment marginal likelihood 2 1 P x j y i exp 2 2 x j C j j 1 i 1 j 1 m k m Same as K means 2005 2007 Carlos Guestrin The GMM assumption There are k components Component i has an associated mean vector i 2 1 3 2005 2007 Carlos Guestrin The GMM assumption There are k components Component i has an associated mean vector i Each component generates data from a Gaussian with mean i and covariance matrix 2I Each data point is generated according to the following recipe 2005 2007 Carlos Guestrin 2 1 3 The GMM assumption There are k components Component i has an associated mean vector i Each component generates data from a Gaussian with mean i and covariance matrix 2I Each data point is generated according to the following recipe 1 Pick a component at random Choose component i with probability P y i 2005 2007 Carlos Guestrin 2 The GMM assumption There are k components Component i has an associated mean vector i Each component generates data from a Gaussian with mean i and covariance matrix 2I Each data point is generated according to the following recipe 1 Pick a component at random Choose component i with probability P y i 2 Datapoint N i 2I 2005 2007 Carlos Guestrin 2 x The General GMM assumption There are k components Component i has an associated mean vector i Each component generates data from a Gaussian with mean i and covariance matrix i Each data point is generated according to the following recipe 1 Pick a component at random Choose component i with probability P y i 2 Datapoint N i i 2005 2007 Carlos Guestrin 2 1 3 Unsupervised Learning not as hard as it looks Sometimes easy Sometimes impossible and sometimes in between 2005 2007 Carlos Guestrin IN CASE YOU RE WONDERING WHAT THESE DIAGRAMS ARE THEY SHOW 2 d UNLABELED DATA X VECTORS DISTRIBUTED IN 2 d SPACE THE TOP ONE HAS THREE VERY CLEAR GAUSSIAN CENTERS Marginal likelihood for general case P y i x j T 1 1 1 exp x x j i i j i P y i m 2 1 2 2 2 i Marginal likelihood …


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3-4-31-32-33-34-35-64-65-66-67 out of 67 pages.

Unsupervised learning or Clustering – K-means Gaussian mixture models

Please select your school