CMU CS 15-381 - Lecture: Clustering (Unsupervised Learning)
Slide 1: "Unsupervised" as compared to "supervised" learning, where you are given labeled examples (with which you can construct training/test sets).

Slide 2: How could you cluster these examples? By what traits?

Slide 4: This is a key representation problem in clustering.

Slide 5: Just imagine what this would be like if you *didn't* have these properties.

Slide 6: You should be able to use different distance metrics, and different data. Noise is very common, and shouldn't completely break your algorithm. It should not matter what order the data is given to the algorithm.

Slide 8: A dendrogram is a lot like a binary tree. What should distance be? max(distance to the closest shared ancestor).

Slide 9: A clustering of the same name in different languages.

Slides 10-13: Start off with each point as its own cluster. Then consider all pairs of clusters and choose the best pair. Combine the best pair into a single cluster.

Slide 14: Again, choose the best two clusters and combine them.

Slides 15-17: How do you determine the best pair of clusters? These are some metrics (see the agglomerative-clustering sketch after these notes).

Slide 18: You get different clusters based on what distance metric you use. All make the same split into two big clusters, but the smaller clusters differ a lot.

Slide 19: Often you are not told beforehand how many clusters there are. Dendrograms are good at letting you know how many clusters there are.

Slide 20: What to do with outliers? Well, we could just ignore them!

Slides 21-23: The edit distance between A and B is how much work you need to do to convert A to B, or B to A. It's easy to define for strings, harder (but not impossible) for other kinds of data (see the edit-distance sketch after these notes).

Slides 24-25: Usually you have to tell the algorithm how many clusters you want.

Slide 26: Step 1 is the lame (really difficult) part of this. Step 2: just pick a couple of random points. But why can't you stop after that? Because you don't really know that the points you picked are any good; that's where step 3 comes in. Sometimes you won't converge (a point will dither between two or more clusters), in which case you need a more advanced stopping condition. Sometimes the initial means are really bad and have to be reinitialized. (See the k-means sketch after these notes.)

Slides 27-31: "All kinds of bad things can happen, but hey, it sort of works." That's pretty much something you'll run into with unsupervised learning: when you can't even tell the algorithm what's *supposed* to be right, it can come up with a lot of creative ways to screw it up. You *can* select your centers non-randomly, but we trust random more, because humans have their own biases.

Slide 32: Fortunately, there are a lot of domains where clusters *are* convex. Gaussians, for instance. (Of course, when you actually know the statistical distribution, there are better methods.) You can't guarantee convergence.

Slide 33: Useful if you don't start off with all the data.

Slides 34-36: The order the data arrives in will therefore make a big difference. That's not great. Also, the clusters can overlap (maybe just add the point to the nearest cluster?). (See the incremental-clustering sketch after these notes.)

Slide 37: Note, finding the right number of clusters is an unsolved problem. The objective function is a function we would like to minimize. You can try to minimize the variance, for instance, but then you'll just keep going until you get N clusters and the variance is 0.

Slides 38-40: Stop when increasing the number of clusters doesn't help you very much. Why shouldn't we use more clusters? Because then the clusters become meaningless. For other methods, see "model selection": you can, for instance, penalize the objective with a regularization term that grows with the number of clusters (see the choose_k sketch after these notes).
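Sketch 1 (agglomerative clustering, slides 10-17): the preview doesn't show which cluster-distance metrics the slides actually list, but single, complete, and average linkage are the standard choices, so those are what's sketched here. This is a minimal Python illustration of the procedure described above: every point starts as its own cluster, and the best (closest) pair under the chosen linkage is merged until k clusters remain. The function names and the Euclidean distance are illustrative choices, not anything taken from the slides.

```python
import itertools

def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

# Linkage metrics: different ways to score the distance between two clusters.
def single_linkage(a, b, dist=euclidean):
    return min(dist(x, y) for x in a for y in b)   # closest pair of points

def complete_linkage(a, b, dist=euclidean):
    return max(dist(x, y) for x in a for y in b)   # farthest pair of points

def average_linkage(a, b, dist=euclidean):
    return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))

def agglomerate(points, k, linkage=single_linkage):
    # Start off with each point as its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Consider all pairs of clusters and choose the best (closest) pair.
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        # Combine the best pair into one cluster (j > i, so pop(j) is safe).
        clusters[i].extend(clusters.pop(j))
    return clusters

# e.g. agglomerate([(0, 0), (0, 1), (5, 5), (5, 6)], k=2)
# -> [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
```

Swapping the `linkage` argument reproduces the effect described on slide 18: the same data can come out clustered quite differently depending on the metric.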
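Sketch 2 (edit distance, slides 21-23): the standard dynamic-programming edit distance (Levenshtein) for strings. The slides may use a different cost scheme; unit-cost insert/delete/substitute is assumed here, since that's the usual definition.

```python
def edit_distance(a, b):
    # dp[i][j] = minimum number of edits to turn a[:i] into b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i                      # delete everything in a[:i]
    for j in range(len(b) + 1):
        dp[0][j] = j                      # insert everything in b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            dp[i][j] = min(dp[i - 1][j] + 1,                       # delete
                           dp[i][j - 1] + 1,                       # insert
                           dp[i - 1][j - 1] + (0 if same else 1))  # substitute
    return dp[len(a)][len(b)]

# With unit costs the distance is symmetric: converting A to B costs the
# same as converting B to A, matching the note above.
```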
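Sketch 3 (k-means, slide 26): the slide's own pseudocode isn't in the preview, so this assumes the usual Lloyd's-algorithm formulation of the three steps described: step 1 chooses k, step 2 picks random initial means, step 3 alternates assignment and update until nothing changes. The iteration cap stands in for the "more advanced stopping condition" needed when a point dithers, and empty clusters get their means reinitialized, per the note about bad initial means. Reuses `euclidean` from the first sketch.

```python
import random

def kmeans(points, k, max_iters=100):
    # Step 2: just pick k random points as the initial means.
    means = random.sample(points, k)
    for _ in range(max_iters):      # cap iterations: points can dither forever
        # Step 3a: assign each point to its nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, means[i]))
            clusters[nearest].append(p)
        # Step 3b: recompute each mean. A really bad initial mean can end up
        # owning no points; reinitialize it rather than letting it die.
        new_means = [tuple(sum(xs) / len(c) for xs in zip(*c)) if c
                     else random.choice(points)
                     for c in clusters]
        if new_means == means:      # assignments are stable: converged
            break
        means = new_means
    return clusters, means
```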
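Sketch 4 (incremental clustering, slides 33-36): the slides' exact streaming algorithm isn't visible in the preview, so this is a generic threshold-based ("leader"-style) scheme used purely to illustrate the point. Each arriving point is folded into the nearest existing cluster if it's close enough, otherwise it starts a new one. Because earlier points fix the means that later points are compared against, the arrival order really does change the result. The `threshold` parameter is an assumption of this sketch.

```python
def incremental_cluster(stream, threshold):
    # Useful when you don't start off with all the data: handle one point
    # at a time and never revisit earlier points.
    means, counts = [], []
    for p in stream:
        if means:
            i = min(range(len(means)), key=lambda i: euclidean(p, means[i]))
            if euclidean(p, means[i]) <= threshold:
                # Fold the point into the nearest cluster's running mean.
                counts[i] += 1
                means[i] = tuple(m + (x - m) / counts[i]
                                 for m, x in zip(means[i], p))
                continue
        # Too far from every existing cluster: start a new one.
        means.append(tuple(p))
        counts.append(1)
    return means
```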
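Sketch 5 (choosing k, slides 37-40): one hedged illustration of "penalize the objective with a regularization term." Within-cluster variance alone keeps improving all the way to k = N (where it hits exactly 0), so this adds a penalty that grows linearly with k. The linear form and the `penalty` weight are arbitrary choices for illustration; BIC/AIC-style penalties are the principled versions the "model selection" pointer refers to. Reuses `kmeans` and `euclidean` from the earlier sketches.

```python
def within_cluster_variance(clusters, means):
    # The raw objective: sum of squared distances to each cluster's mean.
    return sum(euclidean(p, m) ** 2
               for c, m in zip(clusters, means) for p in c)

def choose_k(points, max_k, penalty=1.0):
    # Regularized objective: variance + penalty * k, so every extra cluster
    # has to "pay for itself" by cutting the variance by at least `penalty`.
    best_k, best_score = 1, float("inf")
    for k in range(1, max_k + 1):
        clusters, means = kmeans(points, k)
        score = within_cluster_variance(clusters, means) + penalty * k
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```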


View Full Document

CMU CS 15381 - Lecture

Documents in this Course
Planning

Planning

19 pages

Planning

Planning

19 pages

Lecture

Lecture

27 pages

Lecture

Lecture

19 pages

FOL

FOL

41 pages

lecture

lecture

34 pages

Exam

Exam

7 pages

Lecture

Lecture

22 pages

Handout

Handout

11 pages

Midterm

Midterm

14 pages

lecture

lecture

83 pages

Handouts

Handouts

38 pages

mdp

mdp

37 pages

HW2

HW2

7 pages

nn

nn

25 pages

lecture

lecture

13 pages

Handout

Handout

5 pages

Lecture

Lecture

27 pages

Lecture

Lecture

62 pages

Lecture

Lecture

5 pages

Load more
Download Lecture
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?