1“Unsupervised” as compared to “supervised” learning, where you are given labeled examples (with which you can construct training/test sets)2How could you cluster these examples? By what traits?34This is a key representation problem in clustering.5Just imagine what this would be like if you *didn’t* have these properties.6Should be able to use different distance metrics, and different data.Noise is very common, and shouldn't completely break your algorithm.Should not matter what order the data is given to the algorithm78Dendogram a lot like a binary treeWhat should distance be? max(Distance to closest shared ancestor)9A clustering of the same name in different languages.10111213Start off with each point as a cluster. Then consider all pairs of clusters, and choose the best pair. Combine the best pair into a cluster.14Again, choose the best two clusters and combine them.151617How do you determine the best pair of clusters? These are some metrics.18You get different clusters based on what distance metric you use.All make the same split into two big clusters, but the smaller clusters differ a lot19Often you are not told beforehand how many clusters there are. Dendrograms are good at letting you know how many clusters there are.20What to do with outliers? Well, we could ignore them!212223Edit distance between A and B is how much work you need to do to convert A to B, or B to A. Easy to define with strings, harder (but not impossible) to do with other things.2425Usually you have to tell the algorithm how many clusters you want.26This step 1 is the lame (really difficult) part of this.Step 2: Just pick a couple random points. But why cant you stop after that? Because, you don’t really know that the two points you picked are any good.That’s where step 3 comes in.Sometime you won't converge (a point will dither between two or more clusters). In this case you need some more advanced stopping condition.Sometime initial means are really bad, and have to be reinitialized.2728293031“All kinds of bad things can happen, but hey, it sort of works”.That’s pretty much something you’ll run into with unsupervised learning. When you cant’ even tell the algorithm what’s *supposed* to be right, it can come up with a lot of creative ways to screw it up.You *can* select your centers not at random, but we trust random more because humans have their own biases.32Fortunately, there are a lot of domains where clusters *are* convex. Gaussians, for instance. (Of course, when you actually know the statistical distribution, there are better methods)You can't guarantee convergence.33Useful if you don't start off with all the data343536The order the data arrives in will therefore make a big difference. That’s not great. Also the clusters can overlap (maybe just add the point to the nearest cluster?).37Note, finding the right number of clusters is an unsolved problem.The objective function is a function we would like to minimize. You can try to minimize the variance, for instance. But then you’re just going to keep going until you get N clusters and variance is 0.383940Stop when increasing the number of clusters doesn't help you very much.Why shouldn't we use more clusters? Then clusters become meaninglessFor other methods, see “model selection”. You can, for instance, decide to penalize the objective with a regularization term,
View Full Document