Review from last lecture

Back to the standard machine learning problem: 2 dimensions (2 attributes, x1 and x2) and 2 classes (red and blue). A classifier should label each point as red or blue. The points are clearly linearly separable (blue can be separated from red with a line).

This is how the perceptron with the initial weights will classify points. Points above the line are classified as 1. It is not very surprising that this isn't very good, since we set the weights randomly.

The algorithm we showed you last time needs to take the derivative of g. Here, g is a hard threshold, so it does not have a usable derivative. For now, we will just ignore that.

So here's what's happening in a perceptron with those weights: pick a misclassified point and update the weights.

Pick another misclassified point and update the weights again.

And again...

Note that each time we pick a blue point, essentially nothing happens.

Note: w0 can only ever change in steps of alpha. If alpha is too small, the separator will move too slowly. If alpha is too big, you might not have the resolution to find the answer.

All these lines separate red from blue. But is one better than the others?

Pick the direction with the largest margin, and then put the separator in the middle of the margin. There is a fairly efficient algorithm to find this separator, but it is a bit too mathy to present here. When the data is not linearly separable, there is a variant called a "soft margin" you can look at. The points on the dotted lines are called the "support vectors".

Running a perceptron on this data.

The data isn't linearly separable. Boo. How can we make it linearly separable?

But you can make it linearly separable by applying a transformation. For example, you can map each point x1 to (x1, x1^2). By mapping the one-dimensional examples to a higher dimension, you can make them linearly separable!

Oh hey, check this out: mapping to higher dimensions is actually a pretty good idea. The trick is just choosing how to map it.

This is very pretty, watch.
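Below is a minimal Python sketch of the perceptron loop from these notes. It makes several assumptions that are not stated in the notes:

- Labels are 0/1.
- g is a hard threshold at 0.
- w0 is a bias weight on a constant input of 1.
- The update is w_i ← w_i + alpha·(y − g(z))·x_i. This is the earlier gradient rule with the g′ factor simply dropped.

The function names and the toy 1-D data are hypothetical. They were chosen to mirror the (x1, x1^2) mapping above. The code also shows why w0 only ever moves in steps of alpha.

```python
import random
from typing import List, Sequence, Tuple

# One training example: a feature tuple and a 0/1 label (assumed encoding).
Example = Tuple[Tuple[float, ...], int]


def g(z: float) -> int:
    """Hard threshold: 1 if z >= 0 else 0 (no usable derivative)."""
    return 1 if z >= 0 else 0


def predict(w: List[float], x: Sequence[float]) -> int:
    # w[0] is the bias weight w0; it multiplies a constant input of 1.
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return g(z)


def train_perceptron(data: Sequence[Example], alpha: float = 0.1,
                     max_epochs: int = 1000,
                     seed: int = 0) -> Tuple[List[float], bool]:
    """On each misclassified point: w_i += alpha * (y - g(z)) * x_i.

    Returns the final weights and whether an epoch had zero mistakes.
    """
    rng = random.Random(seed)
    n_features = len(data[0][0])
    # Random initial weights, so the first separator is usually poor.
    w = [rng.uniform(-1.0, 1.0) for _ in range(n_features + 1)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            err = y - predict(w, x)  # +1, -1, or 0 (correct) with 0/1 labels
            if err == 0:
                continue
            mistakes += 1
            w[0] += alpha * err  # w0 moves by exactly +alpha or -alpha
            for i, xi in enumerate(x, start=1):
                w[i] += alpha * err * xi
        if mistakes == 0:
            return w, True
    return w, False


# Hypothetical 1-D data: label 1 on the outside, label 0 in the middle.
raw = [(-3.0, 1), (-2.0, 1), (-1.0, 0), (0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)]
one_d: List[Example] = [((x,), y) for x, y in raw]
# The same points mapped to (x1, x1^2).
mapped: List[Example] = [((x, x * x), y) for x, y in raw]

w1, ok1 = train_perceptron(one_d)
w2, ok2 = train_perceptron(mapped)
print("1-D converged:", ok1)         # False: no single threshold on x1 works
print("(x1, x1^2) converged:", ok2)  # True
```

The 1-D run can never finish an epoch without mistakes. No single threshold on x1 can put the middle points on one side and both outer groups on the other. After the mapping, a line such as x1^2 = 2.5 separates the classes, so the perceptron eventually stops updating.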