Machine Learning CS6375, Spring 2015
Support Vector Machines

Linear Classifiers
How would you classify this data?
Any of these linear separators would be fine, but which is the best?

Linear Classifiers
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a data point.

Linear Classifiers
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called a linear SVM.

Why Maximum Margin?
1. Intuitively, this feels safest.
2. If we've made a small error in the location of the boundary, this gives us the least chance of causing a misclassification.
3. LOOCV is easy, since the model is immune to removal of any non-support-vector data points.
4. There is related theory (using VC dimension) showing that this is a good thing.
5. Empirically it works very, very well.

Specifying a Line and a Margin
How do we represent this mathematically, in m input dimensions?
Plus-plane:  { x : w · x + b = +1 }
Minus-plane: { x : w · x + b = −1 }
Classify as +1 if w · x + b ≥ +1, and as −1 if w · x + b ≤ −1.

Computing the Margin Width
Plus-plane:  { x : w · x + b = +1 }
Minus-plane: { x : w · x + b = −1 }
Claim: the vector w is perpendicular to the plus-plane. Why?
Let x⁻ be any point on the minus-plane, and let x⁺ be the closest plus-plane point to x⁻.
Claim: x⁺ = x⁻ + λw for some value of λ. Why?

Computing the Margin Width
What we know:
  w · x⁺ + b = +1
  w · x⁻ + b = −1
  x⁺ = x⁻ + λw
  |x⁺ − x⁻| = M
It's now easy to get M in terms of w and b:
  w · (x⁻ + λw) + b = 1
  ⇒ (w · x⁻ + b) + λ (w · w) = 1
  ⇒ −1 + λ (w · w) = 1
  ⇒ λ = 2 / (w · w)
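The margin-width claims above can be checked numerically: in canonical form the closest points on each side satisfy w · x + b = ±1, and stepping λw from the minus-plane lands exactly on the plus-plane. A minimal NumPy sketch; the weight vector, bias, and point are made up for illustration:

```python
import numpy as np

# A hypothetical weight vector and bias in canonical form,
# i.e. the closest points on each side satisfy w.x + b = +/-1.
w = np.array([2.0, 0.0])
b = -1.0

# Margin width M = 2 / ||w||  (since M = lambda * ||w|| with lambda = 2/(w.w))
M = 2.0 / np.linalg.norm(w)

# Check the claim x+ = x- + lambda*w: start from a point on the minus-plane.
lam = 2.0 / np.dot(w, w)
x_minus = np.array([0.0, 3.0])     # satisfies w.x + b = -1
x_plus = x_minus + lam * w         # should land on the plus-plane

assert np.isclose(np.dot(w, x_minus) + b, -1.0)
assert np.isclose(np.dot(w, x_plus) + b, +1.0)
assert np.isclose(np.linalg.norm(x_plus - x_minus), M)
```

Here ||w|| = 2, so the margin works out to M = 1, and the distance between x⁻ and its closest plus-plane point matches it exactly.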
Computing the Margin Width
What we know:
  w · x⁺ + b = +1,  w · x⁻ + b = −1,  x⁺ = x⁻ + λw,  |x⁺ − x⁻| = M,  λ = 2 / (w · w)
So:
  M = |x⁺ − x⁻| = |λw| = λ √(w · w) = 2 / √(w · w) = 2 / ||w||

Computing the Margin Width
Given a guess of w and b, we can
- compute whether all data points are in the correct half-planes, and
- compute the width of the margin.
So now we just need to write a program to search the space of w's and b's to find the widest margin that matches all the data points. How?

Learning the Maximum Margin Classifier
What should our quadratic optimization criterion be? Minimize w · w.
How many constraints will we have? R (one per data point). What should they be?
  w · xk + b ≥ +1 if yk = +1
  w · xk + b ≤ −1 if yk = −1

Learning via Quadratic Programming
QP is a well-studied class of optimization algorithms that maximize a quadratic function of some real-valued variables, subject to linear constraints.

Quadratic Programming
There exist algorithms for finding such constrained quadratic optima much more efficiently and reliably than gradient ascent. But they are very fiddly; you probably don't want to write one yourself.

Quadratic Optimization
We'll review:
- unconstrained optimization,
- optimization with linear equality constraints, and
- optimization with linear equality and inequality constraints.

Unconstrained Optimization
Goal: compute the minimum of a quadratic function f(x). Without loss of generality, let
  f(x) = x′Ax + b′x + c
f has a minimum if and only if A is positive semi-definite, and a unique minimum if and only if A is positive definite. In both cases the minimum is at the values of x that zero out the gradient, that is, the solutions of 2Ax + b = 0.

Example 1
Find the minimum of the quadratic f given on the slide. With x = (x1, x2)′, if there is a minimum it is the solution of 2Ax + b = 0. The solution is (7, 2), so that if a minimum exists it is at x1 = 7 and x2 = 2. To argue that it is indeed a minimum, we still need to show that the matrix A is positive semi-definite. One way of showing that A is positive semi-definite is to show that there is a matrix G such that A = G′G. In our case, with
  A = | 1 1 |
      | 1 2 |
we can take
  G = | 1 1 |
      | 0 1 |
since G′G = A.
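The unconstrained case above can be sketched numerically: the stationary point of f(x) = x′Ax + b′x + c solves 2Ax + b = 0, and positive semi-definiteness can be confirmed via the factorization A = G′G or via eigenvalues. The A and G are those from Example 1; the linear term b is a made-up illustrative value, not the one from the slide:

```python
import numpy as np

# Quadratic f(x) = x'Ax + b'x + c, with the A (and G) from Example 1.
A = np.array([[1.0, 1.0],
              [1.0, 2.0]])
b = np.array([-4.0, -6.0])   # hypothetical linear term for illustration
c = 0.0

def f(x):
    return x @ A @ x + b @ x + c

# The gradient is 2Ax + b, so the stationary point solves 2Ax = -b.
x_star = np.linalg.solve(2 * A, -b)

# A = G'G, hence A is positive semi-definite
# (here in fact positive definite: all eigenvalues > 0).
G = np.array([[1.0, 1.0],
              [0.0, 1.0]])
assert np.allclose(G.T @ G, A)
assert np.all(np.linalg.eigvalsh(A) > 0)

# Sanity check: nudging x_star in any direction cannot decrease f.
for d in (np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, -1.0])):
    assert f(x_star + 1e-3 * d) >= f(x_star)
```

With this b the minimizer comes out at x = (1, 1); the eigenvalue check is what licenses calling the stationary point a minimum.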
Optimization with Linear Equality Constraints
Goal: compute the minimum of f(x), where x is also subject to linear equality constraints. A linear equality constraint can be written as c · x = d, where c is a vector and d is a scalar. When there are k constraints, we write them as ci · x = di, for i = 1, …, k.
Trick: reduce this to the case of unconstrained optimization using the method of Lagrange multipliers.

Optimization with Linear Equality Constraints
In order to minimize f(x) under the above equality constraints, we define a Lagrange multiplier λi for the ith constraint and form the Lagrangian function
  L(x, λ1, …, λk) = f(x) + Σi λi (ci · x − di)
We can now treat L as a function of n + k variables and use the method of unconstrained minimization to find the unconstrained minimum of L.

Example 2
Minimize f(x) = x1² + x2² + x3² under the two constraints x2 − x1 = 1 and x3 = 1.
Method 1: By direct substitution (x2 = x1 + 1, x3 = 1) we see that f = x1² + (x1 + 1)² + 1. Then, by taking the derivative and equating to 0, we see that the minimum is at x1 = −1/2.
Method 2: Using the Lagrange technique, we have
  L = x1² + x2² + x3² + λ1(x2 − x1 − 1) + λ2(x3 − 1)
The derivative wrt x1 gives 2x1 − λ1 = 0
The derivative wrt x2 gives 2x2 + λ1 = 0
The derivative wrt x3 gives 2x3 + λ2 = 0
The derivative wrt λ1 gives x2 − x1 − 1 = 0
The derivative wrt λ2 gives x3 − 1 = 0
Treating this as a system of 5 equations in 5 unknowns, we get x1 = −1/2, x2 = 1/2, x3 = 1, λ1 = −1, λ2 = −2.

The Dual Problem
There is another way of using the Lagrangian to solve the constrained optimization problem. Treating the Lagrange multipliers as constants, we
1. optimize the Lagrangian with respect to x, by taking the derivatives only with respect to x;
2. solve the resulting system of linear equations for x in terms of the Lagrange multipliers;
3. substitute back into the Lagrangian, transforming it into a function of only the Lagrange multipliers.
This dual problem can be solved using unconstrained techniques.

Example 3
To convert Example 2 into dual form, we only need to take the derivatives of the Lagrangian with respect to x1, x2, x3. As above:
The derivative wrt x1 gives 2x1 − λ1 = 0
The derivative wrt x2 gives 2x2 + λ1 = 0
The derivative wrt x3 gives 2x3 + λ2 = 0
…
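Example 2's stationarity conditions form a 5×5 linear system in (x1, x2, x3, λ1, λ2), which can be solved numerically as a check on the hand calculation. A minimal sketch; the rows are exactly the five derivative equations listed above:

```python
import numpy as np

# Stationarity conditions of
#   L = x1^2 + x2^2 + x3^2 + l1*(x2 - x1 - 1) + l2*(x3 - 1)
# written as a linear system M z = r in z = (x1, x2, x3, l1, l2):
M = np.array([
    [ 2, 0, 0, -1, 0],   # dL/dx1 = 2*x1 - l1   = 0
    [ 0, 2, 0,  1, 0],   # dL/dx2 = 2*x2 + l1   = 0
    [ 0, 0, 2,  0, 1],   # dL/dx3 = 2*x3 + l2   = 0
    [-1, 1, 0,  0, 0],   # dL/dl1: x2 - x1 - 1  = 0  (constraint)
    [ 0, 0, 1,  0, 0],   # dL/dl2: x3 - 1       = 0  (constraint)
], dtype=float)
r = np.array([0, 0, 0, 1, 1], dtype=float)

x1, x2, x3, l1, l2 = np.linalg.solve(M, r)
# Agrees with the hand calculation: x = (-1/2, 1/2, 1), l1 = -1, l2 = -2.
```

Note that the multipliers enter the system linearly only because f is quadratic and the constraints are linear, which is exactly the setting the slides are building toward.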
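The max-margin program stated earlier (minimize w · w subject to yk(w · xk + b) ≥ 1) can also be handed to a general-purpose constrained optimizer as a sanity check. A sketch with made-up toy data; SciPy's SLSQP solver stands in here for the dedicated QP solvers the slides recommend:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Quadratic criterion: minimize w.w over z = (w1, w2, b).
def objective(z):
    w = z[:2]
    return w @ w

# One linear constraint per data point: y_k (w.x_k + b) - 1 >= 0.
cons = [{'type': 'ineq',
         'fun': (lambda z, xk=xk, yk=yk: yk * (z[:2] @ xk + z[2]) - 1)}
        for xk, yk in zip(X, y)]

res = minimize(objective, x0=np.zeros(3), constraints=cons, method='SLSQP')
assert res.success
w, b = res.x[:2], res.x[2]

# All training constraints hold (up to solver tolerance),
# and the margin is 2 / ||w|| as derived earlier.
assert np.all(y * (X @ w + b) >= 1 - 1e-6)
margin = 2.0 / np.linalg.norm(w)
```

For this data the support vectors are the closest opposite-class pair, so the optimal margin equals their distance; in practice one would use a dedicated QP solver rather than SLSQP, exactly as the slides caution.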