SVM as a Convex Optimization Problem

Leon Gu
CSD, CMU

Convex Optimization

- Convex set: the line segment between any two points lies in the set.
- Convex function: the line segment between any two points $(x, f(x))$ and $(y, f(y))$ lies on or above the graph of $f$.
- Convex optimization:

  minimize $f_0(x)$  (1)
  s.t. $f_i(x) \leq 0, \quad i = 1, \ldots, m$  (2)
       $h_i(x) = 0, \quad i = 1, \ldots, p$  (3)

  - $f_0$ and $f_i$ convex, $h_i$ linear
  - convex objective function, convex domain (feasible set)
  - any local optimum is also a global optimum

Operations That Preserve Convexity

- For convex sets: intersection, affine transformation, perspective transformation.
- For convex functions: nonnegative weighted sum, maximum and supremum, composition with affine functions, composition with monotonic convex/concave functions.

Optimal Separating Hyperplane

Suppose that our data set $\{x_i, y_i\}_{i=1}^N$ is linearly separable. Define a hyperplane by

$\{x : f(x) = \beta^T x + \beta_0 = 0\}, \quad \|\beta\| = 1.$

- $f(x)$ is the signed distance to the hyperplane.
- We can define a classification rule induced by $f(x)$: $\operatorname{sgn}(\beta^T x + \beta_0)$.

Define the margin of $f(x)$ to be the minimal $y f(x)$ over the data,

$C = \min_i y_i f(x_i).$

The optimal separating hyperplane is the hyperplane that maximizes the margin:

$\max_{\beta, \beta_0, \|\beta\| = 1} C \quad \text{s.t.} \quad y_i(\beta^T x_i + \beta_0) \geq C, \quad i = 1, \ldots, N.$

We can get rid of the norm constraint on $\beta$ by writing the constraint as $\frac{1}{\|\beta\|} y_i(\beta^T x_i + \beta_0) \geq C$ and arbitrarily setting $\|\beta\| = 1/C$; then we can rephrase the problem as

$\min \|\beta\| \quad \text{s.t.} \quad y_i(\beta^T x_i + \beta_0) \geq 1, \quad i = 1, \ldots, N.$

This is a convex optimization problem.

Soft Margin SVM

The data is not always perfect, so we need to extend the optimal separating hyperplane to the non-separable case. The trick is to relax the margin constraints by introducing slack variables $\xi_i$:

minimize $\|\beta\|$ over $\beta, \beta_0$  (4)
s.t. $y_i(\beta^T x_i + \beta_0) \geq 1 - \xi_i, \quad i = 1, \ldots, N$  (5)
     $\xi_i \geq 0; \quad \sum_{i=1}^N \xi_i \leq Z$  (6)

- Still convex.
- $\xi_i > 1$ means misclassification; $0 < \xi_i \leq 1$ means the point is correctly classified but lies inside the margin.
- $Z$ is a tuning parameter.

How to solve it? Use Lagrangian duality theory. (Since the primal is itself a convex program, it can also be handed directly to a generic solver, as in the sketch below.)
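Before the duality derivation, here is a minimal sketch, not part of the original slides, of solving the primal (4)-(6) directly with a generic convex solver. It assumes cvxpy and synthetic two-class data; the names `beta`, `beta0`, `xi`, and `Z` mirror the notation above.

```python
# Minimal sketch (assumes cvxpy; synthetic data): soft-margin primal (4)-(6).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, d = 100, 2
# two Gaussian blobs, labels y_i in {-1, +1}
X = np.vstack([rng.normal(-2.0, 1.0, (N // 2, d)),
               rng.normal(+2.0, 1.0, (N // 2, d))])
y = np.hstack([-np.ones(N // 2), np.ones(N // 2)])

beta = cp.Variable(d)       # normal vector of the hyperplane
beta0 = cp.Variable()       # offset
xi = cp.Variable(N)         # slack variables
Z = 10.0                    # tuning parameter: budget on the total slack

constraints = [cp.multiply(y, X @ beta + beta0) >= 1 - xi,    # margin (5)
               xi >= 0,                                       # (6)
               cp.sum(xi) <= Z]                               # (6)
prob = cp.Problem(cp.Minimize(cp.norm(beta, 2)), constraints) # objective (4)
prob.solve()
print("beta =", beta.value, "beta0 =", beta0.value)
```

Minimizing $\|\beta\|$ rather than $\frac{1}{2}\|\beta\|^2$ gives the same hyperplane; the squared form is the one used in the duality derivation below.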
Lagrangian Theory

Lagrangian theory characterizes the solution of a constrained optimization problem. Recall the primal problem:

minimize $f_0(x)$  (7)
s.t. $f_i(x) \leq 0, \quad i = 1, \ldots, m$  (8)
     $h_i(x) = 0, \quad i = 1, \ldots, p$  (9)

The stationary points are given by

$\frac{d f_0(x)}{dx} + \sum_{i=1}^m \lambda_i \frac{d f_i(x)}{dx} + \sum_{i=1}^p \nu_i \frac{d h_i(x)}{dx} = 0,$

where $\lambda_i, \nu_i$ are free parameters called Lagrange multipliers. Accordingly, we define the Lagrangian (primal function) as

$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x).$

We define the Lagrangian dual function $g$ as

$g(\lambda, \nu) = \inf_{x \in X} L(x, \lambda, \nu).$

The so-called Lagrangian dual problem is the following:

maximize $g(\lambda, \nu)$  (10)
s.t. $\lambda \geq 0$  (11)

The weak duality theorem says $g(\lambda, \nu) \leq f_0(x^*)$ for all $\lambda \geq 0$ and $\nu$. In other words, maximizing $g$ over $\lambda$ and $\nu$ produces a lower bound on $f_0(x^*)$. Note that $g$ is concave (a pointwise infimum of affine functions of $(\lambda, \nu)$), so the dual problem is always convex. The difference between $g(\lambda, \nu)$ and $f_0(x^*)$ is called the duality gap.

K.K.T. Conditions

Slater's theorem (the strong duality theorem) says that if the problem is convex and the constraint functions are affine, the duality gap is zero. Then the K.K.T. conditions provide necessary and sufficient conditions for a point $x^*$ to be an optimum:

$\frac{\partial L(x^*, \lambda^*, \nu^*)}{\partial x} = 0$  (first-order condition of optimality)
$\lambda_i^* f_i(x^*) = 0$  (complementary slackness conditions)
$\lambda_i^* \geq 0$  (dual constraints)
$f_i(x^*) \leq 0$  (primal constraints)
$h_i(x^*) = 0$  (primal constraints)

Remark: the complementary slackness conditions are directly related to support vectors.

The Dual Problem

Recall the primal problem of the soft margin SVM:

minimize $\|\beta\|$ over $\beta, \beta_0$  (12)
s.t. $y_i(\beta^T x_i + \beta_0) \geq 1 - \xi_i, \quad i = 1, \ldots, N$  (13)
     $\xi_i \geq 0; \quad \sum_{i=1}^N \xi_i \leq Z$  (14)

Obviously strong duality holds, so we can find its dual problem by the following steps:

1. Define the Lagrangian primal function (using the equivalent objective $\frac{1}{2}\|\beta\|^2$) and the Lagrange multipliers.
2. Take the first-order derivatives w.r.t. $\beta$, $\beta_0$, and $\xi_i$, and set them to zero.
3. Substitute the results back into the primal function.

Maximize

$L_D = \sum_{i=1}^N \alpha_i - \frac{1}{2} \sum_{i=1}^N \sum_{i'=1}^N \alpha_i \alpha_{i'} y_i y_{i'} \langle x_i, x_{i'} \rangle$  (15)

s.t. $0 \leq \alpha_i, \quad i = 1, \ldots, N$  (16)
     $\sum_{i=1}^N \alpha_i y_i = 0$  (17)

Solution:

$\beta = \sum_{i=1}^N \alpha_i y_i x_i$  (18)

$f(x) = \beta^T x + \beta_0 = \sum_{i=1}^N \alpha_i y_i \langle x_i, x \rangle + \beta_0$  (19)

- Sparse representation: the separating hyperplane $f(x)$ is spanned by those data points $x_i$ with $\alpha_i \neq 0$, called support vectors. This follows directly from the complementary slackness conditions: $\alpha_i \left[ y_i(\beta^T x_i + \beta_0) - (1 - \xi_i) \right] = 0$.
- Both the estimation and the evaluation of $f(x)$ involve the data only through dot products, as the sketch below illustrates.
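To make the sparse representation concrete, here is an illustrative sketch (not from the slides; the helper name `decision_function` and the tolerance `tol` are hypothetical) that evaluates (19) using only the support vectors and only dot products.

```python
# Illustrative sketch: evaluating f(x) from a dual solution via (18)-(19).
import numpy as np

def decision_function(X_train, y_train, alpha, beta0, X_new, tol=1e-8):
    """f(x) = sum_i alpha_i y_i <x_i, x> + beta0, summed over the support
    vectors only (alpha_i > tol, up to solver tolerance)."""
    sv = alpha > tol                        # support vectors: alpha_i != 0
    coef = alpha[sv] * y_train[sv]          # alpha_i y_i, one per support vector
    gram = X_new @ X_train[sv].T            # dot products <x_i, x>
    return gram @ coef + beta0

# Equation (18) in the same vectorized form:
# beta = X_train.T @ (alpha * y_train)
```

Because the data enter only through $\langle x_i, x \rangle$, the Gram matrix here could be replaced by any kernel evaluation without changing the rest of the computation.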
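Finally, tying the K.K.T. conditions back to the code: continuing the cvxpy sketch above (an assumption, not part of the slides), the multipliers of the margin constraints are available as dual values after solving, so complementary slackness, and hence the support-vector pattern, can be checked numerically.

```python
# Continuing the cvxpy sketch: numeric check of complementary slackness,
# alpha_i [ y_i (beta^T x_i + beta_0) - (1 - xi_i) ] = 0, for constraint (5).
# Note: these alpha_i are scaled for the ||beta|| objective actually solved,
# not the 1/2 ||beta||^2 normalization behind (15); slackness holds either way.
alpha = constraints[0].dual_value                   # multipliers of (5)
margins = y * (X @ beta.value + beta0.value)        # y_i f(x_i)
residual = alpha * (margins - (1 - xi.value))
print("max |slackness residual|:", np.max(np.abs(residual)))  # ~0
print("support vectors:", np.sum(alpha > 1e-6))     # points with alpha_i > 0
```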