Unformatted text preview:

Machine learning lecture 7 Tommi S Jaakkola MIT CSAIL tommi csail mit edu Topics Support vector machines separable case formulation margin non separable case penalties and logistic regression dual solution kernels examples properties Tommi Jaakkola MIT CSAIL 2 Support vector machine SVM When the training examples are linearly separable we can maximize a geometric notion of margin distance to the boundary by minimizing the regularization penalty kw1k2 2 d X wi2 2 i 1 x subject to the classification constraints x x x o o o x for i 1 n o o o o x yi w0 xTi w1 1 0 o o x x x x The solution is defined only on the basis of a subset of examples or support vectors Tommi Jaakkola MIT CSAIL 3 SVM separable case 2 We minimize kw1k 2 Pd 2 w 2 subject to i i 1 yi w0 xTi w1 1 0 i 1 n margin 1 kw 1k o o o o x o x o x o o x x w 1 o x x x x x f x w w 0 xT w 1 x o ooo o o x xx x xxx w1 x x 2 The resulting margin and the slope kw 1k are inversely related Tommi Jaakkola MIT CSAIL 4 SVM non separable case When the examples are not linearly separable we can modify the optimization problem slightly to add a penalty for violating the classification constraints We minimize kw1k2 2 C n X i i 1 subject to relaxed classification constraints yi w0 xTi w1 1 i 0 x x x x w 1 x x o o o o x x o o o o x x x x o o for i 1 n Here i 0 are called slack variables Tommi Jaakkola MIT CSAIL 5 SVM non separable case cont d We can also write the SVM optimization problem more compactly as C n z X i 1 yi w0 xTi w1 kw1k2 2 i 1 where z z if z 0 and zero otherwise i e returns the positive part Tommi Jaakkola MIT CSAIL 6 SVM non separable case cont d We can also write the SVM optimization problem more compactly as C n z X i 1 yi w0 xTi w1 kw1k2 2 i 1 where z z if z 0 and zero otherwise i e returns the positive part This is equivalent to regularized empirical loss minimization n X 1 n i 1 1 yi w0 xTi w1 kw1k2 2 where 1 nC is the regularization parameter Tommi Jaakkola MIT CSAIL 7 SVM vs logistic regression When viewed from the point of view of regularized empirical loss minimization SVM and logistic regression appear quite similar n X 1 SVM 1 yi w0 xTi w1 kw1k2 2 n i 1 Logistic n z X 1 n i 1 log P yi x w log g yi w0 xTi w1 kw1k2 2 where g z 1 exp z 1 is the logistic function Note that we have transformed the problem maximizing the penalized log likelihood into minimizing negative penalized log likelihood Tommi Jaakkola MIT CSAIL 8 SVM vs logistic regression cont d The difference comes from how we penalize errors Both n 1X n i 1 z z Loss yi w0 xTi w1 kw1k2 2 5 SVM SVM loss LR loss 4 5 4 Loss z 1 z 3 5 loss 3 Regularized logistic reg 2 5 2 1 5 Loss z log 1 exp z 1 0 5 0 4 Tommi Jaakkola MIT CSAIL 3 2 1 0 z 1 2 3 4 9 SVM solution Lagrange multipliers Back to the separable case how do we solve min kw1k2 2 subject to yi w0 xTi w1 1 0 i 1 n Tommi Jaakkola MIT CSAIL 10 SVM solution Lagrange multipliers Back to the separable case how do we solve min kw1k2 2 subject to yi w0 xTi w1 1 0 i 1 n Let start by representing the constraints as losses T 0 y w x i 0 T i w1 1 0 max 1 yi w0 xi w1 0 otherwise Tommi Jaakkola MIT CSAIL 11 SVM solution Lagrange multipliers Back to the separable case how do we solve min kw1k2 2 subject to yi w0 xTi w1 1 0 i 1 n Let start by representing the constraints as losses 0 yi w0 xTi w1 1 0 T max 1 yi w0 xi w1 0 otherwise and rewrite the minimization problem in terms of these min n w Tommi Jaakkola MIT CSAIL kw1k2 2 n X i 1 o max i 1 yi w0 xTi w1 i 0 12 SVM solution Lagrange multipliers Back to the separable case how do we solve min kw1k2 2 subject to yi w0 xTi w1 1 0 i 1 n Let start by representing the constraints as losses T 0 y w x i 0 T i w1 1 0 max 1 yi w0 xi w1 0 otherwise and rewrite the minimization problem in terms of these n n o X min kw1k2 2 max i 1 yi w0 xTi w1 w i 1 i 0 n n o X min max kw1k2 2 i 1 yi w0 xTi w1 w i 0 Tommi Jaakkola MIT CSAIL i 1 13 SVM solution cont d We can then swap max and min n n o X min max kw1k2 2 i 1 yi w0 xTi w1 w i 0 max min i 0 w i 1 n kw1k2 2 n X o i 1 yi w0 xTi w1 i 1 z J w As a result we have to be able to minimize J w with respect to parameters w for any fixed setting of the Lagrange multipliers i 0 Tommi Jaakkola MIT CSAIL 14 SVM solution cont d We can then swap max and min n n o X min max kw1k2 2 i 1 yi w0 xTi w1 w i 0 max min i 0 w i 1 n kw1k2 2 n X o i 1 yi w0 xTi w1 i 1 z J w We can find the optimal w as a function of i by setting the derivatives to zero n X J w w1 iyixi 0 w1 i 1 n X J w iyi 0 w0 i 1 Tommi Jaakkola MIT CSAIL 15 SVM solution cont d We can then substitute the solution n X J w w1 iyixi 0 w1 i 1 n X J w iyi 0 w0 i 1 back into the objective and get after some algebra n n o X max kw 1k2 2 i 1 yi w 0 xTi w 1 P i 0 i i yi 0 max P i 0 i i yi 0 Tommi Jaakkola MIT CSAIL i 1 n nX n o 1 X i yiyj i j xTi xj 2 i j 1 i 1 16 SVM solution summary We can find the optimal setting of the Lagrange multipliers i by maximizing n X n X 1 i yiyj i j xTi xj 2 i 1 i j 1 P subject to i 0 and …


View Full Document

MIT 6 867 - Machine learning: lecture 7

Loading Unlocking...
Login

Join to view Machine learning: lecture 7 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Machine learning: lecture 7 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?