SVMs and Kernels
10-701/15-781 recitation, 10/22/09
Ekaterina Spriggs
Tuesday, October 27, 2009

Slide 2 – From a linear classifier to ...
One of the most famous slides you will see, ever!
Classify as positive when Σ_i w_i x_i + b ≥ 0, and as negative when Σ_i w_i x_i + b ≤ 0.

Slide 3 – Maximum margin
Maximum possible separation between positive and negative training examples:
minimize_{w,b} w·w subject to (w·x_j + b) y_j ≥ 1, ∀j

Slides 4-5 – Number of support vectors
For the SVM above, the number of support vectors is at most m + 1, where m is the dimension of the input vector. (At most! Except for degenerate cases.)

Slides 6-9 – Multi-class SVM example
Rule: w^(y_j)·x_j + b^(y_j) ≥ w^(y′)·x_j + b^(y′) + 1, ∀y′ ≠ y_j, ∀j

Slides 10-14 – Kernels
The complexity of the optimization problem remains dependent only on the dimensionality of the input space, not of the feature space!
K(x, z) = Φ(x)ᵀΦ(z)
Infinite dimensions?

Slides 15-17 – Finding the margin by hand

Slide 18 – But this is 2D data? How many SVs now?
w = Σ_i α_i y_i Φ(x_i)
b = y_k − w·Φ(x_k), for any k where α_k > 0

Slide 19 – How many SVs now?
The worst case is the VC dimension.

Slides 20-22 – VC dimension
For a given algorithm, the largest set of points that the algorithm can shatter.

Slide 23 – VC dimension
For a linear classifier in m dimensions, the VC dimension is m + 1. So the worst case for the number of SVs is m + 1.

Slide 24 – How many SVs now?
Dimensionality: w depends only on the α's, not on the dimension of x!
w = Σ_i α_i y_i Φ(x_i)
b = y_k − w·Φ(x_k), for any k where α_k > 0

Slide 25 – Quiz: Why 1 and −1?
(w·x_j + b) ≥ 1, for y_j = +1
(w·x_j + b) ≤ −1, for y_j = −1

Slide 26 – Quiz: Can we apply a kernel to any algorithm?
SVM: minimize_{w,b} w·w subject to (w·x_j + b) y_j ≥ 1, ∀j
LR: w* = argmin_w Σ_j (y_j − Σ_i w_i h_i(x_j))²
Decision trees? Boosting?

Slide 27 – Quiz: Computing the b's
b = y_k − w·x_k, for any k where α_k > 0. Which k do we choose?

Slide 28 – K-NN and homework problem
Cross-validation error, training error, testing error.

Slide 29 – Questions?
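The multi-class SVM rule above trains one (w^(y), b^(y)) pair per class and predicts the class with the highest score w^(y)·x + b^(y). A minimal sketch in Python; the three classes and their weights are invented purely for illustration, not taken from the recitation:

```python
# Multi-class linear SVM prediction: one (w, b) per class; predict the
# class whose score w^(y) . x + b^(y) is largest.  Weights are made up
# for illustration (3 classes, 2-D inputs).
W = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (-1.0, -1.0)}
B = {"a": 0.0, "b": 0.0, "c": 0.5}

def predict(x):
    score = lambda y: sum(wi * xi for wi, xi in zip(W[y], x)) + B[y]
    return max(W, key=score)

print(predict((2.0, 0.5)))   # "a": score 2.0 beats 0.5 and -2.0
```

At training time the margin constraints force the correct class's score to beat every other class's score by at least 1; at test time only the argmax is needed.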
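The kernel identity K(x, z) = Φ(x)ᵀΦ(z) can be checked numerically. The quadratic kernel K(x, z) = (x·z)² with the explicit 2-D feature map Φ(x) = (x₁², √2·x₁x₂, x₂²) is a standard textbook example, assumed here rather than taken from the slides:

```python
import math

def poly_kernel(x, z):
    """Quadratic kernel K(x, z) = (x . z)^2, computed in input space."""
    dot = sum(a * b for a, b in zip(x, z))
    return dot ** 2

def phi(x):
    """Explicit feature map for the 2-D quadratic kernel:
    Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, -1.0)
lhs = poly_kernel(x, z)                            # kernel in input space
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))   # Phi(x) . Phi(z)
print(lhs, rhs)
```

The left-hand side costs O(m) in the input dimension; the right-hand side requires building the feature vectors explicitly, which is exactly what the kernel trick avoids (and what makes infinite-dimensional feature spaces, like the Gaussian kernel's, usable at all).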
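The dual expressions w = Σ_i α_i y_i Φ(x_i) and b = y_k − w·Φ(x_k) can be exercised on a toy problem small enough to solve by hand: two 1-D points at ±1 with opposite labels and the identity feature map, for which the dual solution is α = 0.5 for both points (an assumption verified by hand for this sketch, not a value from the slides):

```python
# Toy 1-D problem: x = +1 labeled +1, x = -1 labeled -1.
# The max-margin separator is w = 1, b = 0; both points are support
# vectors with alpha = 0.5 (hand-solved dual, identity feature map).
xs     = [1.0, -1.0]
ys     = [1, -1]
alphas = [0.5, 0.5]   # assumed dual solution for this toy problem

w = sum(a * y * x for a, y, x in zip(alphas, ys, xs))   # w = sum_i alpha_i y_i x_i

# b = y_k - w * x_k for ANY k with alpha_k > 0 -- every choice agrees,
# which answers the quiz question "which k do we choose?"
bs = [y - w * x for a, y, x in zip(alphas, ys, xs) if a > 0]
print(w, bs)   # 1.0 [0.0, 0.0]
```

Note that w is a sum over the α's, one per training example, which is why its cost depends on the number of support vectors and not on the dimension of Φ(x).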
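The claim that a linear classifier in m dimensions has VC dimension m + 1 can be spot-checked for m = 2: three non-collinear points should be shatterable, i.e. every one of the 2³ = 8 labelings should be linearly separable. A brute-force sketch; the search grid is an assumption chosen to be wide enough for these particular points:

```python
from itertools import product

# Three non-collinear points in 2-D; a linear classifier should shatter
# them (VC dimension of a line in the plane is 2 + 1 = 3).
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]   # coarse search grid (assumption)

def separable(labels):
    """Brute-force search for (w1, w2, b) with sign(w.x + b) matching labels."""
    for w1, w2, b in product(grid, repeat=3):
        if all((w1 * x + w2 * y + b > 0) == (l > 0)
               for (x, y), l in zip(pts, labels)):
            return True
    return False

shattered = all(separable(labels) for labels in product([1, -1], repeat=3))
print(shattered)   # True: all 8 labelings are linearly separable
```

Four points in the plane cannot all be shattered (e.g. the XOR labeling fails), which is what caps the VC dimension, and hence the worst-case support-vector count, at m + 1.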