UW-Madison ECE 738 - Support Vector Machine - An Introduction

Contents:
- Linear Hyper-plane Classifier
- Distance from a Point to a Hyper-plane
- Optimal Hyper-plane: Linearly Separable Case
- Separation Gap
- Quadratic Optimization Problem Formulation
- Optimization (continued)
- A Numerical Example
- Primal/Dual Problem Formulation
- Formulating the Dual Problem
- Numerical Example (cont'd)
- Implication of Minimizing ||w||
- Non-Separable Cases
- Non-Separable Case
- Primal Problem Formulation
- Dual Problem Formulation
- Solution to the Dual Problem
- Inner Product Kernels
- Polynomial Kernel
- Numerical Example: XOR Problem
- XOR Problem (Continued)
- Other Types of Kernels
- Mercer's Theorem
- Testing with Kernels
- SVM Using Nonlinear Kernels

Support Vector Machine: An Introduction
(C) 2001-2005 by Yu Hen Hu

Linear Hyper-plane Classifier

Given: {(xᵢ, dᵢ); i = 1 to N}, dᵢ ∈ {+1, −1}. A linear hyper-plane classifier is a hyper-plane consisting of points x such that

  H = {x | g(x) = wᵀx + b = 0}

where g(x) is a discriminant function.
- For x on the ○ side of H: wᵀx + b ≥ 0, d = +1.
- For x on the other side: wᵀx + b < 0, d = −1.
- Distance from x to H: r = (wᵀx + b)/|w| = g(x)/|w|.

[Figure: hyper-plane H in the (x₁, x₂) plane with normal vector w, offset −b/|w| from the origin, and a point x at distance r from H.]

Distance from a Point to a Hyper-plane

The hyper-plane H is characterized by

  wᵀx + b = 0    (*)

where w is the normal vector perpendicular to H. Equation (*) says that any vector x on H projects onto w with the same length OA = −b/|w|. Consider a special point C corresponding to a vector x*. The magnitude of its projection onto w is wᵀx*/|w| = OA + BC, or equivalently

  wᵀx*/|w| = −b/|w| + r.

Hence r = (wᵀx* + b)/|w| = g(x*)/|w|. If x* is on the other side of H (the same side as the origin), then r = −(wᵀx* + b)/|w| = −g(x*)/|w|.

[Figure: point x* at distance r from H, with projection points A, B, C along the normal vector w.]

Optimal Hyper-plane: Linearly Separable Case

- For dᵢ = +1: g(xᵢ) = wᵀxᵢ + b ≥ ρ|w| ⇔ w₀ᵀxᵢ + b₀ ≥ 1.
- For dᵢ = −1: g(xᵢ) = wᵀxᵢ + b ≤ −ρ|w| ⇔ w₀ᵀxᵢ + b₀ ≤ −1.

The optimal hyper-plane should lie in the center of the gap between the two classes. Support vectors are the samples on the boundaries.
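The discriminant function g(x) and the signed distance r = g(x)/|w| can be sketched in a few lines (a minimal sketch; the function names are my own, not from the slides):

```python
import numpy as np

def discriminant(w, b, x):
    """g(x) = w^T x + b."""
    return w @ x + b

def signed_distance(w, b, x):
    """Signed distance r = g(x)/|w| from x to the hyper-plane H."""
    return discriminant(w, b, x) / np.linalg.norm(w)

def classify(w, b, x):
    """d = +1 on the side where g(x) >= 0, d = -1 otherwise."""
    return 1 if discriminant(w, b, x) >= 0 else -1

# Example: H is the line x1 + x2 - 1 = 0
w, b = np.array([1.0, 1.0]), -1.0
x = np.array([2.0, 2.0])   # g(x) = 3, |w| = sqrt(2), so r = 3/sqrt(2)
```

The sign of r tells you which side of H the point lies on, matching the two cases in the slide.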
The support vectors alone can determine the optimal hyper-plane.

Question: how do we find the optimal hyper-plane?

[Figure: two classes of points in the (x₁, x₂) plane separated by the optimal hyper-plane, with the support vectors on the margin boundaries.]

Separation Gap

For xᵢ being a support vector:
- For dᵢ = +1: g(xᵢ) = wᵀxᵢ + b = ρ|w| ⇔ w₀ᵀxᵢ + b₀ = 1.
- For dᵢ = −1: g(xᵢ) = wᵀxᵢ + b = −ρ|w| ⇔ w₀ᵀxᵢ + b₀ = −1.

Hence w₀ = w/(ρ|w|) and b₀ = b/(ρ|w|). But the distance from xᵢ to the hyper-plane is ρ = g(xᵢ)/|w|. Thus w₀ = w/g(xᵢ), and ρ = 1/|w₀|. The maximum distance between the two classes is 2ρ = 2/|w₀|.

The objective is to find w₀, b₀ that minimize |w₀| (so that ρ is maximized), subject to the constraints w₀ᵀxᵢ + b₀ ≥ 1 for dᵢ = +1 and w₀ᵀxᵢ + b₀ ≤ −1 for dᵢ = −1. Combining these constraints, one has

  dᵢ(w₀ᵀxᵢ + b₀) ≥ 1.

Quadratic Optimization Problem Formulation

Given {(xᵢ, dᵢ); i = 1 to N}, find w and b such that Φ(w) = wᵀw/2 is minimized subject to the N constraints

  dᵢ(wᵀxᵢ + b) − 1 ≥ 0,  1 ≤ i ≤ N.

Method of Lagrange multipliers: form

  J(w, b, α) = wᵀw/2 − Σᵢ₌₁ᴺ αᵢ [dᵢ(wᵀxᵢ + b) − 1].

Setting the derivatives to zero:

  ∂J/∂w = 0 ⇒ w = Σᵢ₌₁ᴺ αᵢ dᵢ xᵢ
  ∂J/∂b = 0 ⇒ Σᵢ₌₁ᴺ αᵢ dᵢ = 0.

Optimization (continued)

The solution of the Lagrange multiplier problem is at a saddle point, where the minimum is sought with respect to w and b while the maximum is sought with respect to the αᵢ.

Kuhn-Tucker condition: at the saddle point,

  αᵢ [dᵢ(wᵀxᵢ + b) − 1] = 0  for 1 ≤ i ≤ N.

- If xᵢ is NOT a support vector, the corresponding αᵢ = 0.
- Hence only the support vectors affect the result of the optimization.

A Numerical Example

Three inequalities:

  w + b ≤ −1;  2w + b ≥ +1;  3w + b ≥ +1.

  J = w²/2 − α₁(−w − b − 1) − α₂(2w + b − 1) − α₃(3w + b − 1)

  ∂J/∂w = 0 ⇒ w = −α₁ + 2α₂ + 3α₃
  ∂J/∂b = 0 ⇒ 0 = α₁ − α₂ − α₃

The Kuhn-Tucker condition implies:
(a) α₁(−w − b − 1) = 0
(b) α₂(2w + b − 1) = 0
(c) α₃(3w + b − 1) = 0

Later we will see that the solution is α₁ = α₂ = 2 and α₃ = 0. This yields w = 2, b = −3. Hence the decision boundary is 2x − 3 = 0, i.e. x = 1.5. This is shown as the dashed line in the figure.
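The claimed solution of the numerical example can be checked directly against the stationarity, feasibility, and Kuhn-Tucker conditions above (a quick numerical sanity check, not part of the original slides):

```python
import numpy as np

# Training data from the example: x = 1, 2, 3 with labels d = -1, +1, +1
x = np.array([1.0, 2.0, 3.0])
d = np.array([-1.0, 1.0, 1.0])

# Claimed solution: w = 2, b = -3, alpha = (2, 2, 0)
w, b = 2.0, -3.0
alpha = np.array([2.0, 2.0, 0.0])

margins = d * (w * x + b)                # d_i (w x_i + b); all should be >= 1
stationarity_w = np.sum(alpha * d * x)   # should equal w
stationarity_b = np.sum(alpha * d)       # should equal 0
slackness = alpha * (margins - 1)        # alpha_i [d_i (w x_i + b) - 1]; all 0
```

The two support vectors (x = 1 and x = 2) sit exactly on the margin (margins equal to 1), while x = 3 has margin 3 and, as the slide notes, a zero multiplier.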
(1,1)(2,1)(3,1)(C) 2001-2005 by Yu Hen Hu9Primal/Dual Problem Formulation Given a constrained optimization problem with a convex cost function and linear constraints; a dual problem with the Lagrange multipliers providing the solution can be formulated. Duality Theorem (Bertsekas 1995) (a) If the primal problem has an optimal solution, then the dual problem has an optimal solution with the same optimal values. (b) In order for wo to be an optimal solution and o to be an optimal dual solution, it is necessary and sufficient that wo is feasible for the primal problem and (wo) = J(wo,bo, o) = Minw J(w,bo, o)(C) 2001-2005 by Yu Hen Hu10Formulating the Dual Problem At the saddle point, we have and , substituting these relations into above, then we have the Dual Problem NiiiNiiiTiNiiTdbxwdwwbwJ11121),,(NiiiixdW1Niiid10Maximize Subject to: and i  0 for i = 1, 2, …, N.     NiNiNjjTijijiixxddQ1 1 121)(Niiid10Note  NNNNNNNNiiddxxxxxxddQ112112111121)((C) 2001-2005 by Yu Hen Hu11Numerical Example (cont’d) or Q() = 1 + 2 + 3  [0.512 + 222 + 4.532  212  313 + 623]subject to constraints: 1 + 2 + 3 = 0, and 1  0, 2  0, and 3  0. Use Matlab Optimization tool box command: x=fmincon(‘qalpha’,X0, A, B, Aeq, Beq) The solution is [1 2 3] = [2 2 0] as expected.  3213213196364232121)(iiQ(C) 2001-2005 by Yu Hen Hu12Implication of Minimizing ||w|| Let D denote the diameter of the smallest hyper-ball that encloses all the input training vectors {x1, x2, …, xN}.The set of optimal hyper-planes described by the equationWoTx + bo = 0 has a VC-dimension h bounded from above as h  min { D2/2, m0} + 1 where m0 is the dimension of the input vectors, and  = 2/||wo|| is the margin of the separation of the hyper-planes. VC-dimension determines the complexity

