Soft marginsHard margins:for i = 1, . . . , m, yi(w0xi+ b) ≥ 1Soft margins:for i = 1, . . . , m, yi(w0xi+ b) ≥ 1 − ζiζi≥ 0The primal problem:Let C be a constant that corresponds to the “amount of allowed softness”. The function to beminimized and the linear inequality constraints are augmented to:Minimize12|w|2+ CmXi=1ζisubject to the 2m linear inequality constraints:for i = 1, . . . , m, yi(w0xi+ b) ≥ 1 − ζi, ζi≥ 0Intuitively, large values of C would emphasize the requirement that the ζiare small, and thusdecrease the softness.Derivation of the dual problemThe Lagrangian of the primal problem:L(w, b, ζ1, . . . , ζm, α1, . . . , αm, r1, . . . , rm) =12|w|2+ CmXi=1ζi+mXi=1αi(1 − ζi− yi(w0xi+ b)) −mXi=1riζi(1)To compute the dual problem we need to minimize L with respect to w, b, ζiso that it is a functionof only αi, ri.The derivative of L w.r.t. w gives: w =Pmi=1αiyixiThe derivative of L w.r.t. b gives:Pmi=1αiyi= 0The derivative of L w.r.t. ζigives: C − αi− ri= 0(2)Substituting these values in L and simplifying we get::L(α1, . . . , αm, r1, . . . , rm) = −12mXi=1mXj=1αiαjyiyjx0ixj+mXi=1αiThis is exactly the same dual function as in the hard-margins case. For the dual problem we alsoneed the last two constraints in (2), and αi≥ 0, ri≥ 0. The difference between the hard and thesoft case is that from the third equation in (2) and the condition ri≥ 0 we have: αi≤ C.1The dual problem:Maximize L(α1, . . . , αm) =mXi=1αi−12mXi=1mXj=1αiαjyiyjx0ixjsubject to: 0 ≤ αi≤ C,mXi=1αiyi= 0This is a quadratic programming problem and we assume that there is a black-box that solves it.The solution gives the values of the αi.The Karush-Kuhn-Tucker Complementary ConditionsIn this case the KKT condition gives:αiyi(w0xi+ b) − 1 + ζi= 0ζi(αi− C) = 0From the second condition it follows that either ζi= 0, or αi= C. Therefore:αi= 0 → not support vector0 < αi< C → yi(w0xi+ b) = 1 point on hard marginαi= C → yi(w0xi+ b) = 1 − ζipoint on soft marginRecovering w, bFrom (2) it follows that w can be recovered from the support vectors in the same way as in thehard-margins case:w =kXj=1αjyjxj(3)Once w is determined the value of b can be computed from any one of the hard margins supportvectors (with αi< C), using the same formulas as in the hard-margins case:0 < αs< C → b =1ys− w0xs(4.1)As in the hard-margins case it is also possible to compute the value of b from all support vectorson the hard margins (satisfying 0 < αs< C). Since the formulas for w, b are the same as in thehard-margins case we can also use kernels.The value of ζIn the hard-margins case the dual optimization problem can give infinite values, indicating thatthe primal problem has no solution (the data is not linearly separable.) This cannot happen in thesoft-margins case. If the point i is wrongly classified by the hyperplane then we can always chooseζi= 1−yi(w0xi+b), since this gives ζ ≥ 0 (in fact it gives ζ ≥ 1). If the point i is correctly classifiedbut with distance from the margins that is too short, we can still choose zetai= 1 − yi(w0xi+ b),since we would still have ζ ≥ 0. The case in which ζ < 0 corresponds to points that are correctlyclassified with yi(w0xi+ b) ≥ 1, and they are not inside the soft margins.2Examplei 0 1 2 3 4xi0 1 2 3 4yi-1 -1 1 -1 1Lagrangian multiplier α0α1α2α3α4The dual problem:maximize L(α0, . . . , α4) = α0+ α1+ α2+ α3+ α4−12(α21+ 4α22+ 9α23+ 16α24− 4α1α2+ 6α1α3− 8α1α4− 12α2α3+ 16α2α4− 24α3α4)= α0+ α1+ α2+ α3+ α4−12(−α1+ 2α2− 3α3+ 4α4)2subject to: 0 ≤ α0≤ C, 0 ≤ α1≤ C, 0 ≤ α2≤ C, 0 ≤ α3≤ C, 0 ≤ α4≤ C,− α0− α1+ α2− α3+ α4= 0With C = 10 the solution (computed by the black box quadratic optimizer) is: α0= 0, α1= α4=3.55, α2= α3= 10. Therefore, the support vectors are x1, x2, x3, x4.We can now compute w from (3):w = −3.55 + 20 − 30 + 4 ∗ 3.55 = 0.66666The value of b can be computed, for example, from x1, the first support vector, using (4.1):b = −1 − 0.666 = −1.666It cannot be computed from x2, x3since they satisfy αi= C. It can be computed from x4:using (4.1):b = 1 − 0.6666 × 4 = −1.6666Observe that in this case x2, x3are wrongly classified.DistancesIn our case the “hyperplane” is the point satisfying w0x+b = 0, which is x = 2.5. The distance of thehard-margins support vectors from that hyperplane is 1.5. Observe that 1.5/|w| = 1, as expected.The ζ value for x2is 1 − (−1/3) = 4/3. Its distance from the hyperplane is (1 − ζ)/|w| =
View Full Document