UT Dallas CS 6375 - SVM-soft-margins

Soft margins

Hard margins: for i = 1, ..., m,   y_i (w'x_i + b) ≥ 1

Soft margins: for i = 1, ..., m,   y_i (w'x_i + b) ≥ 1 − ζ_i,   ζ_i ≥ 0

The primal problem:

Let C be a constant that corresponds to the "amount of allowed softness". The function to be minimized and the linear inequality constraints are augmented to:

    Minimize   (1/2)|w|^2 + C Σ_{i=1}^m ζ_i

subject to the 2m linear inequality constraints:

    for i = 1, ..., m,   y_i (w'x_i + b) ≥ 1 − ζ_i,   ζ_i ≥ 0

Intuitively, large values of C emphasize the requirement that the ζ_i be small, and thus decrease the softness.

Derivation of the dual problem

The Lagrangian of the primal problem:

    L(w, b, ζ_1, ..., ζ_m, α_1, ..., α_m, r_1, ..., r_m)
      = (1/2)|w|^2 + C Σ_{i=1}^m ζ_i + Σ_{i=1}^m α_i (1 − ζ_i − y_i (w'x_i + b)) − Σ_{i=1}^m r_i ζ_i        (1)

To compute the dual problem we need to minimize L with respect to w, b, ζ_i so that it is a function of only α_i, r_i.

    The derivative of L w.r.t. w gives:    w = Σ_{i=1}^m α_i y_i x_i
    The derivative of L w.r.t. b gives:    Σ_{i=1}^m α_i y_i = 0                                            (2)
    The derivative of L w.r.t. ζ_i gives:  C − α_i − r_i = 0

Substituting these values in L and simplifying we get:

    L(α_1, ..., α_m, r_1, ..., r_m) = Σ_{i=1}^m α_i − (1/2) Σ_{i=1}^m Σ_{j=1}^m α_i α_j y_i y_j x_i'x_j

This is exactly the same dual function as in the hard-margins case. For the dual problem we also need the last two constraints in (2), and α_i ≥ 0, r_i ≥ 0. The difference between the hard and the soft case is that from the third equation in (2) and the condition r_i ≥ 0 we have: α_i ≤ C.

The dual problem:

    Maximize   L(α_1, ..., α_m) = Σ_{i=1}^m α_i − (1/2) Σ_{i=1}^m Σ_{j=1}^m α_i α_j y_i y_j x_i'x_j

    subject to:   0 ≤ α_i ≤ C,    Σ_{i=1}^m α_i y_i = 0

This is a quadratic programming problem and we assume that there is a black box that solves it. The solution gives the values of the α_i.

The Karush-Kuhn-Tucker complementary conditions

In this case the KKT conditions give:

    α_i (y_i (w'x_i + b) − 1 + ζ_i) = 0
    ζ_i (α_i − C) = 0

From the second condition it follows that either ζ_i = 0, or α_i = C.
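These complementarity conditions can be verified numerically once a candidate solution is in hand. A minimal pure-Python sketch, not part of the original notes: the data and α values are taken from the worked example later in this document, and the helper name `kkt_residuals` is illustrative.

```python
# Check the two KKT complementarity conditions of the soft-margin SVM:
#   alpha_i * (y_i (w*x_i + b) - 1 + zeta_i) = 0
#   zeta_i * (alpha_i - C) = 0
# for 1-D data, with w recovered as sum_i alpha_i y_i x_i.

def kkt_residuals(alpha, x, y, b, C):
    w = sum(a * yi * xi for a, yi, xi in zip(alpha, y, x))
    res = []
    for a, xi, yi in zip(alpha, x, y):
        zeta = max(0.0, 1 - yi * (w * xi + b))           # slack implied by w, b
        res.append((a * (yi * (w * xi + b) - 1 + zeta),  # first condition
                    zeta * (a - C)))                     # second condition
    return res

# Data and solution from the example later in these notes (C = 10):
x = [0, 1, 2, 3, 4]
y = [-1, -1, 1, -1, 1]
alpha = [0.0, 32/9, 10.0, 10.0, 32/9]
for r1, r2 in kkt_residuals(alpha, x, y, b=-5/3, C=10):
    print(abs(r1) < 1e-9, abs(r2) < 1e-9)   # True True for every point
```

Both residuals vanish at every point, so the reported multipliers are consistent with the KKT conditions.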
Therefore:

    α_i = 0        →  x_i is not a support vector
    0 < α_i < C    →  y_i (w'x_i + b) = 1          (point on the hard margin)
    α_i = C        →  y_i (w'x_i + b) = 1 − ζ_i    (point on the soft margin)

Recovering w, b

From (2) it follows that w can be recovered from the support vectors in the same way as in the hard-margins case:

    w = Σ_{j=1}^k α_j y_j x_j        (3)

Once w is determined, the value of b can be computed from any one of the hard-margin support vectors (those with α_i < C), using the same formula as in the hard-margins case:

    0 < α_s < C   →   b = 1/y_s − w'x_s        (4.1)

As in the hard-margins case it is also possible to compute the value of b from all the support vectors on the hard margins (those satisfying 0 < α_s < C). Since the formulas for w, b are the same as in the hard-margins case we can also use kernels.

The value of ζ

In the hard-margins case the dual optimization problem can give infinite values, indicating that the primal problem has no solution (the data is not linearly separable). This cannot happen in the soft-margins case. If the point i is wrongly classified by the hyperplane then we can always choose ζ_i = 1 − y_i (w'x_i + b), since this gives ζ_i ≥ 0 (in fact it gives ζ_i ≥ 1). If the point i is correctly classified but its distance from the margins is too short, we can still choose ζ_i = 1 − y_i (w'x_i + b), since we would still have ζ_i ≥ 0. The case 1 − y_i (w'x_i + b) < 0 corresponds to points that are correctly classified with y_i (w'x_i + b) ≥ 1; they are not inside the soft margins, and for them we can choose ζ_i = 0.

Example

    i      0    1    2    3    4
    x_i    0    1    2    3    4
    y_i   -1   -1    1   -1    1

Lagrangian multipliers: α_0, α_1, α_2, α_3, α_4.

The dual problem:

    maximize   L(α_0, ..., α_4) = α_0 + α_1 + α_2 + α_3 + α_4
        − (1/2)(α_1^2 + 4α_2^2 + 9α_3^2 + 16α_4^2 − 4α_1α_2 + 6α_1α_3 − 8α_1α_4 − 12α_2α_3 + 16α_2α_4 − 24α_3α_4)
      = α_0 + α_1 + α_2 + α_3 + α_4 − (1/2)(−α_1 + 2α_2 − 3α_3 + 4α_4)^2

    subject to:   0 ≤ α_0 ≤ C,  0 ≤ α_1 ≤ C,  0 ≤ α_2 ≤ C,  0 ≤ α_3 ≤ C,  0 ≤ α_4 ≤ C,
                  −α_0 − α_1 + α_2 − α_3 + α_4 = 0

With C = 10 the solution (computed by the black-box quadratic optimizer) is:
α_0 = 0,  α_1 = α_4 = 32/9 ≈ 3.556,  α_2 = α_3 = 10.
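Given the α values from the solver, the case analysis α_i = 0 / 0 < α_i < C / α_i = C determines each point's role mechanically. A small pure-Python sketch of that bookkeeping, not from the original notes; the helper name `categorize` is illustrative:

```python
# Classify each training point by its Lagrange multiplier alpha_i:
#   alpha_i = 0      -> not a support vector
#   0 < alpha_i < C  -> support vector on the hard margin
#   alpha_i = C      -> support vector on the soft margin
# A small tolerance guards against floating-point noise from the QP solver.

def categorize(alpha, C, tol=1e-9):
    cats = []
    for a in alpha:
        if a <= tol:
            cats.append("not SV")
        elif a >= C - tol:
            cats.append("soft-margin SV")
        else:
            cats.append("hard-margin SV")
    return cats

# Multipliers from the worked example (C = 10):
alpha = [0.0, 32/9, 10.0, 10.0, 32/9]
print(categorize(alpha, 10))
# ['not SV', 'hard-margin SV', 'soft-margin SV', 'soft-margin SV', 'hard-margin SV']
```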
Therefore the support vectors are x_1, x_2, x_3, x_4.

We can now compute w from (3):

    w = −(32/9)·1 + 10·2 − 10·3 + (32/9)·4 = 32/3 − 10 = 2/3 ≈ 0.667

The value of b can be computed, for example, from x_1, the first support vector, using (4.1):

    b = −1 − (2/3)·1 = −5/3 ≈ −1.667

It cannot be computed from x_2, x_3, since they satisfy α_i = C. It can be computed from x_4, using (4.1):

    b = 1 − (2/3)·4 = −5/3 ≈ −1.667

Observe that in this case x_2, x_3 are wrongly classified.

Distances

In our case the "hyperplane" is the point satisfying w'x + b = 0, which is x = 2.5. The distance of the hard-margin support vectors from that hyperplane is 1.5. Observe that 1.5 = 1/|w|, as expected. The ζ value for x_2 is 1 − y_2 (w'x_2 + b) = 1 − (−1/3) = 4/3. Its signed distance from the hyperplane is (1 − ζ_2)/|w| = (−1/3)/(2/3) = −1/2: the point lies at distance 0.5 from the hyperplane, on the wrong side.
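The example's numbers can be reproduced end to end. A pure-Python sketch, not part of the original notes, using the exact multiplier value α_1 = α_4 = 32/9 (≈ 3.556), which yields w = 2/3:

```python
# Recompute the worked example: w from (3), b from a hard-margin support
# vector via (4.1), and zeta_i = max(0, 1 - y_i (w*x_i + b)) for each point.

x = [0, 1, 2, 3, 4]
y = [-1, -1, 1, -1, 1]
alpha = [0.0, 32/9, 10.0, 10.0, 32/9]   # solver output, C = 10

w = sum(a * yi * xi for a, yi, xi in zip(alpha, y, x))
b = 1 / y[1] - w * x[1]                  # from x_1, a hard-margin SV (alpha_1 < C)

zeta = [max(0.0, 1 - yi * (w * xi + b)) for xi, yi in zip(x, y)]

print(round(w, 4))                       # 0.6667
print(round(b, 4))                       # -1.6667
print(round(-b / w, 2))                  # 2.5, the "hyperplane" point
print([round(z, 4) for z in zeta])       # [0.0, 0.0, 1.3333, 1.3333, 0.0]
```

The nonzero slacks belong exactly to x_2 and x_3, the two wrongly classified points, and both equal 4/3 as computed above.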

