CMU CS 10701 - Support Vector Machines

Machine Learning 10-701/15-781, Fall 2006
Support Vector Machines
Eric Xing
Lecture 8, October 5, 2006
Reading: Chap. 6 & 7, C.B. book

Outline
- Maximum margin classification
- Constrained optimization
- Lagrangian duality
- Kernel trick
- Non-separable cases

What is a good Decision Boundary?
- Consider a binary classification task with y = ±1 labels (not 0/1 as before).
- When the training examples are linearly separable, we can set the parameters of a linear classifier so that all the training examples are classified correctly.
- Many decision boundaries! Generative classifiers, logistic regression, ...
- Are all decision boundaries equally good?

Examples of Bad Decision Boundaries
- Why might we end up with such boundaries?
  - Irregular distribution
  - Imbalanced training sizes
  - Outliers

Classification and Margin
- Parameterizing the decision boundary: let w denote a vector orthogonal to the decision boundary and b a scalar "offset" term; then we can write the decision boundary as
  $w^T x + b = 0$
- Margin: $w^T x + b > 0$ for all x in class 2, and $w^T x + b < 0$ for all x in class 1. Or more compactly:
  $y_i (w^T x_i + b) > 0$
- The margin between two points $x_i$ and $x_j$:
  $m = (w^T x_i + b) - (w^T x_j + b) = w^T (x_i - x_j)$

Maximum Margin Classification
- The margin is $m = w^T (x_i^* - x_j^*)$.
- It makes sense to set a constraint on w: $\|w\| = 1$.
- Here is our maximum margin classification problem:
  $\max_{w,b} \; m \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge m \;\; \forall i, \quad \|w\| = 1$
- Equivalently, we can instead work on this:
  $\max_{w,b} \; \frac{m}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge m \;\; \forall i$

Maximum Margin Classification, cont'd
- The optimization problem:
  $\max_{w,b} \; \frac{m}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge m \;\; \forall i$
- But note that the magnitude of m merely scales w and b, and does not change the classification boundary at all!
- So we instead work on this cleaner problem:
  $\max_{w,b} \; \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 \;\; \forall i$
- The solution to this leads to the famous Support Vector Machine --- believed by many to be the best "off-the-shelf" supervised learning algorithm.

Support vector machine
- A convex quadratic programming problem with linear constraints:
  $\max_{w,b} \; \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 \;\; \forall i$
- The attained margin is now given by $1 / \|w\|$.
- Only a few of the classification constraints are relevant → support vectors.
- Constrained optimization:
  - We can directly solve this using commercial quadratic programming (QP) code.
  - But we want to take a more careful look at Lagrange duality and the solution of the above problem in its dual form:
    → deeper insight: support vectors, kernels, ...
    → more efficient algorithms
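As a quick numerical check of the rescaling argument above (the magnitude of m merely rescales w and b without moving the boundary), here is a minimal numpy sketch; the toy data, w, and b are made up for illustration and are not from the lecture.

```python
import numpy as np

# Toy linearly separable data (made up for illustration): rows are x_i, labels are +/-1.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])

def margins(w, b):
    """Return functional margins y_i (w^T x_i + b) and the geometric margin min_i y_i (w^T x_i + b) / ||w||."""
    functional = y * (X @ w + b)
    geometric = functional.min() / np.linalg.norm(w)
    return functional, geometric

w = np.array([1.0, 1.0])
b = -1.0

f1, g1 = margins(w, b)
f2, g2 = margins(10 * w, 10 * b)   # rescaled (w, b): same decision boundary

print(f1, g1)   # functional margins scale with (w, b) ...
print(f2, g2)   # ... but the geometric margin (and the classifier) is unchanged
```

This is exactly why the constraints can be normalized to $y_i (w^T x_i + b) \ge 1$, which turns the problem into the quadratic program above.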
Lagrangian Duality
- The Primal Problem:
  $\min_w f(w) \quad \text{s.t.} \quad g_i(w) \le 0,\ i = 1, \dots, k; \qquad h_i(w) = 0,\ i = 1, \dots, l$
- The generalized Lagrangian:
  $\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w)$
  The α's (with $\alpha_i \ge 0$) and β's are called the Lagrange multipliers.
- Lemma:
  $\max_{\alpha, \beta:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta) = \begin{cases} f(w) & \text{if } w \text{ satisfies the primal constraints} \\ \infty & \text{otherwise} \end{cases}$
- A re-written Primal:
  $\min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$

Lagrangian Duality, cont.
- Recall the Primal Problem:
  $\min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$
- The Dual Problem:
  $\max_{\alpha, \beta:\, \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta)$
- Theorem (weak duality):
  $d^* = \max_{\alpha, \beta:\, \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta) \;\le\; \min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta) = p^*$
- Theorem (strong duality): iff there exists a saddle point of $\mathcal{L}(w, \alpha, \beta)$, we have $d^* = p^*$.

A sketch of strong and weak duality
- Now, ignoring h(w) for simplicity, let's look at what's happening graphically in the duality theorems:
  $d^* = \max_{\alpha_i \ge 0} \min_w \left[ f(w) + \alpha^T g(w) \right] \;\le\; \min_w \max_{\alpha_i \ge 0} \left[ f(w) + \alpha^T g(w) \right] = p^*$
  (Figure: curves of f(w) and g(w).)

The KKT conditions
- If there exists some saddle point of $\mathcal{L}$, then the saddle point satisfies the following "Karush-Kuhn-Tucker" (KKT) conditions:
  $\frac{\partial}{\partial w_i} \mathcal{L}(w, \alpha, \beta) = 0, \quad i = 1, \dots, n$
  $\frac{\partial}{\partial \beta_i} \mathcal{L}(w, \alpha, \beta) = 0, \quad i = 1, \dots, l$
  $\alpha_i g_i(w) = 0, \quad i = 1, \dots, k$
  $g_i(w) \le 0, \quad i = 1, \dots, k$
  $\alpha_i \ge 0, \quad i = 1, \dots, k$
- Theorem: if $w^*$, $\alpha^*$, and $\beta^*$ satisfy the KKT conditions, then they are also a solution to the primal and the dual problems.

Solving the optimal margin classifier
- Recall our optimization problem:
  $\max_{w,b} \; \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 \;\; \forall i$
- This is equivalent to
  $\min_{w,b} \; \frac{1}{2} w^T w \quad \text{s.t.} \quad 1 - y_i (w^T x_i + b) \le 0 \;\; \forall i \qquad (*)$
- Write the Lagrangian:
  $\mathcal{L}(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{m} \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]$
- Recall that (*) can be reformulated as $\min_{w,b} \max_{\alpha_i \ge 0} \mathcal{L}(w, b, \alpha)$. Now we solve its dual problem:
  $\max_{\alpha_i \ge 0} \min_{w,b} \mathcal{L}(w, b, \alpha)$

The Dual Problem
- $\max_{\alpha_i \ge 0} \min_{w,b} \mathcal{L}(w, b, \alpha)$
- We minimize $\mathcal{L}$ with respect to w and b first:
  $\nabla_w \mathcal{L}(w, b, \alpha) = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0$
  $\nabla_b \mathcal{L}(w, b, \alpha) = \sum_{i=1}^{m} \alpha_i y_i = 0 \qquad (**)$
- The first condition implies
  $w = \sum_{i=1}^{m} \alpha_i y_i x_i \qquad (***)$
- Plugging (***) back into $\mathcal{L}$ and using (**), we have:
  $\mathcal{L}(w, b, \alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, (x_i^T x_j)$

The Dual problem, cont.
- Now we have the following dual optimization problem:
  $\max_\alpha \; J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, (x_i^T x_j)$
  $\text{s.t.} \quad \alpha_i \ge 0,\ i = 1, \dots, m; \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$
- This is, again, a quadratic programming problem.
- A global maximum of the $\alpha_i$ can always be found.
- But what's the big deal? Note two things:
  1. w can be recovered by $w = \sum_{i=1}^{m} \alpha_i y_i x_i$ (see next).
  2. The "kernel": $x_i^T x_j$ (more later).

Support vectors
- Note the KKT condition $\alpha_i g_i(w) = 0,\ i = 1, \dots, k$ --- only a few $\alpha_i$'s can be nonzero!
- Example with ten training points: $\alpha_1 = 0.8$, $\alpha_6 = 1.4$, $\alpha_8 = 0.6$, and all other $\alpha_i = 0$.
- Call the training data points whose $\alpha_i$'s are nonzero the support vectors (SV).

Support vector machines
- Once we have the Lagrange multipliers $\{\alpha_i\}$, we can reconstruct the parameter vector w as a weighted combination of the training examples:
  $w = \sum_{i \in SV} \alpha_i y_i x_i$
- For testing with a new data point z: compute
  $w^T z + b = \sum_{i \in SV} \alpha_i y_i \, (x_i^T z) + b$
  and classify z as class 1 if the sum is positive, and class 2 otherwise.
- Note: w need not be formed explicitly.
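To make the dual formulation concrete, here is a small sketch that solves the dual QP on toy data and then recovers w, b, and the support vectors. It uses scipy's general-purpose SLSQP solver as a stand-in for the dedicated QP codes mentioned in the lecture; the data and the numerical tolerance are illustrative assumptions, not material from the slides.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)

# Q_ij = y_i y_j x_i^T x_j -- only inner products between examples are needed.
Q = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):
    # Negative of J(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j
    # (negated because scipy minimizes rather than maximizes).
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

constraints = [{"type": "eq", "fun": lambda a: a @ y}]  # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * m                              # alpha_i >= 0
res = minimize(neg_dual, np.zeros(m), method="SLSQP", bounds=bounds, constraints=constraints)
alpha = res.x

# Recover w = sum_i alpha_i y_i x_i; only support vectors (alpha_i > 0) contribute.
w = (alpha * y) @ X
sv = alpha > 1e-6
# Any support vector lies on the margin, y_s (w^T x_s + b) = 1, so b = y_s - w^T x_s.
b = np.mean(y[sv] - X[sv] @ w)

print("alpha:", np.round(alpha, 3))        # most entries are (numerically) zero
print("w:", w, " b:", round(b, 3))
print("predictions:", np.sign(X @ w + b))  # should match y on this separable toy set
```

Note that both the dual objective and the final classifier touch the data only through inner products $x_i^T x_j$ and $x_i^T z$, which is exactly what the kernel trick in the outline exploits.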
Interpretation of support vector machines
- The optimal w is a linear combination of a small number of data points. This "sparse" representation can be viewed as data compression, as in the construction of a kNN classifier.
- To compute the weights $\{\alpha_i\}$, and to use support vector machines, we need to specify only the inner products (or kernel) between the examples: $x_i^T x_j$.
- We make decisions by comparing each new example z with only the support vectors:
  $y^* = \mathrm{sign}\left( \sum_{i \in SV} \alpha_i y_i \, (x_i^T z) + b \right)$

Non-linearly Separable Problems
- We allow "error" $\xi_i$ in classification; it is based on the output of the discriminant function $w^T x + b$.
- $\xi_i$ approximates the number of misclassified samples.

Soft Margin Hyperplane
- Now we have a slightly different optimization problem:
  $\min_{w,b} \; \frac{1}{2} w^T w + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i \;\; \forall i, \qquad \xi_i \ge 0 \;\; \forall i$
- The $\xi_i$ are "slack variables" in the optimization.
- Note that $\xi_i = 0$ if there is no error for $x_i$.
- $\sum_i \xi_i$ is an upper bound on the number of training errors.
- C: tradeoff parameter between error and margin.

The Optimization
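The preview ends at the heading above, so the lecture's own treatment of this optimization is not included here. As a minimal sketch of one standard approach (an assumption, not taken from the slides): at the optimum the slack variables satisfy $\xi_i = \max(0, 1 - y_i(w^T x_i + b))$, so the soft-margin objective equals $\frac{1}{2} w^T w + C \sum_i \max(0, 1 - y_i(w^T x_i + b))$ and can be minimized directly by subgradient descent. The toy data (with one point placed inside the other class), step size, and iteration count below are made up for illustration.

```python
import numpy as np

# Toy data (made up): one positive point sits among the negatives, so no hard-margin
# separator satisfies every constraint and slack variables are genuinely needed.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.5, -1.25], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0])

C = 1.0      # tradeoff between margin size and total slack (training error)
lr = 0.01    # step size for the subgradient updates
w = np.zeros(2)
b = 0.0

for _ in range(2000):
    margins = y * (X @ w + b)
    violated = margins < 1.0  # points with nonzero slack xi_i = 1 - y_i (w^T x_i + b)
    # Subgradient of 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b)):
    grad_w = w - C * (y[violated][:, None] * X[violated]).sum(axis=0)
    grad_b = -C * y[violated].sum()
    w -= lr * grad_w
    b -= lr * grad_b

xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
print("w:", w, " b:", round(b, 3))
print("slacks xi:", np.round(xi, 3))       # zero except for points violating the margin
print("predictions:", np.sign(X @ w + b))  # the overlapping positive typically stays misclassified
```

As C grows, the slack penalties dominate and the solution approaches the hard-margin formulation; a small C tolerates more margin violations in exchange for a larger margin.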

