MIT 9.520 - Support Vector Machines

Support Vector Machines
Charlie Frogner (MIT, 2010). Slides mostly stolen from Ryan Rifkin (Google).

Plan
- Regularization derivation of SVMs.
- Geometric derivation of SVMs.
- Practical issues.

The Regularization Setting (Again)
We are given $n$ examples $(x_1, y_1), \ldots, (x_n, y_n)$, with $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, 1\}$ for all $i$. As mentioned last class, we can find a classification function by solving a regularized learning problem:
$$\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^n V(y_i, f(x_i)) + \lambda \|f\|_{\mathcal{H}}^2.$$
Note that in this class we specifically consider binary classification.

The Hinge Loss
The classical SVM arises by considering the specific loss function
$$V(f(x), y) \equiv (1 - y f(x))_+,$$
where $(k)_+ \equiv \max(k, 0)$.

[Figure: the hinge loss $(1 - y f(x))_+$ plotted against $y \cdot f(x)$.]

Substituting In the Hinge Loss
With the hinge loss, our regularization problem becomes
$$\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^n (1 - y_i f(x_i))_+ + \lambda \|f\|_{\mathcal{H}}^2.$$
Note that we don't have a $\frac{1}{2}$ multiplier on the regularization term.

Slack Variables
This problem is non-differentiable (because of the "kink" in $V$), so we introduce slack variables $\xi_i$ to make the problem easier to work with:
$$\begin{aligned}
\min_{f \in \mathcal{H}} \quad & \frac{1}{n} \sum_{i=1}^n \xi_i + \lambda \|f\|_{\mathcal{H}}^2 \\
\text{subject to:} \quad & y_i f(x_i) \ge 1 - \xi_i, \quad i = 1, \ldots, n \\
& \xi_i \ge 0, \quad i = 1, \ldots, n
\end{aligned}$$

Applying the Representer Theorem
Substituting in
$$f^*(x) = \sum_{i=1}^n c_i K(x, x_i),$$
we arrive at a constrained quadratic programming problem:
$$\begin{aligned}
\min_{c \in \mathbb{R}^n,\, \xi \in \mathbb{R}^n} \quad & \frac{1}{n} \sum_{i=1}^n \xi_i + \lambda c^T K c \\
\text{subject to:} \quad & y_i \sum_{j=1}^n c_j K(x_i, x_j) \ge 1 - \xi_i, \quad i = 1, \ldots, n \\
& \xi_i \ge 0, \quad i = 1, \ldots, n
\end{aligned}$$

Adding a Bias Term
If we add an unregularized bias term $b$ (which presents some theoretical difficulties, to be discussed later), we arrive at the "primal" SVM:
$$\begin{aligned}
\min_{c \in \mathbb{R}^n,\, b \in \mathbb{R},\, \xi \in \mathbb{R}^n} \quad & \frac{1}{n} \sum_{i=1}^n \xi_i + \lambda c^T K c \\
\text{subject to:} \quad & y_i \Big( \sum_{j=1}^n c_j K(x_i, x_j) + b \Big) \ge 1 - \xi_i, \quad i = 1, \ldots, n \\
& \xi_i \ge 0, \quad i = 1, \ldots, n
\end{aligned}$$

Standard Notation
In most of the SVM literature, instead of the regularization parameter $\lambda$, regularization is controlled via a parameter $C$, defined using the relationship
$$C = \frac{1}{2 \lambda n}.$$
Using this definition (after multiplying our objective function by the constant $\frac{1}{2\lambda}$), the basic regularization problem becomes
$$\min_{f \in \mathcal{H}} C \sum_{i=1}^n V(y_i, f(x_i)) + \frac{1}{2} \|f\|_{\mathcal{H}}^2.$$
Like $\lambda$, the parameter $C$ also controls the tradeoff between classification accuracy and the norm of the function. The primal problem becomes ...

The Reparametrized Problem
$$\begin{aligned}
\min_{c \in \mathbb{R}^n,\, b \in \mathbb{R},\, \xi \in \mathbb{R}^n} \quad & C \sum_{i=1}^n \xi_i + \frac{1}{2} c^T K c \\
\text{subject to:} \quad & y_i \Big( \sum_{j=1}^n c_j K(x_i, x_j) + b \Big) \ge 1 - \xi_i, \quad i = 1, \ldots, n \\
& \xi_i \ge 0, \quad i = 1, \ldots, n
\end{aligned}$$

How to Solve?
This is a constrained optimization problem. The general approach:
- Form the primal problem (we did this).
- Form the Lagrangian from the primal (just like Lagrange multipliers).
- Form the dual: one dual variable is associated with each primal constraint in the Lagrangian.

The Reparametrized Lagrangian
We derive the dual from the primal using the Lagrangian:
$$L(c, \xi, b, \alpha, \zeta) = C \sum_{i=1}^n \xi_i + \frac{1}{2} c^T K c - \sum_{i=1}^n \alpha_i \Big( y_i \Big\{ \sum_{j=1}^n c_j K(x_i, x_j) + b \Big\} - 1 + \xi_i \Big) - \sum_{i=1}^n \zeta_i \xi_i$$

The Reparametrized Dual, I
$$\frac{\partial L}{\partial b} = 0 \implies \sum_{i=1}^n \alpha_i y_i = 0$$
$$\frac{\partial L}{\partial \xi_i} = 0 \implies C - \alpha_i - \zeta_i = 0 \implies 0 \le \alpha_i \le C \quad (\text{using } \alpha_i, \zeta_i \ge 0)$$
The reduced Lagrangian:
$$L_R(c, \alpha) = \frac{1}{2} c^T K c - \sum_{i=1}^n \alpha_i \Big( y_i \sum_{j=1}^n c_j K(x_i, x_j) - 1 \Big)$$
The relation between $c$ and $\alpha$:
$$\frac{\partial L}{\partial c} = 0 \implies c_i = \alpha_i y_i$$
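To make the hinge-loss objective and the $C = \frac{1}{2\lambda n}$ reparametrization concrete before turning to the primal/dual pair, here is a minimal NumPy sketch (not part of the slides). The Gaussian kernel and every name in it (`gaussian_kernel`, `primal_objective`, the toy data) are illustrative assumptions; only the formulas come from the slides.

```python
# Sketch of the primal objective from the slides:
#   (1/n) sum_i (1 - y_i f(x_i))_+ + lambda * c^T K c,   with f(x_i) = sum_j c_j K(x_i, x_j),
# and a numerical check of the lambda <-> C correspondence C = 1/(2*lambda*n).
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """Kernel matrix K[i, j] = exp(-||X_i - Z_j||^2 / (2 sigma^2)). Illustrative kernel choice."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def primal_objective(c, K, y, lam):
    """(1/n) sum_i (1 - y_i f(x_i))_+ + lam * c^T K c, with f = K c."""
    f = K @ c                                # f(x_i) = sum_j c_j K(x_i, x_j)
    hinge = np.maximum(0.0, 1.0 - y * f)     # (1 - y_i f(x_i))_+
    return hinge.mean() + lam * (c @ K @ c)

# Toy data (hypothetical), just to exercise the formulas.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=20))
K = gaussian_kernel(X, X)
c = 0.1 * rng.normal(size=20)
lam, n = 0.1, len(y)

# C-parametrized objective C * sum_i hinge_i + (1/2) c^T K c equals the
# lambda-parametrized objective scaled by 1/(2*lam), since C = 1/(2*lam*n).
C = 1.0 / (2 * lam * n)
obj_lambda = primal_objective(c, K, y, lam)
obj_C = C * np.maximum(0.0, 1.0 - y * (K @ c)).sum() + 0.5 * (c @ K @ c)
assert np.isclose(obj_C, obj_lambda / (2 * lam))
```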
The Primal and Dual Problems Again
Primal:
$$\begin{aligned}
\min_{c \in \mathbb{R}^n,\, b \in \mathbb{R},\, \xi \in \mathbb{R}^n} \quad & C \sum_{i=1}^n \xi_i + \frac{1}{2} c^T K c \\
\text{subject to:} \quad & y_i \Big( \sum_{j=1}^n c_j K(x_i, x_j) + b \Big) \ge 1 - \xi_i, \quad i = 1, \ldots, n \\
& \xi_i \ge 0, \quad i = 1, \ldots, n
\end{aligned}$$
Dual:
$$\begin{aligned}
\max_{\alpha \in \mathbb{R}^n} \quad & \sum_{i=1}^n \alpha_i - \frac{1}{2} \alpha^T Q \alpha \\
\text{subject to:} \quad & \sum_{i=1}^n y_i \alpha_i = 0 \\
& 0 \le \alpha_i \le C, \quad i = 1, \ldots, n
\end{aligned}$$
where $Q_{ij} = y_i y_j K(x_i, x_j)$.

SVM Training
Basic idea: solve the dual problem to find the optimal $\alpha$'s, and use them to find $b$ and $c$:
$$c_i = \alpha_i y_i, \qquad b = y_i - \sum_{j=1}^n c_j K(x_i, x_j).$$
(We showed $c_i$ several slides ago; we will show $b$ in a bit.)
The dual problem is easier to solve than the primal problem. It has simple box constraints and a single equality constraint, and the problem can be decomposed into a sequence of smaller problems (see appendix).

Optimality Conditions: Complementary Slackness
The dual variables are associated with the primal constraints as follows:
$$\alpha_i \;\longleftrightarrow\; y_i \Big\{ \sum_{j=1}^n c_j K(x_i, x_j) + b \Big\} - 1 + \xi_i \ge 0$$
$$\zeta_i \;\longleftrightarrow\; \xi_i \ge 0$$
Complementary slackness: at optimality, either the primal inequality is satisfied with equality or the dual variable is zero. That is, if $c, \xi, b, \alpha$, and $\zeta$ are optimal solutions to the primal and dual, then
$$\alpha_i \Big( y_i \Big\{ \sum_{j=1}^n c_j K(x_i, x_j) + b \Big\} - 1 + \xi_i \Big) = 0$$
$$\zeta_i \xi_i = 0$$

Optimality Conditions: All of Them
All optimal solutions must satisfy:
$$\begin{aligned}
& \sum_{j=1}^n c_j K(x_i, x_j) - \sum_{j=1}^n y_j \alpha_j K(x_i, x_j) = 0, && i = 1, \ldots, n \\
& \sum_{i=1}^n \alpha_i y_i = 0 \\
& C - \alpha_i - \zeta_i = 0, && i = 1, \ldots, n \\
& y_i \Big( \sum_{j=1}^n y_j \alpha_j K(x_i, x_j) + b \Big) - 1 + \xi_i \ge 0, && i = 1, \ldots, n \\
& \alpha_i \Big[ y_i \Big( \sum_{j=1}^n y_j \alpha_j K(x_i, x_j) + b \Big) - 1 + \xi_i \Big] = 0, && i = 1, \ldots, n \\
& \zeta_i \xi_i = 0, && i = 1, \ldots, n \\
& \xi_i, \alpha_i, \zeta_i \ge 0, && i = 1, \ldots, n
\end{aligned}$$

Optimality Conditions, II
The optimality conditions are both necessary and sufficient. If we have $c, \xi, b, \alpha$, and $\zeta$ satisfying the above conditions, we know that they represent optimal solutions to the primal and dual problems. These optimality conditions are also known as the Karush-Kuhn-Tucker (KKT) conditions.

Towards Simpler Optimality Conditions: Determining b
Suppose we have the optimal $\alpha_i$'s. Also suppose (this happens in practice) that there exists an $i$ satisfying $0 < \alpha_i < C$. Then
$$\begin{aligned}
\alpha_i < C &\implies \zeta_i > 0 \\
&\implies \xi_i = 0 \\
&\implies y_i \Big( \sum_{j=1}^n y_j \alpha_j K(x_i, x_j) + b \Big) - 1 = 0 \\
&\implies b = y_i - \sum_{j=1}^n y_j \alpha_j K(x_i, x_j)
\end{aligned}$$
So if we know the optimal $\alpha$'s, we can determine $b$.

Towards Simpler Optimality Conditions, I
Defining our classification function $f(x)$ as
$$f(x) = \sum_{i=1}^n y_i \alpha_i K(x, x_i) + b,$$
we can derive "reduced" optimality conditions. For example, consider an $i$ such that $y_i f(x_i) < 1$:
$$y_i f(x_i) < 1 \implies \xi_i > 0 \implies \zeta_i = 0 \implies \alpha_i = C$$

Towards Simpler Optimality Conditions, II
Conversely, suppose $\alpha_i = C$:
$$\alpha_i = C \implies y_i f(x_i) - 1 + \xi_i = 0 \implies y_i f(x_i) \le 1$$

Reduced Optimality Conditions
Proceeding similarly, we can write the following "reduced" optimality conditions:
$$\begin{aligned}
\alpha_i = 0 &\implies y_i f(x_i) \ge 1 \\
0 < \alpha_i < C &\implies y_i f(x_i) = 1 \\
\alpha_i = C &\implies y_i f(x_i) \le 1
\end{aligned}$$
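The training recipe above (solve the dual for $\alpha$, set $c_i = \alpha_i y_i$, and read $b$ off any point with $0 < \alpha_i < C$) translates directly into code. The sketch below assumes an optimal `alpha` is already available from some QP solver, together with the kernel matrix `K`, labels `y`, and `C` from the earlier sketch; the helper names are ad hoc, not from any SVM library.

```python
# Given (assumed) optimal dual variables alpha, recover c and b as in the slides and
# sanity-check the reduced optimality conditions. Tolerances are illustrative.
import numpy as np

def recover_c_and_b(alpha, K, y, C, tol=1e-6):
    c = alpha * y                                       # c_i = alpha_i y_i
    # Pick any i with 0 < alpha_i < C; the slides assume such an i exists in practice.
    margin_sv = np.where((alpha > tol) & (alpha < C - tol))[0]
    i = margin_sv[0]
    b = y[i] - K[i] @ c                                 # b = y_i - sum_j c_j K(x_i, x_j)
    return c, b

def decision_function(c, b, K_test_train):
    """f(x) = sum_i c_i K(x, x_i) + b, for rows of a test-vs-train kernel matrix."""
    return K_test_train @ c + b

def check_reduced_kkt(alpha, c, b, K, y, C, tol=1e-4):
    """alpha_i = 0 => y_i f(x_i) >= 1;  0 < alpha_i < C => = 1;  alpha_i = C => <= 1."""
    margins = y * decision_function(c, b, K)
    ok_zero = np.all(margins[alpha <= tol] >= 1 - tol)
    ok_mid = np.all(np.abs(margins[(alpha > tol) & (alpha < C - tol)] - 1) <= tol)
    ok_C = np.all(margins[alpha >= C - tol] <= 1 + tol)
    return ok_zero and ok_mid and ok_C
```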


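For completeness, here is one way to obtain the optimal $\alpha$'s themselves: hand the box-constrained, single-equality dual QP to a general-purpose solver. This sketch uses SciPy's SLSQP purely for illustration; it is not the slides' method, and practical SVM solvers use the decomposition approach mentioned on the training slide (see appendix) instead.

```python
# Sketch: solve  max_alpha  sum_i alpha_i - (1/2) alpha^T Q alpha
#                s.t.       sum_i y_i alpha_i = 0,   0 <= alpha_i <= C,
# where Q_ij = y_i y_j K(x_i, x_j), with a generic constrained optimizer.
import numpy as np
from scipy.optimize import minimize

def solve_svm_dual(K, y, C):
    n = len(y)
    Q = (y[:, None] * y[None, :]) * K              # Q_ij = y_i y_j K_ij

    def neg_dual(alpha):                           # minimize the negated dual objective
        return 0.5 * alpha @ Q @ alpha - alpha.sum()

    def neg_dual_grad(alpha):                      # gradient: Q alpha - 1
        return Q @ alpha - np.ones(n)

    res = minimize(
        neg_dual,
        x0=np.zeros(n),
        jac=neg_dual_grad,
        method="SLSQP",
        bounds=[(0.0, C)] * n,                     # box constraints 0 <= alpha_i <= C
        constraints=[{"type": "eq", "fun": lambda a: y @ a}],  # sum_i y_i alpha_i = 0
    )
    return res.x

# Hypothetical usage, chaining the earlier sketches:
# alpha = solve_svm_dual(K, y, C)
# c, b = recover_c_and_b(alpha, K, y, C)
# assert check_reduced_kkt(alpha, c, b, K, y, C)
```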