Support Vector Machines
Aarti Singh
Machine Learning 10-601, Nov 17, 2011

At Pittsburgh G-20 summit … [photo slide]

Linear classifiers – which line is better?

Pick the one with the largest margin!

Parameterizing the decision boundary
  Decision boundary: w.x + b = 0
  Positive side: w.x + b > 0;  negative side: w.x + b < 0
  Labels: y ∈ {+1, -1}

Maximizing the margin
  Margin γ = distance of the closest examples from the line/hyperplane.
  If the closest examples lie on the hyperplanes w.x + b = a and w.x + b = -a, then
  margin = γ = a/‖w‖

Maximizing the margin
  max_{w,b} γ = a/‖w‖   s.t. (w.xj + b) yj ≥ a  ∀j
  Note: 'a' is arbitrary (we can normalize the equations by a).

Support Vector Machines
  max_{w,b} γ = 1/‖w‖   s.t. (w.xj + b) yj ≥ 1  ∀j
  or equivalently
  min_{w,b} w.w   s.t. (w.xj + b) yj ≥ 1  ∀j
  The closest examples lie on the margin hyperplanes w.x + b = 1 and w.x + b = -1.
  Solve efficiently by quadratic programming (QP) – well-studied solution algorithms.

Support Vectors
  The linear hyperplane is defined by the "support vectors" – the examples on the margin hyperplanes.
  Moving the other points a little doesn't affect the decision boundary.
  We only need to store the support vectors to predict labels of new points.
  How many support vectors are there in the linearly separable case, given d dimensions?

What if data is not linearly separable?
  Use features of features of features of features…, e.g. x1², x2², x1x2, …, exp(x1).
  But we run the risk of overfitting!

What if data is not linearly separable?
  Allow "error" in classification:
  min_{w,b} w.w + C·(# mistakes)   s.t. (w.xj + b) yj ≥ 1  ∀j
  Maximize the margin and minimize the number of mistakes on the training data.
  C – tradeoff parameter.
  Not a QP, unfortunately, and the 0/1 loss doesn't distinguish between a near miss and a bad mistake.

What if data is not linearly separable?
  Allow "error" in classification using slack variables:
  min_{w,b,ξ} w.w + C Σj ξj   s.t. (w.xj + b) yj ≥ 1 - ξj  ∀j,  ξj ≥ 0  ∀j
  ξj – "slack" variables (ξj > 1 if xj is misclassified); we pay a linear penalty for each mistake.
  C – tradeoff parameter (chosen by cross-validation).
  Still a QP! This is the soft-margin approach.

What if data is not linearly separable? Soft-margin SVM
  Soften the constraints: (w.xj + b) yj ≥ 1 - ξj  ∀j,  ξj ≥ 0  ∀j
  Penalty for misclassifying xj: C ξj
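The soft-margin objective can be evaluated directly for any candidate (w, b): for fixed weights the smallest feasible slack is ξj = max(0, 1 - (w.xj + b) yj), which is exactly the a+ = max(a, 0) notation used on the next slide. The numpy sketch below is not part of the original slides; the toy data, the parameter values, and the function name soft_margin_objective are made up for illustration. It computes the slacks and the objective w.w + C Σj ξj.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Evaluate the soft-margin SVM objective  w.w + C * sum_j xi_j  for a
    fixed linear classifier (w, b) on data X (one row per example) and
    labels y in {-1, +1}.  For fixed (w, b) the smallest feasible slack is
    xi_j = max(0, 1 - (w.xj + b) yj).
    """
    functional_margins = y * (X @ w + b)                  # (w.xj + b) yj for every point
    slacks = np.maximum(0.0, 1.0 - functional_margins)    # xi_j; zero for points beyond the margin
    return w @ w + C * slacks.sum(), slacks

# Made-up toy data and classifier, purely for illustration.
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -1.0], [0.2, -0.1]])
y = np.array([+1, +1, -1, -1, +1])
w, b, C = np.array([0.8, 0.6]), -0.1, 1.0

objective, slacks = soft_margin_objective(w, b, X, y, C)
print("gamma = 1/||w|| =", 1.0 / np.linalg.norm(w))  # distance from w.x+b = 0 to w.x+b = 1
print("objective =", objective)
print("slacks =", slacks)   # xi_j > 1 means xj is misclassified; xi_j = 1 means on the boundary
```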
How do we recover the hard-margin SVM?  Set C = ∞.

Soft-margin SVM
  Soften the constraints: (w.xj + b) yj ≥ 1 - ξj  ∀j,  ξj ≥ 0  ∀j
  Penalty for misclassifying xj: C ξj
  At the optimum the slack is ξj = (1 - (w.xj + b) yj)+, where a+ = max(a, 0).

Support Vectors
  With soft margins the same constraints apply; the support vectors now include the points on the margin hyperplanes w.x + b = ±1 and the points with non-zero slack.

Slack variables – Hinge loss
  min_{w,b,ξ} w.w + C Σj ξj   s.t. (w.xj + b) yj ≥ 1 - ξj  ∀j,  ξj ≥ 0  ∀j
  This is a regularized loss function: the regularization term w.w plus the hinge loss, which upper-bounds the 0-1 loss.
  [plot: hinge loss vs. 0-1 loss as a function of (w.xj + b) yj]

SVM vs. Logistic Regression
  SVM: hinge loss.
  Logistic regression: log loss (negative log conditional likelihood).
  [plot: 0-1 loss, hinge loss, and log loss]

What about multiple classes?

One against rest
  Learn 3 classifiers separately: class k vs. the rest, giving (wk, bk) for k = 1, 2, 3.
  Predict y = arg max_k  wk.x + bk
  But the wk's may not be on the same scale. Note: (a·w).x + (a·b) is also a solution.

Learn 1 classifier: Multi-class SVM
  Simultaneously learn 3 sets of weights.
  Predict y = arg max_k  w(k).x + b(k)
  Margin – the gap between the correct class and the nearest other class.
  Joint optimization ensures the wk's have the same scale.

What you need to know
  • Maximizing margin
  • Derivation of the SVM formulation
  • Slack variables and hinge loss
  • Relationship between SVMs and logistic regression
    – 0/1 loss
    – Hinge loss
    – Log loss
  • Tackling multiple classes
    – One against All
    – Multiclass SVMs

SVMs reminder
  min_{w,b,ξ} w.w + C Σj ξj   s.t. (w.xj + b) yj ≥ 1 - ξj  ∀j,  ξj ≥ 0  ∀j
  Regularization (w.w) + hinge loss (C Σj ξj) – the soft-margin approach.
  Essentially a constrained optimization problem!

Constrained Optimization
  A constraint can be ineffective (the unconstrained optimum already satisfies it) or effective (the optimum lies on the constraint boundary).

Constrained Optimization
  Moving the constraint to the objective function: primal problem → Lagrangian → dual problem.
  α = 0  →  the constraint is ineffective.
  α > 0  →  the constraint is effective.

Dual SVM – linearly separable case
  Primal problem:  min_{w,b} ½ w.w   s.t. (w.xj + b) yj ≥ 1  ∀j
  (the factor ½ is a convenient rescaling that does not change the solution)
  Lagrangian:  L(w, b, α) = ½ w.w - Σj αj [(w.xj + b) yj - 1],   αj ≥ 0
  w – weights on features;  α – weights on training points.
  αj = 0 → constraint is ineffective: (w.xj + b) yj > 1 (xj is not a support vector).
  αj > 0 → constraint is effective: (w.xj + b) yj = 1 (point j is a support vector).

Dual SVM – linearly separable case
  Dual problem:
  max_α  Σj αj - ½ Σj Σk αj αk yj yk (xj.xk)   s.t. αj ≥ 0  ∀j,  Σj αj yj = 0
  If we can solve for the αs (dual problem), then we have a solution for w, b (primal problem):
  w = Σj αj yj xj
  The dual problem is also a QP; its solution gives the αj's.
  Use a support vector xk to compute b:  (w.xk + b) yk = 1  ⇒  w.xk + b = yk  ⇒  b = yk - w.xk

Dual SVM interpretation: Sparsity
  Only a few αj's can be non-zero – those where the constraint is tight: (w.xj + b) yj = 1.
  Support vectors – the training points j whose αj's are non-zero.

Dual SVM – non-separable case
  Primal problem:  min_{w,b,ξ} ½ w.w + C Σj ξj   s.t. (w.xj + b) yj ≥ 1 - ξj,  ξj ≥ 0  ∀j
  Dual problem (with Lagrange multipliers αj on the margin constraints):
  max_α  Σj αj - ½ Σj Σk αj αk yj yk (xj.xk)   s.t. 0 ≤ αj ≤ C  ∀j,  Σj αj yj = 0
  The dual problem is also a QP; its solution gives the αj's, and w and b are recovered as before (b comes from a support vector with 0 < αk < C).
  Intuition: earlier, if a constraint was violated, αi → ∞; now αi is capped at C.

So why solve the dual SVM?
  • There are quadratic programming algorithms that can solve the dual faster than the primal, especially in high dimensions (m >> n).
  • …
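To make the dual-to-primal recovery on the last few slides concrete, here is a small numpy sketch. It assumes the dual QP has already been solved; the α values below were worked out by hand for a four-point toy set so that they really are the dual optimum, and the function names recover_primal_from_dual and predict are illustrative, not from the lecture. The recovery rules are the ones stated on the slides: w = Σj αj yj xj, and b = yk - w.xk for a support vector k whose margin constraint is tight (0 < αk < C).

```python
import numpy as np

def recover_primal_from_dual(alpha, X, y, C, tol=1e-8):
    """Recover the primal SVM solution (w, b) from dual variables alpha.

    w = sum_j alpha_j y_j x_j, and b is obtained from any support vector
    with 0 < alpha_k < C, whose margin constraint is tight: (w.xk + b) yk = 1.
    """
    w = (alpha * y) @ X                            # weighted sum of training points
    on_margin = (alpha > tol) & (alpha < C - tol)  # support vectors on the margin
    k = np.flatnonzero(on_margin)[0]               # pick one of them
    b = y[k] - X[k] @ w                            # since yk in {-1,+1}: w.xk + b = yk
    return w, b

def predict(w, b, X):
    """Predict labels in {-1, +1} with the linear rule sign(w.x + b)."""
    return np.sign(X @ w + b)

# Toy, hand-solved example: the two closest points (1,1) and (-1,-1) are the
# support vectors; alpha satisfies the dual constraints (sum_j alpha_j y_j = 0)
# and yields the max-margin solution w = (0.5, 0.5), b = 0.
X = np.array([[1.0, 1.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([+1, +1, -1, -1])
C = 10.0
alpha = np.array([0.25, 0.0, 0.25, 0.0])   # sparse: non-zero only for support vectors

w, b = recover_primal_from_dual(alpha, X, y, C)
print("w =", w, " b =", b)                 # expected: w = [0.5 0.5], b = 0.0
print("train predictions:", predict(w, b, X), " labels:", y)
```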