Support Vector Machines
Aarti Singh
Machine Learning 10-601, Nov 17, 2011

At Pittsburgh G-20 summit … [photo slide]

Linear classifiers – which line is better?

Pick the one with the largest margin!

Parameterizing the decision boundary
  Decision boundary: w.x + b = 0
  Positive side: w.x + b > 0;  negative side: w.x + b < 0
  Labels: y ∈ {+1, -1}

Maximizing the margin
  Margin γ = distance of the closest examples from the line/hyperplane.
  If the closest examples lie on the hyperplanes w.x + b = a and w.x + b = -a, then
  margin = γ = a/‖w‖

Maximizing the margin
  max_{w,b} γ = a/‖w‖   s.t. (w.xj + b) yj ≥ a  ∀j
  Note: 'a' is arbitrary (we can normalize the equations by a).

Support Vector Machines
  max_{w,b} γ = 1/‖w‖   s.t. (w.xj + b) yj ≥ 1  ∀j
  or equivalently
  min_{w,b} w.w   s.t. (w.xj + b) yj ≥ 1  ∀j
  The closest examples lie on the margin hyperplanes w.x + b = 1 and w.x + b = -1.
  Solve efficiently by quadratic programming (QP) – well-studied solution algorithms.

Support Vectors
  The linear hyperplane is defined by the "support vectors" – the examples on the margin hyperplanes.
  Moving the other points a little doesn't affect the decision boundary.
  We only need to store the support vectors to predict labels of new points.
  How many support vectors are there in the linearly separable case, given d dimensions?

What if data is not linearly separable?
  Use features of features of features of features…, e.g. x1², x2², x1x2, …, exp(x1).
  But we run the risk of overfitting!

What if data is not linearly separable?
  Allow "error" in classification:
  min_{w,b} w.w + C·(# mistakes)   s.t. (w.xj + b) yj ≥ 1  ∀j
  Maximize the margin and minimize the number of mistakes on the training data.
  C – tradeoff parameter.
  Not a QP, unfortunately, and the 0/1 loss doesn't distinguish between a near miss and a bad mistake.

What if data is not linearly separable?
  Allow "error" in classification using slack variables:
  min_{w,b,ξ} w.w + C Σj ξj   s.t. (w.xj + b) yj ≥ 1 - ξj  ∀j,  ξj ≥ 0  ∀j
  ξj – "slack" variables (ξj > 1 if xj is misclassified); we pay a linear penalty for each mistake.
  C – tradeoff parameter (chosen by cross-validation).
  Still a QP! This is the soft-margin approach.

What if data is not linearly separable? Soft-margin SVM
  Soften the constraints: (w.xj + b) yj ≥ 1 - ξj  ∀j,  ξj ≥ 0  ∀j
  Penalty for misclassifying xj: C ξj
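The soft-margin objective can be evaluated directly for any candidate (w, b): for fixed weights the smallest feasible slack is ξj = max(0, 1 - (w.xj + b) yj), which is exactly the a+ = max(a, 0) notation used on the next slide. The numpy sketch below is not part of the original slides; the toy data, the parameter values, and the function name soft_margin_objective are made up for illustration. It computes the slacks and the objective w.w + C Σj ξj.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Evaluate the soft-margin SVM objective  w.w + C * sum_j xi_j  for a
    fixed linear classifier (w, b) on data X (one row per example) and
    labels y in {-1, +1}.  For fixed (w, b) the smallest feasible slack is
    xi_j = max(0, 1 - (w.xj + b) yj).
    """
    functional_margins = y * (X @ w + b)                  # (w.xj + b) yj for every point
    slacks = np.maximum(0.0, 1.0 - functional_margins)    # xi_j; zero for points beyond the margin
    return w @ w + C * slacks.sum(), slacks

# Made-up toy data and classifier, purely for illustration.
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -1.0], [0.2, -0.1]])
y = np.array([+1, +1, -1, -1, +1])
w, b, C = np.array([0.8, 0.6]), -0.1, 1.0

objective, slacks = soft_margin_objective(w, b, X, y, C)
print("gamma = 1/||w|| =", 1.0 / np.linalg.norm(w))  # distance from w.x+b = 0 to w.x+b = 1
print("objective =", objective)
print("slacks =", slacks)   # xi_j > 1 means xj is misclassified; xi_j = 1 means on the boundary
```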
How do we recover the hard-margin SVM?  Set C = ∞.

Soft-margin SVM
  Soften the constraints: (w.xj + b) yj ≥ 1 - ξj  ∀j,  ξj ≥ 0  ∀j
  Penalty for misclassifying xj: C ξj
  At the optimum the slack is ξj = (1 - (w.xj + b) yj)+, where a+ = max(a, 0).

Support Vectors
  With soft margins the same constraints apply; the support vectors now include the points on the margin hyperplanes w.x + b = ±1 and the points with non-zero slack.

Slack variables – Hinge loss
  min_{w,b,ξ} w.w + C Σj ξj   s.t. (w.xj + b) yj ≥ 1 - ξj  ∀j,  ξj ≥ 0  ∀j
  This is a regularized loss function: the regularization term w.w plus the hinge loss, which upper-bounds the 0-1 loss.
  [plot: hinge loss vs. 0-1 loss as a function of (w.xj + b) yj]

SVM vs. Logistic Regression
  SVM: hinge loss.
  Logistic regression: log loss (negative log conditional likelihood).
  [plot: 0-1 loss, hinge loss, and log loss]

What about multiple classes?

One against rest
  Learn 3 classifiers separately: class k vs. the rest, giving (wk, bk) for k = 1, 2, 3.
  Predict y = arg max_k  wk.x + bk
  But the wk's may not be on the same scale. Note: (a·w).x + (a·b) is also a solution.

Learn 1 classifier: Multi-class SVM
  Simultaneously learn 3 sets of weights.
  Predict y = arg max_k  w(k).x + b(k)
  Margin – the gap between the correct class and the nearest other class.
  Joint optimization ensures the wk's have the same scale.

What you need to know
  • Maximizing margin
  • Derivation of the SVM formulation
  • Slack variables and hinge loss
  • Relationship between SVMs and logistic regression
    – 0/1 loss
    – Hinge loss
    – Log loss
  • Tackling multiple classes
    – One against All
    – Multiclass SVMs

SVMs reminder
  min_{w,b,ξ} w.w + C Σj ξj   s.t. (w.xj + b) yj ≥ 1 - ξj  ∀j,  ξj ≥ 0  ∀j
  Regularization (w.w) + hinge loss (C Σj ξj) – the soft-margin approach.
  Essentially a constrained optimization problem!

Constrained Optimization
  A constraint can be ineffective (the unconstrained optimum already satisfies it) or effective (the optimum lies on the constraint boundary).

Constrained Optimization
  Moving the constraint to the objective function: primal problem → Lagrangian → dual problem.
  α = 0  →  the constraint is ineffective.
  α > 0  →  the constraint is effective.

Dual SVM – linearly separable case
  Primal problem:  min_{w,b} ½ w.w   s.t. (w.xj + b) yj ≥ 1  ∀j
  (the factor ½ is a convenient rescaling that does not change the solution)
  Lagrangian:  L(w, b, α) = ½ w.w - Σj αj [(w.xj + b) yj - 1],   αj ≥ 0
  w – weights on features;  α – weights on training points.
  αj = 0 → constraint is ineffective: (w.xj + b) yj > 1 (xj is not a support vector).
  αj > 0 → constraint is effective: (w.xj + b) yj = 1 (point j is a support vector).

Dual SVM – linearly separable case
  Dual problem:
  max_α  Σj αj - ½ Σj Σk αj αk yj yk (xj.xk)   s.t. αj ≥ 0  ∀j,  Σj αj yj = 0
  If we can solve for the αs (dual problem), then we have a solution for w, b (primal problem):
  w = Σj αj yj xj
  The dual problem is also a QP; its solution gives the αj's.
  Use a support vector xk to compute b:  (w.xk + b) yk = 1  ⇒  w.xk + b = yk  ⇒  b = yk - w.xk

Dual SVM interpretation: Sparsity
  Only a few αj's can be non-zero – those where the constraint is tight: (w.xj + b) yj = 1.
  Support vectors – the training points j whose αj's are non-zero.

Dual SVM – non-separable case
  Primal problem:  min_{w,b,ξ} ½ w.w + C Σj ξj   s.t. (w.xj + b) yj ≥ 1 - ξj,  ξj ≥ 0  ∀j
  Dual problem (with Lagrange multipliers αj on the margin constraints):
  max_α  Σj αj - ½ Σj Σk αj αk yj yk (xj.xk)   s.t. 0 ≤ αj ≤ C  ∀j,  Σj αj yj = 0
  The dual problem is also a QP; its solution gives the αj's, and w and b are recovered as before (b comes from a support vector with 0 < αk < C).
  Intuition: earlier, if a constraint was violated, αi → ∞; now αi is capped at C.

So why solve the dual SVM?
  • There are quadratic programming algorithms that can solve the dual faster than the primal, especially in high dimensions (m >> n).
  • …
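To make the dual-to-primal recovery on the last few slides concrete, here is a small numpy sketch. It assumes the dual QP has already been solved; the α values below were worked out by hand for a four-point toy set so that they really are the dual optimum, and the function names recover_primal_from_dual and predict are illustrative, not from the lecture. The recovery rules are the ones stated on the slides: w = Σj αj yj xj, and b = yk - w.xk for a support vector k whose margin constraint is tight (0 < αk < C).

```python
import numpy as np

def recover_primal_from_dual(alpha, X, y, C, tol=1e-8):
    """Recover the primal SVM solution (w, b) from dual variables alpha.

    w = sum_j alpha_j y_j x_j, and b is obtained from any support vector
    with 0 < alpha_k < C, whose margin constraint is tight: (w.xk + b) yk = 1.
    """
    w = (alpha * y) @ X                            # weighted sum of training points
    on_margin = (alpha > tol) & (alpha < C - tol)  # support vectors on the margin
    k = np.flatnonzero(on_margin)[0]               # pick one of them
    b = y[k] - X[k] @ w                            # since yk in {-1,+1}: w.xk + b = yk
    return w, b

def predict(w, b, X):
    """Predict labels in {-1, +1} with the linear rule sign(w.x + b)."""
    return np.sign(X @ w + b)

# Toy, hand-solved example: the two closest points (1,1) and (-1,-1) are the
# support vectors; alpha satisfies the dual constraints (sum_j alpha_j y_j = 0)
# and yields the max-margin solution w = (0.5, 0.5), b = 0.
X = np.array([[1.0, 1.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([+1, +1, -1, -1])
C = 10.0
alpha = np.array([0.25, 0.0, 0.25, 0.0])   # sparse: non-zero only for support vectors

w, b = recover_primal_from_dual(alpha, X, y, C)
print("w =", w, " b =", b)                 # expected: w = [0.5 0.5], b = 0.0
print("train predictions:", predict(w, b, X), " labels:", y)
```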