10-601 Recitation William BishopAgenda•Support Vector Machines •BoostingSupport Vector MachinesMaximizing'the'margin'6%margin%=%γ =%a/ǁwǁ"w.x%+%b%=%0%w.x%+%b%>%0%w.x%+%b%<%0%w.x%+%b%=%a%w.x%+%b%=%1a%γ"γ"Distance%of%closest%examples%%from%the%line/hyperplane%Slide from lecture.Would like to express the margin aas a function of w and a.Support Vector MachinesMaximizing'the'margin'6%margin%=%γ =%a/ǁwǁ"w.x%+%b%=%0%w.x%+%b%>%0%w.x%+%b%<%0%w.x%+%b%=%a%w.x%+%b%=%1a%γ"γ"Distance%of%closest%examples%%from%the%line/hyperplane%x’x(x0 x)T✓w||w||◆= (x0Tw xTw)✓1||w||◆= (x0Tw xTw b + b)✓1||w||◆= ([x0Tw + b] [(xTw + b])✓1||w||◆= ([a] [0])✓1||w||◆= a||w||= Support Vector MachinesMaximizing'the'margin'6%margin%=%γ =%a/ǁwǁ"w.x%+%b%=%0%w.x%+%b%>%0%w.x%+%b%<%0%w.x%+%b%=%a%w.x%+%b%=%1a%γ"γ"Distance%of%closest%examples%%from%the%line/hyperplane%x’x(x0 x)T✓w||w||◆= (x0Tw xTw)✓1||w||◆= (x0Tw xTw b + b)✓1||w||◆= ([x0Tw + b] [(xTw + b])✓1||w||◆= ([a] [0])✓1||w||◆= a||w||= Support Vector MachinesMaximizing'the'margin'6%margin%=%γ =%a/ǁwǁ"w.x%+%b%=%0%w.x%+%b%>%0%w.x%+%b%<%0%w.x%+%b%=%a%w.x%+%b%=%1a%γ"γ"Distance%of%closest%examples%%from%the%line/hyperplane%x’x(x0 x)T✓w||w||◆= (x0Tw xTw)✓1||w||◆= (x0Tw xTw b + b)✓1||w||◆= ([x0Tw + b] [(xTw + b])✓1||w||◆= ([a] [0])✓1||w||◆= a||w||= Support Vector MachinesMaximizing'the'margin'6%margin%=%γ =%a/ǁwǁ"w.x%+%b%=%0%w.x%+%b%>%0%w.x%+%b%<%0%w.x%+%b%=%a%w.x%+%b%=%1a%γ"γ"Distance%of%closest%examples%%from%the%line/hyperplane%x’x(x0 x)T✓w||w||◆= (x0Tw xTw)✓1||w||◆= (x0Tw xTw b + b)✓1||w||◆= ([x0Tw + b] [(xTw + b])✓1||w||◆= ([a] [0])✓1||w||◆= a||w||= Support Vector MachinesMaximizing'the'margin'6%margin%=%γ =%a/ǁwǁ"w.x%+%b%=%0%w.x%+%b%>%0%w.x%+%b%<%0%w.x%+%b%=%a%w.x%+%b%=%1a%γ"γ"Distance%of%closest%examples%%from%the%line/hyperplane%x’x(x0 x)T✓w||w||◆= (x0Tw xTw)✓1||w||◆= (x0Tw xTw b + b)✓1||w||◆= ([x0Tw + b] [(xTw + b])✓1||w||◆= ([a] [0])✓1||w||◆= a||w||= Support Vector MachinesMaximizing'the'margin'6%margin%=%γ =%a/ǁwǁ"w.x%+%b%=%0%w.x%+%b%>%0%w.x%+%b%<%0%w.x%+%b%=%a%w.x%+%b%=%1a%γ"γ"Distance%of%closest%examples%%from%the%line/hyperplane%x’x(x0 x)T✓w||w||◆= (x0Tw xTw)✓1||w||◆= (x0Tw xTw b + b)✓1||w||◆= ([x0Tw + b] [(xTw + b])✓1||w||◆= ([a] [0])✓1||w||◆= a||w||= Support Vector MachinesMaximizing'the'margin'7%w.x%+%b%=%0%w.x%+%b%>%0%w.x%+%b%<%0%w.x%+%b%=%a%w.x%+%b%=%1a%γ"γ"%%%max%%γ =%a/ǁwǁ"%w,b%%s.t.%(w.xj+b)%yj%≥%a ∀j%%margin%=%γ =%a/ǁwǁ"Note:%%‘a’%is%arbitrary%(can%normalize%%%%%%%%%%%%%%equa$ons%by%a)%Distance%of%closest%examples%%from%the%line/hyperplane%Slide from lecture.Support Vector MachinesSupport'Vector'Machines'8%w.x%+%b%>%0%w.x%+%b%<%0%γ"γ"%%%min%%w.w"%w,b%%s.t.%(w.xj+b)%yj%≥%1 ∀j%%%%%%%Solve%efficiently%by%quadra$c%programming%(QP)%– Well1studied%solu$on%algorithms%%w.x%+%b%=%0%w.x%+%b%=%1%w.x%+%b%=%11%%%%max%%γ =%1/ǁwǁ"%w,b%%s.t.%(w.xj+b)%yj%≥%1 ∀j%%Slide from lecture.Support Vector MachinesSlide from lecture.12%%%%min%%w.w%+%C%Σ%ξj%"%w,b,ξ%%s.t.%(w.xj+b)%yj%≥%11ξj% ∀j%%% %ξj%≥%0 ∀j%j%Allow%“error”%in%classifica$on%ξj%%%%1%“slack”%variables%%%%%%%%%%%%%=%(>1%if%xj%misclassifed)%%pay%linear%penalty%if%mistake%C%%1%%tradeoff%parameter%(chosen%by%%%%%%%%%%cross1valida$on)%S$ll%QP%"%Soft margin approach What'if'data'is'not'linearly'separable?'Support Vector MachinesThe Primal Problem for the Linearly Separable Case:minw,bwTws.t. (wTxj+ b)yj 1 8jL(w,b,↵j)=wTw Xj↵jwTxj+ b)yj 1Support Vector MachinesThe Primal Problem for the Linearly Separable Case:minw,bwTws.t. (wTxj+ b)yj 1 8jL(w,b,↵j)=wTw Xj↵jwTxj+ b)yj 1Support Vector MachinesThe Primal Problem for the Linearly Separable Case:minw,bwTws.t. (wTxj+ b)yj 1 8jL(w,b,↵j)=wTw Xj↵jwTxj+ b)yj 1Support Vector MachinesL(w,b,↵j)=wTw Xj↵jwTxj+ b)yj 1The Primal Problem:The Dual Problem:minw,bmax↵j[L(w,b,↵j)]s.t.↵j 0 8jmax↵jminw,b[L(w,b,↵j)]s.t.↵j 0 8jSupport Vector MachinesSolving the dual:max↵jminw,b[L(w,b,↵j)]s.t.↵j 0 8j@L@w= w Xj↵jxjyjw =Xj↵jxjyjL(w,b,↵j)=12wTw Xj↵jwTxj+ b)yj 1@L@b=Xj↵jyj0=Xj↵jyjSupport Vector MachinesSolving the dual:max↵jminw,b[L(w,b,↵j)]s.t.↵j 0 8j@L@w= w Xj↵jxjyjw =Xj↵jxjyjL(w,b,↵j)=12wTw Xj↵jwTxj+ b)yj 1@L@b=Xj↵jyj0=Xj↵jyjSupport Vector MachinesSolving the dual:max↵jminw,b[L(w,b,↵j)]s.t.↵j 0 8j@L@w= w Xj↵jxjyjw =Xj↵jxjyjL(w,b,↵j)=12wTw Xj↵jwTxj+ b)yj 1@L@b=Xj↵jyj0=Xj↵jyj12wTw Xj↵jwTxj+ b)yj 1=12wTw Xj↵jwTxjyj bXj↵jyj+Xj↵j=120@Xj↵jxjyj1AT Xi↵ixiyi!Xj↵j Xi↵ixiyi!Txjyj+Xj↵j=Xj↵j12XiXj↵i↵jyiyjxTixjSupport Vector MachinesSolving the dual:max↵jminw,b[L(w,b,↵j)]s.t.↵j 0 8j12wTw Xj↵jwTxj+ b)yj 1=12wTw Xj↵jwTxjyj bXj↵jyj+Xj↵j=120@Xj↵jxjyj1AT Xi↵ixiyi!Xj↵j Xi↵ixiyi!Txjyj+Xj↵j=Xj↵j12XiXj↵i↵jyiyjxTixjSupport Vector MachinesSolving the dual:max↵jminw,b[L(w,b,↵j)]s.t.↵j 0 8j12wTw Xj↵jwTxj+ b)yj 1=12wTw Xj↵jwTxjyj bXj↵jyj+Xj↵j=120@Xj↵jxjyj1AT Xi↵ixiyi!Xj↵j Xi↵ixiyi!Txjyj+Xj↵j=Xj↵j12XiXj↵i↵jyiyjxTixjSupport Vector MachinesSolving the dual:max↵jminw,b[L(w,b,↵j)]s.t.↵j 0 8jSupport Vector MachinesSolving the dual:max↵jminw,b[L(w,b,↵j)]s.t.↵j 0 8j12wTw Xj↵jwTxj+ b)yj 1=12wTw Xj↵jwTxjyj bXj↵jyj+Xj↵j=120@Xj↵jxjyj1AT Xi↵ixiyi!Xj↵j Xi↵ixiyi!Txjyj+Xj↵j=Xj↵j12XiXj↵i↵jyiyjxTixjSupport Vector MachinesSolving the dual:Xj↵j12XiXj↵i↵jyiyjxTixjMaximize this with respect to ↵j.w =Xj↵jxjyjThen:b = yj wTxjfor any j where ↵j> 0Support Vector MachinesSo why work with the dual?Just a dot product!max↵j0@Xj↵j12XiXj↵i↵jyiyjxTixj1As.t.↵j 0Xj↵jyj=0Support Vector Machines40%What'if'data'is'not'linearly'separable?'x%Using%non1linear%features%to%get%linear%separa$on%y%x%x2%From class slides.Support Vector
View Full Document