Support Vector Machines
Machine Learning – 10701/15781
Carlos Guestrin
Carnegie Mellon University
February 21st, 2007
©2005-2007 Carlos Guestrin

Linear classifiers – Which line is better?
Data: training examples indexed by i; a linear classifier scores each point by w·x = ∑_j w^(j) x^(j)

Pick the one with the largest margin!
[Figure: several separating lines; the decision boundary is w·x + b = 0]

Maximize the margin
[Figure: the boundary w·x + b = 0 with its margin]

But there are many planes…
[Figure: any rescaling of (w, b) describes the same plane w·x + b = 0]

Review: Normal to a plane
[Figure: w is the normal vector of the plane w·x + b = 0]

Normalized margin – Canonical hyperplanes
Rescale (w, b) so that the closest positive point x+ and closest negative point x- lie on the canonical hyperplanes
  w·x + b = +1
  w·x + b = -1
with the decision boundary w·x + b = 0 halfway between them; the margin between the two canonical hyperplanes is 2γ.

Margin maximization using canonical hyperplanes
With canonical hyperplanes, maximizing the margin 2γ amounts to minimizing w·w subject to y_i (w·x_i + b) ≥ 1 for every training example.

Support vector machines (SVMs)
- Solve efficiently by quadratic programming (QP)
- Well-studied solution algorithms
- Hyperplane defined by support vectors

Announcements
- Third homework out later today
- This one is shorter!!!! :)
- Due on Monday March 5th
- No late days allowed, so we can give solutions before the midterm

What if the data is not linearly separable?
Use features of features of features of features…

What if the data is still not linearly separable?
- Minimize w·w and the number of training mistakes
- Tradeoff two criteria?
Tradeoff #(mistakes) and w·w:
- 0/1 loss
- Slack penalty C
- Not QP anymore
- Also doesn't distinguish near misses from really bad mistakes

Slack variables – Hinge loss
- If margin ≥ 1, don't care
- If margin < 1, pay a linear penalty

Side note: What's the difference between SVMs and logistic regression?
- SVM: hinge loss
- Logistic regression: log loss

What about multiple classes?

One against All
Learn 3 classifiers, one per class; predict with the most confident one.

Learn 1 classifier: Multiclass SVM
Simultaneously learn 3 sets of weights, one per class.

What you need to know
- Maximizing margin
- Derivation of the SVM formulation
- Slack variables and hinge loss
- Relationship between SVMs and logistic regression
  - 0/1 loss
  - Hinge loss
  - Log loss
- Tackling multiple classes
  - One against All
  - Multiclass SVM
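The slack-variable objective in the slides — minimize ½ w·w plus C times the total hinge loss — can be sketched with plain subgradient descent. This is an illustrative sketch, not the QP solver the lecture refers to; the dataset, learning rate, and epoch count below are made-up choices for the example.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on 0.5*w.w + C * sum(max(0, 1 - y*(w.x + b))).

    X: (n, d) feature matrix; y: (n,) labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # points with positive slack (inside the margin)
        # Subgradient: w from the regularizer, minus C * y_i * x_i per violator
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy linearly separable data: two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.3, size=(20, 2)),
               rng.normal([-2, -2], 0.3, size=(20, 2))])
y = np.array([1] * 20 + [-1] * 20)

w, b = train_linear_svm(X, y)
print((np.sign(X @ w + b) == y).mean())  # training accuracy
```

Note how C controls the tradeoff the slide describes: large C punishes every margin violation heavily, while small C lets the regularizer ½ w·w dominate and tolerates more slack.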
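The One-against-All scheme from the multiclass slides can be sketched as follows: train one binary hinge-loss classifier per class (that class vs. the rest) and predict the class whose classifier gives the highest score. The trainer, the helper names, and the three-blob toy dataset are illustrative assumptions, not code from the lecture.

```python
import numpy as np

def train_binary(X, y, C=1.0, lr=0.01, epochs=200):
    """Hinge-loss subgradient trainer for labels y in {-1, +1} (sketch)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1
        w -= lr * (w - C * (y[viol][:, None] * X[viol]).sum(axis=0))
        b -= lr * (-C * y[viol].sum())
    return w, b

def one_vs_all(X, labels, classes):
    """Learn one classifier per class: class k is +1, every other class is -1."""
    return [train_binary(X, np.where(labels == k, 1, -1)) for k in classes]

def predict(models, X, classes):
    # Pick the class whose classifier gives the largest score w.x + b
    scores = np.column_stack([X @ w + b for (w, b) in models])
    return np.array(classes)[scores.argmax(axis=1)]

# Three well-separated Gaussian blobs, 15 points each
rng = np.random.default_rng(1)
centers = np.array([[0, 4], [4, -2], [-4, -2]])
X = np.vstack([rng.normal(c, 0.4, size=(15, 2)) for c in centers])
labels = np.repeat([0, 1, 2], 15)

models = one_vs_all(X, labels, [0, 1, 2])
print((predict(models, X, [0, 1, 2]) == labels).mean())  # training accuracy
```

A multiclass SVM, by contrast, would learn all three weight vectors in a single joint optimization rather than three independent binary problems, as the "Learn 1 classifier" slides indicate.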