Support Vector Machines
Machine Learning 10701/15781
Carlos Guestrin
Carnegie Mellon University
February 21st, 2007
©2005-2007 Carlos Guestrin

Linear classifiers: which line is better?
Data: example $i$; a linear classifier scores it with $w \cdot x = \sum_j w^{(j)} x^{(j)}$.

Pick the one with the largest margin!
Decision boundary: $w \cdot x + b = 0$, where $w \cdot x = \sum_j w^{(j)} x^{(j)}$.

Maximize the margin
$w \cdot x + b = 0$

But there are many planes...
$w \cdot x + b = 0$

Review: normal to a plane
$w$ is normal (perpendicular) to the plane $w \cdot x + b = 0$.

Normalized margin: canonical hyperplanes
Around the decision boundary $w \cdot x + b = 0$ sit the canonical hyperplanes $w \cdot x + b = +1$ (through $x^+$) and $w \cdot x + b = -1$ (through $x^-$). The margin between them is $2\gamma$.

Margin maximization using canonical hyperplanes
Margin: $2\gamma$. Because $x^+ = x^- + 2\gamma \frac{w}{\|w\|}$, subtracting the two canonical equations gives $w \cdot (x^+ - x^-) = 2$, so $2\gamma \|w\| = 2$ and $\gamma = 1/\|w\|$. Maximizing the margin is therefore equivalent to
$\min_{w,b} \; w \cdot w \quad \text{s.t.} \quad (w \cdot x_j + b)\, y_j \ge 1 \;\; \forall j$

Support vector machines (SVMs)
- The problem above is solved efficiently by quadratic programming (QP), with well-studied solution algorithms.
- The hyperplane is defined by the support vectors.

Announcements
Third homework out later today. This one is shorter. Due on Monday, March 5th. No late days allowed, so that we can give solutions before the midterm.

What if the data is not linearly separable?
Use features of features of features of features...

What if the data is still not linearly separable?
Minimize $w \cdot w$ and the number of training mistakes:
$\min_{w,b} \; w \cdot w + C \cdot \#(\text{mistakes})$
- This trades off two criteria: the number of mistakes (0/1 loss) and $w \cdot w$, with the slack penalty $C$ setting the balance.
- Not a QP anymore.
- It also doesn't distinguish near misses from really bad mistakes.

Slack variables: hinge loss
$\min_{w,b,\xi} \; w \cdot w + C \sum_j \xi_j \quad \text{s.t.} \quad (w \cdot x_j + b)\, y_j \ge 1 - \xi_j, \;\; \xi_j \ge 0 \;\; \forall j$
- If the margin is $\ge 1$, don't care ($\xi_j = 0$).
- If the margin is $< 1$, pay a linear penalty ($\xi_j = 1 - (w \cdot x_j + b)\, y_j$).

Side note: what's the difference between SVMs and logistic regression?
SVM: hinge loss, $\max\{0,\; 1 - (w \cdot x_j + b)\, y_j\}$
Logistic regression: log loss, $\ln\left(1 + \exp\left(-(w \cdot x_j + b)\, y_j\right)\right)$

What about multiple classes?

One against all
Learn 3 classifiers, each separating one class from the rest.

Learn 1 classifier: multiclass SVM
Simultaneously learn 3 sets of weights $w^{(y)}, b^{(y)}$, one per class, so that the correct class wins by a margin:
$w^{(y_j)} \cdot x_j + b^{(y_j)} \ge w^{(y')} \cdot x_j + b^{(y')} + 1 \;\; \forall y' \ne y_j, \; \forall j$
Prediction: $\hat{y} = \arg\max_y \; w^{(y)} \cdot x + b^{(y)}$

What you need to know
- Maximizing margin
- Derivation of SVM formulation
- Slack variables and hinge loss
- Relationship between SVMs and logistic regression: 0/1 loss, hinge loss, log loss
- Tackling multiple classes: one against all, multiclass SVMs
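The margin of a canonical hyperplane and the three losses compared in these slides (0/1, hinge, log) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lecture: the helper names (`dot`, `margin_width`, `functional_margin`, `zero_one_loss`, `hinge_loss`, `log_loss`) and the example weights are hypothetical, and each loss is written as a function of the margin $m = (w \cdot x + b)\, y$ with labels in $\{-1, +1\}$.

```python
import math

def dot(w, x):
    """Inner product w.x = sum_j w(j) x(j)."""
    return sum(wj * xj for wj, xj in zip(w, x))

def margin_width(w):
    """Distance 2/||w|| between the canonical hyperplanes w.x + b = +1 and -1."""
    return 2.0 / math.sqrt(dot(w, w))

def functional_margin(w, b, x, y):
    """(w.x + b) y for a label y in {-1, +1}; positive iff x is on the correct side."""
    return (dot(w, x) + b) * y

def zero_one_loss(m):
    """1 for a mistake (wrong side or exactly on the boundary), else 0."""
    return 1.0 if m <= 0 else 0.0

def hinge_loss(m):
    """SVM loss: 0 once the margin is >= 1, linear penalty 1 - m below that."""
    return max(0.0, 1.0 - m)

def log_loss(m):
    """Logistic regression loss ln(1 + exp(-m)), rearranged to avoid overflow."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))

if __name__ == "__main__":
    w, b = [1.0, 1.0], -1.0               # hypothetical weights
    print(round(margin_width(w), 4))      # 2 / sqrt(2) -> 1.4142
    for m in (-1.0, 0.0, 0.5, 1.0, 2.0):
        print(m, zero_one_loss(m), hinge_loss(m), round(log_loss(m), 4))
```

The printed rows show the contrast drawn on the slides: the 0/1 loss ignores near misses with $0 < m < 1$, hinge loss charges them linearly and is exactly zero for $m \ge 1$, and log loss stays positive for every margin.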
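The slack-variable objective can also be minimized without a QP solver by folding the slack into the hinge loss, $w \cdot w + C \sum_j \max\{0,\, 1 - (w \cdot x_j + b)\, y_j\}$, and taking subgradient steps; a one-against-all wrapper then trains one such classifier per class and predicts with the largest score. The sketch below assumes this substitution: subgradient descent is a swapped-in technique for illustration (the slides solve a QP), and the function names, the decaying step size `lr / sqrt(t)`, the defaults for `C`, `epochs`, `lr`, and the toy data are all hypothetical choices.

```python
import math

def dot(w, x):
    """Inner product w.x = sum_j w(j) x(j)."""
    return sum(wj * xj for wj, xj in zip(w, x))

def train_soft_margin_svm(X, y, C=1.0, epochs=1000, lr=0.01):
    """Hypothetical helper: subgradient descent on
    w.w + C * sum_j max(0, 1 - y_j (w.x_j + b)), labels y_j in {-1, +1}.
    Illustrative substitute for a QP solver; returns only an approximate optimum."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        step = lr / math.sqrt(t)             # decaying step size
        gw = [2.0 * wi for wi in w]          # gradient of the w.w term
        gb = 0.0
        for xj, yj in zip(X, y):
            if yj * (dot(w, xj) + b) < 1.0:  # margin < 1: slack / hinge is active
                for i in range(d):
                    gw[i] -= C * yj * xj[i]
                gb -= C * yj
        w = [wi - step * gi for wi, gi in zip(w, gw)]
        b -= step * gb
    return w, b

def train_one_vs_all(X, labels, **kwargs):
    """One classifier per class: that class is +1, every other class is -1."""
    models = {}
    for k in sorted(set(labels)):
        yk = [1.0 if lab == k else -1.0 for lab in labels]
        models[k] = train_soft_margin_svm(X, yk, **kwargs)
    return models

def predict_one_vs_all(models, x):
    """Pick the class whose classifier gives the largest score w.x + b."""
    return max(models, key=lambda k: dot(models[k][0], x) + models[k][1])

if __name__ == "__main__":
    # Hypothetical toy data: four clusters on the axes, two points each.
    X = [[2.0, 0.0], [3.0, 0.0], [-2.0, 0.0], [-3.0, 0.0],
         [0.0, 2.0], [0.0, 3.0], [0.0, -2.0], [0.0, -3.0]]
    labels = ["E", "E", "W", "W", "N", "N", "S", "S"]
    models = train_one_vs_all(X, labels)
    print([predict_one_vs_all(models, x) for x in X])
```

On this toy data each one-against-all classifier separates its cluster from the other three, so the argmax rule recovers every training label. The step size and epoch count were picked for this small example and would likely need tuning on other data.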