Two SVM tutorials are linked on the class website; please read both:
- High-level presentation with applications: Hearst (1998)
- Detailed tutorial: Burges (1998)

Support Vector Machines
Machine Learning 10-701/15-781, Carlos Guestrin, Carnegie Mellon University, February 16th, 2005

Linear classifiers: Which line is better?
Data: examples indexed by i. A linear classifier computes w · x = Σ_j w_j x_j and uses the decision boundary w · x + b = 0.

Pick the one with the largest margin.
Maximize the margin: many planes w · x + b = 0 separate the data; prefer the one whose margin (distance to the closest training points) is largest.

Review: w is the normal to the plane w · x + b = 0.

Normalized margin and canonical hyperplanes: rescale w and b so that the closest points satisfy w · x + b = +1 on the positive side and w · x + b = -1 on the negative side, with the decision boundary w · x + b = 0 in between.

Margin maximization using canonical hyperplanes: with this normalization the margin is 2 / ||w||, so maximizing the margin amounts to minimizing w · w subject to the canonical constraints (a formulation sketch appears after these notes).

Support vector machines (SVMs): solve this efficiently by quadratic programming (QP); well-studied solution algorithms exist. The resulting hyperplane is defined by the support vectors, the training points that lie on the canonical hyperplanes.

What if the data is not linearly separable? Use features of features of features of features... (map the data into a richer feature space).

What if the data is still not linearly separable? Minimize w · w and the number of training mistakes, trading off the two criteria. Trading off mistakes and w · w directly with the 0/1 loss and a slack penalty C is not a QP anymore, and it doesn't distinguish near misses from really bad mistakes.

Slack variables and the hinge loss: if an example's margin is at least 1, don't care; if its margin is less than 1, pay a linear penalty (see the soft-margin sketch after these notes).

Side note: What's the difference between SVMs and logistic regression? The SVM minimizes the hinge loss; logistic regression minimizes the log loss (both losses are written out after these notes).

What about multiple classes? One-against-all: learn 3 classifiers, one per class. Multiclass SVM: learn 1 classifier by simultaneously learning 3 sets of weights (a code sketch of one-against-all follows these notes).

What you need to know:
- Maximizing the margin
- Derivation of the SVM formulation
- Slack variables and the hinge loss
- Relationship between SVMs and logistic regression: 0/1 loss, hinge loss, log loss
- Tackling multiple classes: one-against-all, multiclass SVMs

Acknowledgment: SVM applet, http://www.site.uottawa.ca/gcaron/applets.htm
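A minimal sketch, not copied from the slides, of the hard-margin primal that the canonical-hyperplane argument leads to; the notation (x_j, y_j) with y_j in {-1, +1} is an assumption about the lecture's conventions:

```latex
% Hard-margin SVM primal. With the canonical normalization
% (closest points satisfy w.x + b = +1 or -1), the margin is 2/||w||,
% so maximizing the margin is the quadratic program:
\begin{aligned}
  \min_{w,\; b} \quad & \tfrac{1}{2}\, w \cdot w \\
  \text{s.t.}   \quad & y_j \,(w \cdot x_j + b) \ge 1 \quad \text{for all training examples } j
\end{aligned}
```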
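Adding slack variables as the notes describe gives the soft-margin QP; again a sketch under the same assumed notation, with xi_j the slack of example j and C the slack penalty:

```latex
% Soft-margin SVM with slack variables: \xi_j measures how far example j
% falls inside the margin; C trades off margin size against violations.
\begin{aligned}
  \min_{w,\; b,\; \xi} \quad & \tfrac{1}{2}\, w \cdot w \;+\; C \sum_j \xi_j \\
  \text{s.t.} \quad & y_j \,(w \cdot x_j + b) \ge 1 - \xi_j, \qquad \xi_j \ge 0 \quad \text{for all } j
\end{aligned}
% At the optimum \xi_j = \max(0,\, 1 - y_j (w \cdot x_j + b)): the hinge loss.
% If the margin is at least 1 the example pays nothing; below 1 the penalty is linear.
```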
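For the SVM vs. logistic regression side note, one common way to write the three losses mentioned, as functions of the signed margin m = y (w · x + b); the exact scaling of the log loss varies by convention:

```latex
% Losses as functions of the signed margin m = y (w \cdot x + b):
\begin{aligned}
  \text{0/1 loss:}                        \quad & \mathbf{1}[\, m \le 0 \,] \\
  \text{hinge loss (SVM):}                \quad & \max(0,\; 1 - m) \\
  \text{log loss (logistic regression):}  \quad & \log\!\bigl(1 + e^{-m}\bigr)
\end{aligned}
```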
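A short Python sketch, not from the lecture, illustrating two of the ideas above: minimizing the regularized hinge loss by subgradient descent for a linear soft-margin SVM, and one-against-all multiclass classification on top of it. The function names, learning rate, and epoch count are illustrative assumptions.

```python
# Sketch only: linear soft-margin SVM via subgradient descent on the
# regularized hinge loss, plus one-against-all multiclass on top of it.
import numpy as np

def train_linear_svm(X, y, C=1.0, epochs=100, lr=0.01):
    """Minimize 0.5*w.w + C * sum_j max(0, 1 - y_j (w.x_j + b)).
    X: (n, d) array; y: (n,) array with labels in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)       # y_j (w.x_j + b) for every example
        violated = margins < 1          # inside the margin or misclassified
        # Subgradient: w from the regularizer, -C * y_j x_j (and -C * y_j
        # for the bias) from each hinge-loss violation.
        grad_w = w - C * (y[violated] @ X[violated])
        grad_b = -C * np.sum(y[violated])
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def train_one_vs_all(X, labels, num_classes, **kwargs):
    """One-against-all: learn one binary SVM per class (class k vs. the rest)."""
    models = []
    for k in range(num_classes):
        y_k = np.where(labels == k, 1.0, -1.0)
        models.append(train_linear_svm(X, y_k, **kwargs))
    return models

def predict_one_vs_all(models, X):
    """Predict the class whose classifier gives the largest score w.x + b."""
    scores = np.stack([X @ w + b for (w, b) in models], axis=1)
    return np.argmax(scores, axis=1)
```

For example, train_one_vs_all(X, labels, num_classes=3) learns the three binary classifiers the slides describe, and predict_one_vs_all picks the class with the largest score.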