SVM: 1  Support Vector Machines
Modified from Prof. Andrew W. Moore's lecture notes, www.cs.cmu.edu/~awm

SVM: 2  History
• SVM is a classifier derived from statistical learning theory by Vapnik and Chervonenkis.
• SVMs were introduced by Boser, Guyon, and Vapnik at COLT-92.
• It is now an important and active field of machine learning research.
• Special issues of the Machine Learning Journal and the Journal of Machine Learning Research have been devoted to it.

SVM: 3-7  Linear Classifiers
[Figure, repeated across slides 3-7: a 2-D scatter of training points, class +1 vs. class -1, with a different candidate separating line on each slide; diagram of input x fed to f(x, α) producing the estimate y_est.]
A linear classifier predicts f(x, w, b) = sign(w·x + b). For example, with x = (x1, x2) and w = (w1, w2), we have w·x = w1x1 + w2x2, and sign(w·x + b) = +1 iff w1x1 + w2x2 + b > 0.
How would you classify this data? Any of the candidate lines would be fine... but which is best?

SVM: 8  Classifier Margin
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

SVM: 9-10  Maximum Margin
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM). Support vectors are those datapoints that the margin pushes up against.
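To make the decision rule and the margin concrete, here is a minimal sketch, assuming NumPy; the helper names, the toy points, and the candidate (w, b) are made-up illustrations, not taken from the slides:

```python
import numpy as np

def predict(X, w, b):
    """The linear decision rule f(x, w, b) = sign(w . x + b), applied row-wise."""
    return np.sign(X @ w + b)

def margin(X, y, w, b):
    """Margin of a separating (w, b): the distance from the hyperplane
    w . x + b = 0 to the closest training point."""
    scores = X @ w + b
    assert np.all(y * scores > 0), "(w, b) must classify every point correctly"
    return np.min(np.abs(scores)) / np.linalg.norm(w)

# Hypothetical toy data: two linearly separable clusters, labels +1 / -1.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0          # one candidate boundary among many

print(predict(X, w, b))                   # [ 1.  1. -1. -1.]
print(margin(X, y, w, b))                 # ~2.12: distance to the closest point
```

Different separating (w, b) pairs give different margin values; the SVM of the next slides picks the pair that maximizes it.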
SVM: 11  Why Maximum Margin?
1. Intuitively this feels safest.
2. Empirically it works very well.
3. If we've made a small error in the location of the boundary (it has been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
4. There's some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.

SVM: 12  Estimate the Margin
• What is the distance from a point x to the line w·x + b = 0?

    d(x) = \frac{|x \cdot w + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}}

Here x is the point, w is the normal vector of the line, and b is a scale (offset) value. See http://mathworld.wolfram.com/Point-LineDistance2-Dimensional.html

SVM: 13  Estimate the Margin
• What is the expression for the margin, given w and b? It is the distance from the hyperplane to the closest point:

    \text{margin} = \min_{x \in D} d(x) = \min_{x \in D} \frac{|x \cdot w + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}}

SVM: 14  Maximize Margin

    \arg\max_{w,b} \text{margin}(w, b, D) = \arg\max_{w,b} \min_{x_i \in D} d(x_i) = \arg\max_{w,b} \min_{x_i \in D} \frac{|x_i \cdot w + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}}

SVM: 15  Maximize Margin
Since w·x_i + b ≥ 0 iff y_i = 1 and w·x_i + b ≤ 0 iff y_i = -1, correct classification of every training point can be written as y_i(w·x_i + b) ≥ 0. The problem becomes

    \arg\max_{w,b} \min_{x_i \in D} \frac{|x_i \cdot w + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}} \quad \text{subject to } \forall x_i \in D: \; y_i (x_i \cdot w + b) \ge 0

SVM: 16  Maximize Margin
Strategy: since w·x + b = 0 and α(w·x + b) = 0 (α ≠ 0) describe the same hyperplane, we may rescale (w, b) so that ∀x_i ∈ D: |x_i·w + b| ≥ 1, with equality for the closest points. The problem then simplifies to

    \{w^*, b^*\} = \arg\min_{w,b} \sum_{i=1}^{d} w_i^2 \quad \text{subject to } \forall x_i \in D: \; y_i (x_i \cdot w + b) \ge 1

SVM: 17  Maximize Margin
• How does this come about? Replacing (w, b) by (Kw, Kb) multiplies both |x_i·w + b| and \sqrt{\sum w_i^2} by K, leaving every d(x_i) unchanged, so we may fix the scale so that \min_{x_i \in D} |x_i·w + b| = 1. Then

    \arg\max_{w,b} \min_{x_i \in D} \frac{|x_i \cdot w + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}} = \arg\max_{w,b} \frac{1}{\sqrt{\sum_{i=1}^{d} w_i^2}} = \arg\min_{w,b} \sum_{i=1}^{d} w_i^2

SVM: 18  Maximum Margin Linear Classifier
• How do we solve it?

    \{w^*, b^*\} = \arg\min_{w,b} \sum_{k=1}^{d} w_k^2
    subject to
    y_1 (w \cdot x_1 + b) \ge 1
    y_2 (w \cdot x_2 + b) \ge 1
    \dots
    y_N (w \cdot x_N + b) \ge 1

SVM: 19  Learning via Quadratic Programming
• QP is a well-studied class of optimization algorithms for optimizing a quadratic function of some real-valued variables subject to linear constraints.
• For the details of solving quadratic programs, see Convex Optimization by Stephen P. Boyd (online edition, free for downloading): www.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf

SVM: 20  Quadratic Programming
Find

    \arg\max_{u} \; c + d^T u + \frac{u^T R u}{2}    (quadratic criterion)

subject to n additional linear inequality constraints:

    a_{11} u_1 + a_{12} u_2 + \dots + a_{1m} u_m \le b_1
    a_{21} u_1 + a_{22} u_2 + \dots + a_{2m} u_m \le b_2
    \vdots
    a_{n1} u_1 + a_{n2} u_2 + \dots + a_{nm} u_m \le b_n

SVM: 21  Quadratic Programming for the Linear Classifier
The maximum margin problem

    \{w^*, b^*\} = \arg\min_{w,b} \sum_i w_i^2 \quad \text{subject to } y_i (w \cdot x_i + b) \ge 1 \text{ for all training data } (x_i, y_i)

fits the QP template directly: take c = 0, d = 0, and R = -I, so that maximizing 0 + 0^T w - w^T I w is the same as minimizing w^T w, and the N conditions y_k(w·x_k + b) ≥ 1 (k = 1, ..., N) form the N linear inequality constraints.

SVM: 22  Popular Tools
• LibSVM

SVM: 23  Uh-oh!
[Figure: +1 and -1 points that are not linearly separable.]
This is going to be a problem! What should we do?

SVM: 24  Uh-oh!
Idea 1: Find the minimum w·w while also minimizing the number of training-set errors.
Problem: two things to minimize makes for an ill-defined optimization.

SVM: 25-26  Uh-oh!
Idea 1.1: Minimize w·w + C(#train errors), where C is a tradeoff parameter.
There is a serious practical problem that is about to make us reject this approach. Can you guess what it is? This objective cannot be expressed as a quadratic programming problem, so solving it may be too slow. (Also, it doesn't distinguish between disastrous errors and near misses.) So... any other ideas?

SVM: 27  Uh-oh!
Idea 2.0: Minimize w·w + C ...
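As a practical companion to slides 18-22, here is a short sketch of training a maximum-margin linear classifier with scikit-learn's SVC, which is built on the LibSVM tool named on slide 22. The toy data are made up, and using a very large C to approximate the hard-margin problem is an assumption of this sketch, not something the slides prescribe:

```python
import numpy as np
from sklearn.svm import SVC   # scikit-learn's SVC is built on LibSVM

# Hypothetical toy data: two linearly separable clusters, labels +1 / -1.
X = np.array([[2.0, 2.0], [3.0, 1.5], [4.0, 3.0],
              [-2.0, -1.0], [-3.0, -2.5], [-1.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C all but forbids training errors, approximating the
# hard-margin QP:  argmin ||w||^2  subject to  y_i (w . x_i + b) >= 1.
# A moderate C gives the "w.w + C * (error term)" tradeoff instead.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)
print("support vectors:\n", clf.support_vectors_)  # the points the margin touches
print("distance to closest point = 1/||w|| =", 1.0 / np.linalg.norm(w))
```

Under the canonical scaling y_i(w·x_i + b) ≥ 1 of slide 16, the closest points sit at distance 1/||w|| from the boundary, which is why maximizing the margin and minimizing \sum_i w_i^2 coincide.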

