CMU CS 10601 - Support Vector Machine

Support Vector Machine
10-601 Machine Learning

Types of classifiers
We can divide the large variety of classification approaches into roughly three major types:
1. Instance based classifiers
   - use the observations directly (no model)
   - e.g., k nearest neighbors
2. Generative
   - build a generative statistical model
   - e.g., Bayesian networks
3. Discriminative
   - directly estimate a decision rule/boundary
   - e.g., decision trees

Ranking classifiers
Rich Caruana & Alexandru Niculescu-Mizil, An Empirical Comparison of Supervised Learning Algorithms, ICML 2006.

Regression classifiers
Recall our regression classifiers:
   predict +1 if w^T x + b >= 0
   predict -1 if w^T x + b < 0
with weights found by minimizing the squared error
   min_w sum_i (y_i - w^T x_i)^2
The sum goes over all points x_i (even in the linear regression setting), so the fitted line ends up closer to the blue points when many of them lie far from the boundary. Many other classifiers are possible.

Max margin classifiers
• Instead of fitting all points, focus on the boundary points.
• Learn a boundary that leads to the largest margin from the points on both sides.
Of all possible boundary lines, this one gives the largest margin on both sides.
Why?
• Intuitive, "makes sense"
• Easy to do cross validation
• Some theoretical support
• Works well in practice
This is also known as the linear support vector machine (SVM): the boundary points are the vectors "supporting" the boundary.

Specifying a max margin classifier
   Classify as +1 if w^T x + b >= 1
   Classify as -1 if w^T x + b <= -1
   Undefined if -1 < w^T x + b < 1
Here w^T x + b = +1 is the class +1 plane, w^T x + b = -1 is the class -1 plane, and w^T x + b = 0 is the boundary.
Is the linear separation assumption realistic? We will deal with this shortly, but let's assume it for now.

Maximizing the margin
• Let's denote the width of the margin by M.
• How can we encode our goal of maximizing M in terms of our parameters (w and b)?
• Let's start with a few observations.

Observation 1: the vector w is orthogonal to the +1 plane.
Why? Let u and v be two points on the +1 plane. Then w^T u + b = 1 and w^T v + b = 1, so for the vector defined by u and v we have w^T (u - v) = 0.
Corollary: w is also orthogonal to the -1 plane.

Observation 2: if x+ is a point on the +1 plane and x- is the closest point to x+ on the -1 plane, then
   x+ = lambda * w + x-
for some scalar lambda. Since w is orthogonal to both planes, we need to "travel" some distance along w to get from x- to x+.

Putting it together
• w^T x+ + b = +1
• w^T x- + b = -1
• x+ = lambda * w + x-
• |x+ - x-| = M
We can now solve for lambda:
   w^T x+ + b = +1
   w^T (lambda * w + x-) + b = +1
   w^T x- + b + lambda * w^T w = +1
   -1 + lambda * w^T w = +1
   lambda = 2 / (w^T w)
and define M in terms of w and b:
   M = |x+ - x-| = |lambda * w| = lambda * |w| = lambda * sqrt(w^T w) = 2 * sqrt(w^T w) / (w^T w) = 2 / sqrt(w^T w)

Finding the optimal parameters
Since M = 2 / sqrt(w^T w), we can search for the optimal parameters by finding a solution that:
1. correctly classifies all points, and
2. maximizes the margin (or, equivalently, minimizes w^T w).
Several optimization methods could be used: gradient descent, simulated annealing, EM, etc.

Quadratic programming (QP)
Quadratic programming solves optimization problems of the following form:
   min_u (u^T R u)/2 + d^T u + c
subject to n inequality constraints
   a_11 u_1 + a_12 u_2 + ... <= b_1
   ...
   a_n1 u_1 + a_n2 u_2 + ... <= b_n
and k equality constraints
   a_{n+1,1} u_1 + a_{n+1,2} u_2 + ... = b_{n+1}
   ...
   a_{n+k,1} u_1 + a_{n+k,2} u_2 + ... = b_{n+k}
The term (u^T R u)/2 is the quadratic term. When a problem can be specified as a QP problem, we can use solvers that are better than gradient descent or simulated annealing.

SVM as a QP problem
Since maximizing M = 2 / sqrt(w^T w) is equivalent to minimizing w^T w, the SVM problem becomes:
   min_w (w^T w)/2
subject to the following inequality constraints:
   for all x in class +1: w^T x + b >= 1
   for all x in class -1: w^T x + b <= -1
for a total of n constraints if we have n input samples.

Non linearly separable case
• So far we assumed that a linear plane can perfectly separate the points.
• But this is not usually the case: noise, outliers.
How can we convert this to a QP problem?
- Minimize training errors? min w^T w and min #errors: hard to solve (two minimization problems).
- Penalize training errors: min w^T w + C*(#errors): hard to encode in a QP problem.

Instead of minimizing the number of misclassified points, we can minimize the distance between these points and their correct plane (xi_j from the +1 plane, xi_k from the -1 plane). The new optimization problem is:
   min_w (w^T w)/2 + C * sum_{i=1}^n xi_i
subject to the following inequality constraints:
   for all x_i in class +1: w^T x_i + b >= 1 - xi_i
   for all x_i in class -1: w^T x_i + b <= -1 + xi_i
Wait: are we missing something?

Final optimization for the non linearly separable case
The new optimization problem is:
   min_w (w^T w)/2 + C * sum_{i=1}^n xi_i
subject to the following inequality constraints:
   for all x_i in class +1: w^T x_i + b >= 1 - xi_i   (a total of n constraints)
   for all x_i in class -1: w^T x_i + b <= -1 + xi_i
and, in addition:
   for all i: xi_i >= 0   (another n constraints)

Where we are
Two optimization problems, for the separable and the non separable cases:
   separable: for all x in class +1, w^T x + b >= 1; for all x in class -1, w^T x + b <= -1
   non separable: for all x_i in class +1, w^T x_i + b >= 1 - xi_i; ...
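The decision rule and the margin width M = 2 / sqrt(w^T w) can be sketched in a few lines of Python. The weight vector w and offset b below are assumed values for illustration, not learned from data:

```python
import numpy as np

# Hypothetical separator for illustration: w and b are assumed, not learned.
w = np.array([2.0, 0.0])
b = -1.0

def classify(x, w, b):
    """Max-margin decision rule: +1 or -1 outside the margin, None inside it."""
    s = w @ x + b
    if s >= 1:
        return +1
    if s <= -1:
        return -1
    return None  # -1 < w^T x + b < 1: undefined

# Margin width M = 2 / sqrt(w^T w)
M = 2.0 / np.sqrt(w @ w)
print(classify(np.array([2.0, 0.0]), w, b))   # w^T x + b = 3  -> +1
print(classify(np.array([-1.0, 0.0]), w, b))  # w^T x + b = -3 -> -1
print(M)                                      # 2 / sqrt(4) = 1.0
```

Note that scaling w and b up shrinks M: the "classify as +1 iff w^T x + b >= 1" convention is what ties the margin width to the norm of w.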

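As a sketch of the separable case, the QP min (w^T w)/2 subject to the n margin constraints can be handed to a generic constrained optimizer. This uses scipy's SLSQP routine rather than a dedicated QP solver, and the four data points are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy set (assumed data, for illustration only).
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Variables u = (w1, w2, b); objective (1/2) w^T w.
def objective(u):
    w = u[:2]
    return 0.5 * (w @ w)

# One inequality constraint per sample: y_i (w^T x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq", "fun": (lambda u, i=i: y[i] * (X[i] @ u[:2] + u[2]) - 1.0)}
    for i in range(len(y))
]

res = minimize(objective, x0=np.array([1.0, 0.0, 0.0]),
               constraints=constraints, method="SLSQP")
w, b = res.x[:2], res.x[2]
margin = 2.0 / np.sqrt(w @ w)
print(w, b, margin)  # roughly w = (0.5, 0), b = 0, margin = 4
```

On this data the closest opposite-class pair, (2, 0) and (-2, 0), is distance 4 apart, and the solver recovers exactly that margin with both points active as support vectors.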
