CMU CS 10601 - Support Vector Machine

Support Vector Machine
10-601 Machine Learning

Types of classifiers
We can divide the large variety of classification approaches into roughly three major types:
1. Instance based classifiers
   - use the observations directly (no model)
   - e.g., k nearest neighbors
2. Generative
   - build a generative statistical model
   - e.g., Bayesian networks
3. Discriminative
   - directly estimate a decision rule/boundary
   - e.g., decision trees

Ranking classifiers
Rich Caruana & Alexandru Niculescu-Mizil, An Empirical Comparison of Supervised Learning Algorithms, ICML 2006.

Regression classifiers
Recall our regression classifiers:
   predict +1 if w^T x + b >= 0
   predict -1 if w^T x + b < 0
with weights found by minimizing the squared error
   min_w sum_i (y_i - w^T x_i)^2
The sum goes over all points x_i (even in the linear regression setting), so the fitted line ends up closer to the blue points when many of them lie far from the boundary. Many other classifiers are possible.

Max margin classifiers
• Instead of fitting all points, focus on the boundary points.
• Learn a boundary that leads to the largest margin from the points on both sides.
Of all possible boundary lines, this one gives the largest margin on both sides.
Why?
• Intuitive, "makes sense"
• Easy to do cross validation
• Some theoretical support
• Works well in practice
This is also known as the linear support vector machine (SVM): the boundary points are the vectors "supporting" the boundary.

Specifying a max margin classifier
   Classify as +1 if w^T x + b >= 1
   Classify as -1 if w^T x + b <= -1
   Undefined if -1 < w^T x + b < 1
Here w^T x + b = +1 is the class +1 plane, w^T x + b = -1 is the class -1 plane, and w^T x + b = 0 is the boundary.
Is the linear separation assumption realistic? We will deal with this shortly, but let's assume it for now.

Maximizing the margin
• Let's denote the width of the margin by M.
• How can we encode our goal of maximizing M in terms of our parameters (w and b)?
• Let's start with a few observations.

Observation 1: the vector w is orthogonal to the +1 plane.
Why? Let u and v be two points on the +1 plane. Then w^T u + b = 1 and w^T v + b = 1, so for the vector defined by u and v we have w^T (u - v) = 0.
Corollary: w is also orthogonal to the -1 plane.

Observation 2: if x+ is a point on the +1 plane and x- is the closest point to x+ on the -1 plane, then
   x+ = lambda * w + x-
for some scalar lambda. Since w is orthogonal to both planes, we need to "travel" some distance along w to get from x- to x+.

Putting it together
• w^T x+ + b = +1
• w^T x- + b = -1
• x+ = lambda * w + x-
• |x+ - x-| = M
We can now solve for lambda:
   w^T x+ + b = +1
   w^T (lambda * w + x-) + b = +1
   w^T x- + b + lambda * w^T w = +1
   -1 + lambda * w^T w = +1
   lambda = 2 / (w^T w)
and define M in terms of w and b:
   M = |x+ - x-| = |lambda * w| = lambda * |w| = lambda * sqrt(w^T w) = 2 * sqrt(w^T w) / (w^T w) = 2 / sqrt(w^T w)

Finding the optimal parameters
Since M = 2 / sqrt(w^T w), we can search for the optimal parameters by finding a solution that:
1. correctly classifies all points, and
2. maximizes the margin (or, equivalently, minimizes w^T w).
Several optimization methods could be used: gradient descent, simulated annealing, EM, etc.

Quadratic programming (QP)
Quadratic programming solves optimization problems of the following form:
   min_u (u^T R u)/2 + d^T u + c
subject to n inequality constraints
   a_11 u_1 + a_12 u_2 + ... <= b_1
   ...
   a_n1 u_1 + a_n2 u_2 + ... <= b_n
and k equality constraints
   a_{n+1,1} u_1 + a_{n+1,2} u_2 + ... = b_{n+1}
   ...
   a_{n+k,1} u_1 + a_{n+k,2} u_2 + ... = b_{n+k}
The term (u^T R u)/2 is the quadratic term. When a problem can be specified as a QP problem, we can use solvers that are better than gradient descent or simulated annealing.

SVM as a QP problem
Since maximizing M = 2 / sqrt(w^T w) is equivalent to minimizing w^T w, the SVM problem becomes:
   min_w (w^T w)/2
subject to the following inequality constraints:
   for all x in class +1: w^T x + b >= 1
   for all x in class -1: w^T x + b <= -1
for a total of n constraints if we have n input samples.

Non linearly separable case
• So far we assumed that a linear plane can perfectly separate the points.
• But this is not usually the case: noise, outliers.
How can we convert this to a QP problem?
- Minimize training errors? min w^T w and min #errors: hard to solve (two minimization problems).
- Penalize training errors: min w^T w + C*(#errors): hard to encode in a QP problem.

Instead of minimizing the number of misclassified points, we can minimize the distance between these points and their correct plane (xi_j from the +1 plane, xi_k from the -1 plane). The new optimization problem is:
   min_w (w^T w)/2 + C * sum_{i=1}^n xi_i
subject to the following inequality constraints:
   for all x_i in class +1: w^T x_i + b >= 1 - xi_i
   for all x_i in class -1: w^T x_i + b <= -1 + xi_i
Wait: are we missing something?

Final optimization for the non linearly separable case
The new optimization problem is:
   min_w (w^T w)/2 + C * sum_{i=1}^n xi_i
subject to the following inequality constraints:
   for all x_i in class +1: w^T x_i + b >= 1 - xi_i   (a total of n constraints)
   for all x_i in class -1: w^T x_i + b <= -1 + xi_i
and, in addition:
   for all i: xi_i >= 0   (another n constraints)

Where we are
Two optimization problems, for the separable and the non separable cases:
   separable: for all x in class +1, w^T x + b >= 1; for all x in class -1, w^T x + b <= -1
   non separable: for all x_i in class +1, w^T x_i + b >= 1 - xi_i; ...
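The decision rule and the margin width M = 2 / sqrt(w^T w) can be sketched in a few lines of Python. The weight vector w and offset b below are assumed values for illustration, not learned from data:

```python
import numpy as np

# Hypothetical separator for illustration: w and b are assumed, not learned.
w = np.array([2.0, 0.0])
b = -1.0

def classify(x, w, b):
    """Max-margin decision rule: +1 or -1 outside the margin, None inside it."""
    s = w @ x + b
    if s >= 1:
        return +1
    if s <= -1:
        return -1
    return None  # -1 < w^T x + b < 1: undefined

# Margin width M = 2 / sqrt(w^T w)
M = 2.0 / np.sqrt(w @ w)
print(classify(np.array([2.0, 0.0]), w, b))   # w^T x + b = 3  -> +1
print(classify(np.array([-1.0, 0.0]), w, b))  # w^T x + b = -3 -> -1
print(M)                                      # 2 / sqrt(4) = 1.0
```

Note that scaling w and b up shrinks M: the "classify as +1 iff w^T x + b >= 1" convention is what ties the margin width to the norm of w.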

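As a sketch of the separable case, the QP min (w^T w)/2 subject to the n margin constraints can be handed to a generic constrained optimizer. This uses scipy's SLSQP routine rather than a dedicated QP solver, and the four data points are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy set (assumed data, for illustration only).
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Variables u = (w1, w2, b); objective (1/2) w^T w.
def objective(u):
    w = u[:2]
    return 0.5 * (w @ w)

# One inequality constraint per sample: y_i (w^T x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq", "fun": (lambda u, i=i: y[i] * (X[i] @ u[:2] + u[2]) - 1.0)}
    for i in range(len(y))
]

res = minimize(objective, x0=np.array([1.0, 0.0, 0.0]),
               constraints=constraints, method="SLSQP")
w, b = res.x[:2], res.x[2]
margin = 2.0 / np.sqrt(w @ w)
print(w, b, margin)  # roughly w = (0.5, 0), b = 0, margin = 4
```

On this data the closest opposite-class pair, (2, 0) and (-2, 0), is distance 4 apart, and the solver recovers exactly that margin with both points active as support vectors.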
