A Simple Introduction to Support Vector Machines
Martin Law
Lecture for CSE 802
Department of Computer Science and Engineering
Michigan State University

Outline
- A brief history of SVM
- Large-margin linear classifiers
  - Linearly separable case
  - Non-linearly separable case
- Creating nonlinear classifiers: the kernel trick
- A simple example
- Discussion of SVM
- Conclusion

History of SVM
- SVM is related to statistical learning theory [3]
- SVM was first introduced in 1992 [1]
- SVM became popular because of its success in handwritten digit recognition
  - 1.1% test error rate for SVM, the same as the error rate of a carefully constructed neural network, LeNet-4
  - See Section 5.11 in [2] or the discussion in [3] for details
- SVM is now regarded as an important example of "kernel methods", one of the key areas in machine learning
  - Note: the meaning of "kernel" here is different from the "kernel" function used in Parzen windows

[1] B. E. Boser et al. A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144-152, Pittsburgh, 1992.
[2] L. Bottou et al. Comparison of classifier methods: a case study in handwritten digit recognition. Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 2, pp. 77-82.
[3] V. Vapnik. The Nature of Statistical Learning Theory. 2nd edition, Springer, 1999.

What is a good Decision Boundary?
- Consider a two-class, linearly separable classification problem
- Many decision boundaries are possible!
  - The perceptron algorithm can be used to find such a boundary
  - Different algorithms have been proposed (DHS ch. 5)
- Are all decision boundaries equally good?
[Figure: Class 1 and Class 2 points separated by one of many possible decision boundaries]

Examples of Bad Decision Boundaries
[Figure: two boundaries that separate Class 1 from Class 2 but pass very close to the training points]

Large-margin Decision Boundary
- The decision boundary should be as far away from the data of both classes as possible
- We should maximize the margin, m
- The distance from the origin to the line w^T x = k is k / ||w||
[Figure: Class 1 and Class 2 separated by a hyperplane with margin m]

Finding the Decision Boundary
- Let {x_1, ..., x_n} be our data set and let y_i ∈ {1, -1} be the class label of x_i
- The decision boundary should classify all points correctly: y_i (w^T x_i + b) ≥ 1 for all i
- The decision boundary can be found by solving the following constrained optimization problem:
    minimize (1/2) ||w||^2
    subject to y_i (w^T x_i + b) ≥ 1, i = 1, ..., n
- This is a constrained optimization problem; solving it requires some new tools
- Feel free to ignore the following several slides; what is important is the constrained optimization problem above (a small numeric illustration is sketched below)
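To make the primal problem concrete, here is a minimal numeric sketch (not from the original slides; the toy data and the candidate w and b are made up for illustration). It checks the constraints y_i (w^T x_i + b) ≥ 1 for a hand-picked hyperplane and evaluates the resulting margin, which under this normalization is m = 2 / ||w||.

```python
import numpy as np

# Toy 2-D data: first two points are class 1 (y = +1), last two are class 2 (y = -1)
X = np.array([[2.0, 2.0], [3.0, 3.0],
              [0.0, 0.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

# A hand-picked candidate hyperplane w^T x + b = 0 (purely illustrative)
w = np.array([1.0, 1.0])
b = -2.5

# Primal feasibility: y_i (w^T x_i + b) >= 1 for every i
print("feasible:", np.all(y * (X @ w + b) >= 1))   # True for this (w, b)

# Margin of this hyperplane under the normalization above: m = 2 / ||w||
print("margin m =", 2.0 / np.linalg.norm(w))       # ~1.414
```

Minimizing (1/2) ||w||^2 over all feasible (w, b) is exactly what maximizes this margin.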
Recap of Constrained Optimization
- Suppose we want to minimize f(x) subject to g(x) = 0
- A necessary condition for x_0 to be a solution:
    ∇f(x_0) + λ ∇g(x_0) = 0
  where λ is the Lagrange multiplier
- For multiple constraints g_i(x) = 0, i = 1, ..., m, we need a Lagrange multiplier λ_i for each of the constraints

Recap of Constrained Optimization (cont.)
- The case of an inequality constraint g_i(x) ≤ 0 is similar, except that the Lagrange multiplier α_i should be nonnegative
- If x_0 is a solution to the constrained optimization problem
    minimize f(x) subject to g_i(x) ≤ 0, i = 1, ..., m
  there must exist α_i ≥ 0 for i = 1, ..., m such that x_0 satisfies
    ∇f(x_0) + Σ_i α_i ∇g_i(x_0) = 0
- The function f(x) + Σ_i α_i g_i(x) is also known as the Lagrangian; we want to set its gradient to 0

Back to the Original Problem
- The Lagrangian is
    L(w, b, α) = (1/2) w^T w − Σ_i α_i ( y_i (w^T x_i + b) − 1 )
  Note that ||w||^2 = w^T w
- Setting the gradient of L w.r.t. w and b to zero, we have
    w = Σ_i α_i y_i x_i   and   Σ_i α_i y_i = 0

The Dual Problem
- If we substitute w = Σ_i α_i y_i x_i into L, we have
    W(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j
  (note that Σ_i α_i y_i = 0)
- This is a function of the α_i only

The Dual Problem (cont.)
- The new objective function is in terms of the α_i only
- It is known as the dual problem: if we know w, we know all α_i; if we know all α_i, we know w
- The original problem is known as the primal problem
- The objective function of the dual problem needs to be maximized!
- The dual problem is therefore:
    maximize W(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j
    subject to α_i ≥ 0 and Σ_i α_i y_i = 0
  - α_i ≥ 0 comes from the properties of the Lagrange multipliers
  - Σ_i α_i y_i = 0 is the result of differentiating the original Lagrangian w.r.t. b

The Dual Problem (cont.)
- This is a quadratic programming (QP) problem
  - A global maximum of W(α) can always be found
- w can be recovered by w = Σ_i α_i y_i x_i

Characteristics of the Solution
- Many of the α_i are zero
  - w is a linear combination of a small number of data points
  - This "sparse" representation can be viewed as data compression, as in the construction of a kNN classifier
- The x_i with nonzero α_i are called support vectors (SV)
  - The decision boundary is determined only by the SVs
  - Let t_j (j = 1, ..., s) be the indices of the s support vectors. We can write
      w = Σ_j α_{t_j} y_{t_j} x_{t_j}
- For testing with a new datum z:
  - Compute w^T z + b = Σ_j α_{t_j} y_{t_j} (x_{t_j}^T z) + b and classify z as class 1 if the sum is positive, and class 2 otherwise
  - Note: w need not be formed explicitly

The Quadratic Programming Problem
- Many approaches have been proposed
  - LOQO, CPLEX, etc. (see http://www.numerical.rl.ac.uk/qp/qp.html)
- Most are "interior-point" methods
  - Start with an initial solution that can violate the constraints
  - Improve this solution by optimizing the objective function (a sketch using an off-the-shelf solver follows below)
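To illustrate the dual QP solution and the sparseness of the α_i, here is a minimal sketch (not part of the original slides; the toy data are made up). It uses scikit-learn's SVC rather than LOQO or CPLEX; SVC's dual_coef_ attribute stores α_i y_i for the support vectors only, and a very large C approximates the hard-margin problem on separable data.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: class 1 (y = +1) vs. class 2 (y = -1)
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin QP on separable data
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Most alpha_i are zero: the solver keeps only the support vectors
print("support vectors:\n", clf.support_vectors_)
print("alpha_i * y_i:", clf.dual_coef_)

# Recover w = sum_i alpha_i y_i x_i and b, as on the slides
w = (clf.dual_coef_ @ clf.support_vectors_).ravel()
b = clf.intercept_[0]
print("w =", w, "b =", b)

# Classify a new point z without forming w explicitly:
# sign( sum_j alpha_{t_j} y_{t_j} (x_{t_j}^T z) + b )
z = np.array([2.0, 3.0])
score = (clf.dual_coef_ @ (clf.support_vectors_ @ z)).item() + b
print("z is", "class 1" if score > 0 else "class 2")
```

Evaluating the classifier through the inner products x_{t_j}^T z, rather than through an explicit w, is exactly the form that the kernel trick generalizes later in the deck.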