UB CSE 574 - Support Vector Machines



Support Vector Machines
Machine Learning, Srihari

SVM Discussion Overview
1. Importance of SVMs
2. Overview of Mathematical Techniques Employed
3. Margin Geometry
   • Support vectors and margin
   • Margin maximization
   • Distance from an arbitrary point to a plane
   • Choosing a margin
   • SVM margin geometry
   • Statement of the optimization problem
4. SVM Training Methodology
   • Joseph-Louis Lagrange 1736-1813
   • SVM training: optimization problem
   • Optimization of the Lagrange function
   • Dual optimization problem
   • Solution of the dual problem
   • Kernel function: key property
   • A polynomial kernel function
   • Non-linear case
   • Mapping into a higher-dimensional feature space
   • Pattern transformation using kernels
   • Example of SVM results
   • Demo
   • SVM for the XOR problem
   • SVM for XOR: maximization problem
5. Overlapping Distributions
   • ν-SVM applied to non-separable data
6. Dealing with Multiple Classes
   • Multiclass SVMs (one-versus-rest)
   • Multiclass SVMs (one-versus-one)
7. SVM and Computational Learning Theory
   • All dichotomies of 3 points in 2 dimensions are linearly separable
   • VC dimension of hyperplanes in R^2
   • Fraction of dichotomies that are linearly separable
   • Capacity of a line when d = 2
   • Possible method of training SVM
   • Support vectors are worst-classified samples
   • Generalization error of SVM
8. Relevance Vector Machines

1. Importance of SVM
• SVM is a discriminative method that brings together:
  1. computational learning theory
  2. previously known methods in linear discriminant functions
  3. optimization theory
• Also called sparse kernel machines
  • Kernel methods predict based on linear combinations of a kernel function evaluated at the training points (e.g., Parzen window)
  • Sparse because not all pairs of training points need be used
• Also called maximum margin classifiers
• Widely used for solving problems in classification, regression and novelty detection

2. Mathematical Techniques Used
1. The linearly separable case is considered, since with an appropriate nonlinear mapping φ to a sufficiently high dimension, two categories are always separable by a hyperplane
2. To handle non-linear separability:
  • Preprocess the data to represent it in a much higher-dimensional space than the original feature space
  • The kernel trick reduces the computational overhead:

    k(y_j, y_k) = y_j^t y_k = φ(x_j)^t φ(x_k)
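The kernel property above is what makes the high-dimensional mapping affordable: the feature-space inner product is computed without ever forming φ(x). As a minimal numerical check (my own illustration, not from the slides; the feature map phi and the degree-2 polynomial kernel are standard textbook choices), the Python sketch below verifies the identity on random inputs.

    import numpy as np

    # For 2-D input x = (x1, x2), the explicit feature map
    #   phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    # satisfies phi(x) . phi(x') = (x . x')^2, the degree-2 polynomial kernel.
    def phi(x):
        return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

    def poly_kernel(x, xp):
        return np.dot(x, xp) ** 2

    rng = np.random.default_rng(0)
    x, xp = rng.normal(size=2), rng.normal(size=2)

    # Both lines print the same number: the kernel evaluates the
    # feature-space inner product without constructing phi explicitly.
    print(np.dot(phi(x), phi(xp)))
    print(poly_kernel(x, xp))

Here a 3-D inner product is replaced by a 2-D one; for higher degrees and dimensions the savings grow rapidly, since the kernel's cost stays fixed while the explicit feature space explodes combinatorially.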
3. Support Vectors and Margin
• Support vectors are the nearest patterns, at distance b from the hyperplane
• SVM finds the hyperplane with maximum distance (margin b) from the nearest training patterns
• [Figure: three support vectors shown as solid dots]

Margin Maximization
• Why maximize the margin?
• Motivation is found in computational learning theory, or statistical learning theory (PAC learning, VC dimension)
• An insight is given as follows (Tong, Koller 2000):
  • Model the distribution of each class using Parzen density estimators with Gaussian kernels having common parameter σ²
  • Instead of the optimum boundary, determine the best hyperplane relative to the learned density model
  • As σ² → 0, the optimum hyperplane has maximum margin
  • The hyperplane becomes independent of data points that are not support vectors

Distance from an arbitrary point to a plane
• Hyperplane: g(x) = w^t x + w_0 = 0, where w is the weight vector and w_0 is the bias
• [Figure: a point x and its projection x_p onto the plane; g(x) > 0 on one side of the plane, g(x) < 0 on the other]
• Lemma: the distance from x to the plane is r = g(x)/||w||
  Proof: Let x = x_p + r w/||w||, where x_p is the projection of x onto the plane and r is the distance from x to the plane. Then

    g(x) = w^t (x_p + r w/||w||) + w_0
         = w^t x_p + w_0 + r w^t w/||w||
         = g(x_p) + r ||w||
         = r ||w||            (since x_p lies on the plane, g(x_p) = 0)

  so r = g(x)/||w||. QED
• Corollary: the distance of the origin to the plane is r = g(0)/||w|| = w_0/||w||, since g(0) = w^t·0 + w_0 = w_0
  • Thus w_0 = 0 implies that the plane passes through the origin

Choosing a margin
• Augmented space: g(y) = a^t y, obtained by choosing a_0 = w_0 and y_0 = 1, i.e., the plane passes through the origin
• For each of the patterns, let z_k = ±1 according to whether pattern k is in class ω_1 or ω_2
• Thus if g(y) = 0 is a separating hyperplane, then

    z_k g(y_k) > 0,  k = 1, ..., n

• Since the distance of a point y to the hyperplane g(y) = 0 is g(y)/||a||, we can require that the hyperplane be such that all points are at least distance b from it, i.e.,

    z_k g(y_k)/||a|| ≥ b

SVM Margin geometry
• [Figure: hyperplane g(y) = a^t y = 0 with margin lines g(y) = 1 and g(y) = -1 passing through support vectors y_1 and y_2]
• The optimal hyperplane is orthogonal to the shortest line connecting the convex hulls of the two classes, which has length 2/||a||

Statement of the Optimization Problem
• The goal is to find the weight vector a that satisfies

    z_k g(y_k)/||a|| ≥ b,  k = 1, ..., n

  while maximizing b
• To ensure uniqueness we impose the constraint b ||a|| = 1, or b = 1/||a||, which implies that we also require that ||a||² be minimized
• Support vectors are the (transformed) training patterns for which equality holds in the above equation
• This is called a quadratic optimization problem, since we are trying to minimize a quadratic function subject to a set of linear inequality constraints (solved numerically in the sketch below)

4. SVM Training Methodology
1. Training is formulated as an optimization problem
  • The dual problem is stated to reduce computational complexity
  • The kernel trick is used to reduce computation
2. Determination of the model parameters corresponds to a convex optimization problem
  • The solution is straightforward (any local solution is a global optimum)
3. Makes use of Lagrange multipliers
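Before developing the Lagrangian machinery, the sketch below checks the quadratic program and margin geometry stated above numerically. It is a minimal illustration using scikit-learn on made-up toy data; the data, the large-C approximation to the hard-margin problem, and all variable names are assumptions, not from the slides.

    import numpy as np
    from sklearn.svm import SVC

    # Toy linearly separable 2-D data (illustrative values).
    X = np.array([[1.0, 1.0], [2.0, 2.5], [2.5, 1.5],
                  [-1.0, -1.0], [-2.0, -2.5], [-2.5, -1.0]])
    z = np.array([1, 1, 1, -1, -1, -1])

    # A very large C approximates the hard-margin problem:
    # minimize (1/2)||a||^2 subject to z_k g(y_k) >= 1.
    clf = SVC(kernel="linear", C=1e6).fit(X, z)
    w, w0 = clf.coef_[0], clf.intercept_[0]

    # Signed distance of each point to the plane: r_k = z_k g(x_k) / ||w||.
    r = z * (X @ w + w0) / np.linalg.norm(w)

    print("support vectors:\n", clf.support_vectors_)
    print("signed distances:", np.round(r, 4))  # minimum is b = 1/||w||
    print("margin 2/||a|| =", 2 / np.linalg.norm(w))

The printed distances make the geometry visible: the support vectors attain the minimum signed distance b = 1/||a||, every other point is farther away, and the margin between the two convex hulls is 2/||a||.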
Joseph-Louis Lagrange (1736-1813)
• French mathematician, born in Turin, Italy
• Succeeded Euler at the Berlin Academy
• Narrowly escaped execution in the French Revolution thanks to Lavoisier, who was himself guillotined
• Made key contributions to calculus and dynamics

SVM Training: Optimization Problem

    arg min_{a,b} (1/2)||a||²
    subject to the constraints  z_k a^t y_k ≥ 1,  k = 1, ..., n

• This can be cast as an unconstrained problem by introducing Lagrange undetermined multipliers α_k, with one multiplier for each constraint
• The Lagrange function is

    L(a, α) = (1/2)||a||² − Σ_{k=1}^{n} α_k [ z_k a^t y_k − 1 ]

Optimization of the Lagrange function
• The Lagrange function is

    L(a, α) = (1/2)||a||² − Σ_{k=1}^{n} α_k [ z_k a^t y_k − 1 ]

• We seek to minimize L with respect to the weight vector a, and to maximize it with respect to the multipliers α_k ≥ 0
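The preview stops at this point, but the next step is standard: setting the derivative of L with respect to a to zero gives a = Σ_k α_k z_k y_k, and substituting back yields the dual problem of maximizing Σ_k α_k − (1/2) Σ_j Σ_k α_j α_k z_j z_k y_j^t y_k subject to α_k ≥ 0 (in this augmented-space formulation there is no separate bias constraint). The sketch below is a minimal illustration of that dual, assuming made-up toy patterns and using scipy as the solver; none of it is taken from the slides.

    import numpy as np
    from scipy.optimize import minimize

    # Toy patterns in augmented space y = (1, x1, x2), labels z_k = +/-1.
    Y = np.array([[1.0, 2.0, 2.0], [1.0, 2.5, 1.0],
                  [1.0, -2.0, -2.0], [1.0, -1.0, -2.5]])
    z = np.array([1.0, 1.0, -1.0, -1.0])

    # H_jk = z_j z_k y_j^t y_k, so the dual reads:
    #   maximize  sum(alpha) - (1/2) alpha^t H alpha,   alpha_k >= 0
    H = (z[:, None] * Y) @ (z[:, None] * Y).T

    def neg_dual(alpha):
        return 0.5 * alpha @ H @ alpha - alpha.sum()

    res = minimize(neg_dual, x0=np.zeros(len(z)),
                   bounds=[(0.0, None)] * len(z))
    alpha = res.x

    a = (alpha * z) @ Y    # stationarity: a = sum_k alpha_k z_k y_k
    print("alpha:", np.round(alpha, 4))              # nonzero only at support vectors
    print("z_k a^t y_k:", np.round(z * (Y @ a), 4))  # >= 1, approx. 1 at support vectors

With only the bound constraints α_k ≥ 0, any bound-constrained solver suffices; the equality constraint Σ_k α_k z_k = 0 familiar from other SVM presentations appears only when the bias w_0 is kept as a separate, unpenalized variable rather than folded into the augmented vector a.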

