UT Dallas CS 6375 - 09 - svm-new
Support Vector Machines
The University of Texas at Dallas

What We Have Learned So Far
1. Decision Trees
2. Naive Bayes
3. Linear Regression
4. Logistic Regression
5. Perceptron
6. Neural Networks
7. K-Nearest Neighbors

Which of the above are linear, and which are not?
- 1, 6, and 7 are non-linear.
- 2 is linear under certain restrictions.
- 3, 4, and 5 are linear.

Decision Surfaces
- Decision trees
- Linear functions: g(x) = w^T x + b
- Nonlinear functions: neural nets
[Figure: example decision surfaces for each family.]

Today: Support Vector Machine (SVM)
- A classifier derived from statistical learning theory by Vapnik et al. in 1992.
- SVM became famous when, using images as input, it gave accuracy comparable to a neural network with hand-designed features on a handwriting recognition task.
- Currently SVM is widely used in object detection and recognition, content-based image retrieval, text recognition, biometrics, speech recognition, etc.
- Also used for regression (not covered today).
- Reading: Sections 5.1, 5.2, 5.3, 5.11, 5.4 in Bishop; SVM tutorial, start reading from Section 3. [Photo: V. Vapnik]

Outline
- Linear discriminant function
- Large margin linear classifier
- Nonlinear SVM: the kernel trick
- Demo of SVM

Linear Discriminant Function (a Linear Classifier)
Given data from two classes (one denoted +1, the other -1), learn a function of the form

    g(x) = w^T x + b

The set of points with w^T x + b = 0 is a hyperplane in the feature space. Decide class +1 if g(x) > 0, and class -1 otherwise.
[Figure: points of both classes in the (x1, x2) plane, split by the hyperplane w^T x + b = 0.]
A minimal numeric sketch of this decision rule follows.
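This sketch is not part of the original slides; the weight vector, bias, and the helper name `decide` are illustrative assumptions, chosen only to show the rule "class +1 if g(x) > 0, else -1" in runnable form:

```python
import numpy as np

# Hypothetical parameters of a learned linear discriminant g(x) = w^T x + b.
w = np.array([2.0, -1.0])   # weight vector (illustrative values)
b = -0.5                    # bias term (illustrative value)

def decide(x):
    """Return class +1 if g(x) > 0, else class -1."""
    g = w @ x + b
    return 1 if g > 0 else -1

print(decide(np.array([1.0, 0.5])))   # g = 2.0 - 0.5 - 0.5 = 1.0  -> +1
print(decide(np.array([0.0, 1.0])))   # g = -1.0 - 0.5 = -1.5      -> -1
```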
Linear Discriminant Function
How would you classify these points using a linear discriminant function in order to minimize the error rate? There is an infinite number of answers. Which one is the best?
[Figure: the same two-class data in the (x1, x2) plane with several different separating lines drawn through it.]

Large Margin Linear Classifier
The linear discriminant function with the maximum margin is the best. The margin is defined as the width by which the boundary could be increased before hitting a data point. Why is it the best?
- The larger the margin, the better the generalization.
- It is robust to outliers.
[Figure: separating line with a "safe zone" of margin width around it.]

Aim: learn a large margin classifier. Given a set of data points (x_i, y_i), require:

    w^T x_i + b >= +1   for y_i = +1
    w^T x_i + b <= -1   for y_i = -1

Algebraic expression for the width of the margin: points on the two bounding hyperplanes satisfy w^T x + b = +1 and w^T x + b = -1, and the distance between these two hyperplanes along the normal direction w/||w|| is 2/||w||, which is the margin width.

Mathematical formulation:

    maximize   2/||w||
    such that  w^T x_i + b >= +1 for y_i = +1,  and  w^T x_i + b <= -1 for y_i = -1

A common theme in machine learning: LEARNING IS OPTIMIZATION.

Equivalent formulation (the two constraints combine into one, and maximizing 2/||w|| is the same as minimizing (1/2)||w||^2):

    minimize   (1/2)||w||^2
    such that  y_i (w^T x_i + b) >= 1

This is a quadratic programming (QP) problem with linear constraints:
- Off-the-shelf software can solve it directly.
- However, we will convert it to the Lagrangian dual in order to use the kernel trick. (A numerical sketch with a generic QP solver appears at the end of these notes.)

Review (last lecture): the Lagrangian method is a general method for converting constrained optimization problems to unconstrained ones; unconstrained optimization problems are easier. When we have equality constraints, the transformation looks as follows: to minimize f(x) subject to h_i(x) = 0, form the Lagrangian function

    L(x, λ) = f(x) + Σ_i λ_i h_i(x)

Solving min over x of max over λ of L(x, λ) is the same as solving the primal problem; let the optimal value be p*. It is not the same as solving the following dual problem, max over λ of min over x of L(x, λ); let the optimal value be d*. In general d* <= p*.

Solving the Optimization Problem
Quadratic programming with linear constraints:

    minimize   (1/2)||w||^2
    s.t.       y_i (w^T x_i + b) >= 1

Lagrangian function, with one multiplier α_i per constraint:

    minimize   L_p(w, b, α) = (1/2)||w||^2 - Σ_{i=1}^n α_i [ y_i (w^T x_i + b) - 1 ]
    s.t.       α_i >= 0

Setting the partial derivatives of L_p to zero:

    ∂L_p/∂w = 0  =>  w = Σ_{i=1}^n α_i y_i x_i
    ∂L_p/∂b = 0  =>  Σ_{i=1}^n α_i y_i = 0

Substituting back gives the Lagrangian dual problem:

    maximize   Σ_{i=1}^n α_i - (1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j x_i^T x_j
    s.t.       α_i >= 0  and  Σ_{i=1}^n α_i y_i = 0

From these equations we can prove the KKT condition

    α_i [ y_i (w^T x_i + b) - 1 ] = 0

Thus only support vectors (points with y_i (w^T x_i + b) = 1) have α_i != 0.
[Figure: data in the (x1, x2) plane; the support vectors lie exactly on the margin boundaries.]
The solution has the form

    w = Σ_{i=1}^n α_i y_i x_i = Σ_{i in SV} α_i y_i x_i

and b is obtained from y_i (w^T x_i + b) - 1 = 0, where x_i is any support vector.

The linear discriminant function is

    g(x) = w^T x + b = Σ_{i in SV} α_i y_i x_i^T x + b

Notice that it relies on a dot product between the test point x and the support vectors x_i. Also keep in mind that solving the optimization problem involved computing the dot products x_i^T x_j between all pairs of training points.

Large Margin Linear Classifier (non-separable case)
What if the data is not linearly separable (noisy data, outliers, etc.)? Slack variables ξ_i can be added to allow misclassification of difficult or noisy data points.
[Figure: two margin-violating points with slack values ξ_1 and ξ_2.]

Formulation with slack variables:

    minimize   (1/2)||w||^2 + C Σ_{i=1}^n ξ_i
    such that  y_i (w^T x_i + b) >= 1 - ξ_i  and  ξ_i >= 0

(Compare the formulation without slack variables: minimize (1/2)||w||^2 s.t. y_i (w^T x_i + b) >= 1.) The parameter C can be viewed as a way to control overfitting.

The corresponding Lagrangian dual problem:

    maximize   Σ_{i=1}^n α_i - (1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j x_i^T x_j
    such that  0 <= α_i <= C  and  Σ_{i=1}^n α_i y_i = 0

Non-linear SVMs
Datasets that are linearly separable with noise work out great. But what are we going to do if the dataset is just too hard?
[Figure: 1-D data on the x axis that no single threshold can separate.]

Kernel Trick
[Diagram: SVM = linear SVM + kernel trick. This slide is courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt]

Kernel Trick: Motivation
Linear classifiers are well understood, widely used, and efficient. How can we use linear classifiers to build non-linear ones? Neural networks: construct …
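To tie the derivations above together, here is a minimal numerical sketch, not from the original slides, that solves the soft-margin dual with the off-the-shelf QP solver cvxopt. The toy data set, the value of C, the tolerances, and all variable names are illustrative assumptions:

```python
import numpy as np
from cvxopt import matrix, solvers  # generic QP: min (1/2) a^T P a + q^T a

solvers.options['show_progress'] = False

# Toy 2-D training set with labels in {+1, -1} (illustrative).
X = np.array([[2.0, 2.0], [2.5, 1.5], [0.0, 0.0], [-0.5, 0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, C = len(y), 10.0

# Dual: max  Σ α_i - (1/2) Σ Σ α_i α_j y_i y_j x_i^T x_j
#       s.t. 0 <= α_i <= C  and  Σ α_i y_i = 0,
# rewritten as a minimization in cvxopt's standard form.
Q = (y[:, None] * X) @ (y[:, None] * X).T       # Q_ij = y_i y_j x_i . x_j
P, q = matrix(Q), matrix(-np.ones(n))
G = matrix(np.vstack([-np.eye(n), np.eye(n)]))  # -α <= 0  and  α <= C
h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
A, b_eq = matrix(y.reshape(1, -1)), matrix(0.0)

alpha = np.ravel(solvers.qp(P, q, G, h, A, b_eq)['x'])

# Recover w = Σ α_i y_i x_i, and b from a support vector with 0 < α_i < C:
# y_i (w . x_i + b) = 1  =>  b = y_i - w . x_i  (since y_i is +1 or -1).
w = ((alpha * y)[:, None] * X).sum(axis=0)
on_margin = (alpha > 1e-6) & (alpha < C - 1e-6)
b = np.mean(y[on_margin] - X[on_margin] @ w)
print(w, b, alpha)
```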

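Continuing the same hypothetical sketch: the discriminant g(x) = Σ_{i in SV} α_i y_i x_i^T x + b touches the test point only through dot products with the support vectors, exactly as noted above. This reuses the illustrative variables alpha, y, X, and b defined in the previous block:

```python
def g(x, tol=1e-6):
    """g(x) = sum over support vectors of alpha_i * y_i * (x_i . x) + b."""
    sv = alpha > tol                      # non-support vectors have alpha_i = 0
    return (alpha[sv] * y[sv]) @ (X[sv] @ x) + b

print(g(np.array([2.0, 2.0])))   # positive: classified +1
print(g(np.array([0.0, 0.0])))   # negative: classified -1
```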

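Finally, a sketch of the kernel trick in practice. Because the dual objective and the discriminant use the data only through dot products x_i^T x_j, replacing those dot products with a kernel k(x_i, x_j) yields a non-linear classifier. This example uses scikit-learn's SVC (an assumed dependency; the synthetic data and parameter values are illustrative) on a data set that no line separates:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic 2-D data: class +1 inside the unit circle, class -1 outside.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where((X ** 2).sum(axis=1) < 1.0, 1, -1)

linear = SVC(kernel='linear', C=10.0).fit(X, y)       # plain dot products
rbf = SVC(kernel='rbf', C=10.0, gamma=1.0).fit(X, y)  # kernelized dot products

print('linear accuracy:', linear.score(X, y))   # poor: no separating line exists
print('rbf accuracy:   ', rbf.score(X, y))      # near 1.0 on this toy data
```

The C argument here plays exactly the role described for the slack-variable formulation above: a smaller C tolerates more margin violations and can reduce overfitting.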