Machine Learning CS6375, Spring 2015: Support Vector Machines II

The Dual Problem
Maximize $L = \sum_{i=1}^{R} \alpha_i - \frac{1}{2} \sum_{i=1}^{R} \sum_{j=1}^{R} \alpha_i \alpha_j G_{ij}$, where $G_{ij} = y_i y_j \,(x_i \cdot x_j)$,
subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{R} \alpha_i y_i = 0$.
The weight vector is then $w = \sum_i \alpha_i y_i x_i$, with $\alpha_i > 0$ exactly for the support vectors.

But what if the data is not linearly separable?
Idea 1: find the minimum $w \cdot w$ while also minimizing the number of training-set errors. Problematic: two things to minimize makes for an ill-defined optimization.
Idea 1.1: minimize $w \cdot w + C \,(\text{number of training errors})$. There is a practical problem with this approach: it does not distinguish between disastrous errors and near misses.
Idea 1.2: minimize $w \cdot w + C \,(\text{total distance of error points to their correct place})$.

Learning the Maximum Margin Classifier
Given a guess of $w$ and $b$ we can compute the sum of distances of points to their correct zones, and compute the margin width. Assume R data points, each $(x_k, y_k)$ with $y_k \in \{-1, +1\}$. What should our quadratic optimization criterion be? How many constraints will we have, and what should they be?

Learning the Soft Margin Classifier
Quadratic optimization criterion: minimize $\frac{1}{2}\, w \cdot w + C \sum_{k=1}^{R} \varepsilon_k$.
There are 2R constraints:
$w \cdot x_k + b \ge 1 - \varepsilon_k$ if $y_k = +1$,
$w \cdot x_k + b \le -1 + \varepsilon_k$ if $y_k = -1$,
$\varepsilon_k \ge 0$ for all $k$.
Our original noiseless-data QP had $m + 1$ variables: $w_1, \ldots, w_m$ and $b$. The new noisy-data QP has $m + 1 + R$ variables: $w_1, \ldots, w_m$, $b$, and $\varepsilon_1, \ldots, \varepsilon_R$.

Controlling Soft Margin Separation
C controls the trade-off between margin width and training error.
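To make the variable and constraint counts concrete, here is a minimal sketch that sets up exactly this soft-margin QP for a tiny toy dataset and hands it to a general-purpose solver. The dataset, the choice C = 1.0, and the use of SciPy's SLSQP routine are illustrative assumptions, not part of the lecture; in practice a dedicated QP or SVM solver would be used.

```python
# A small sketch of the soft-margin primal QP above, solved with SciPy's
# general-purpose SLSQP routine.  Toy data and C = 1.0 are illustrative.
import numpy as np
from scipy.optimize import minimize

# Six 2-D points; the last one is deliberately on the wrong side, so the data
# are not linearly separable and some slack eps_k must be paid.
X = np.array([[1.0, 1.0], [2.0, 2.0], [2.0, 0.5],
              [-1.0, -1.0], [-2.0, -1.5], [1.5, 1.5]])
y = np.array([1, 1, 1, -1, -1, -1])
R, m = X.shape
C = 1.0

# Decision variables: w (m values), b, and one slack eps_k per point,
# i.e. the m + 1 + R variables counted on the slide.
def unpack(z):
    return z[:m], z[m], z[m + 1:]

def objective(z):
    w, b, eps = unpack(z)
    return 0.5 * np.dot(w, w) + C * np.sum(eps)

# The 2R constraints: y_k (w . x_k + b) >= 1 - eps_k   and   eps_k >= 0.
def margin_con(k):
    return lambda z: y[k] * (np.dot(unpack(z)[0], X[k]) + unpack(z)[1]) - 1 + unpack(z)[2][k]

constraints = [{'type': 'ineq', 'fun': margin_con(k)} for k in range(R)]
constraints += [{'type': 'ineq', 'fun': (lambda z, k=k: unpack(z)[2][k])} for k in range(R)]

res = minimize(objective, np.zeros(m + 1 + R), method='SLSQP', constraints=constraints)
w, b, eps = unpack(res.x)
print("w =", np.round(w, 3), " b =", round(b, 3))
print("slacks =", np.round(eps, 3))   # only the mislabeled point should need slack
```

Raising C makes slack expensive and pushes the solution toward the hard-margin classifier; lowering C widens the margin at the cost of more (or larger) violations, which is the trade-off described above.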
The Dual QP
Data points with $\alpha_k > 0$ will be the support vectors, so the sum defining $w$ only needs to run over the support vectors.

Soft Margin Dual Problem
The dual has one factor $\alpha_i$ for each training example, now bounded by $0 \le \alpha_i \le C$:
$\alpha_i = C$: support vector with $\varepsilon_i > 0$ (a margin violation),
$0 < \alpha_i < C$: support vector with $\varepsilon_i = 0$ (on the margin),
$\alpha_i = 0$: otherwise.

SVM Training and Testing
Remember: instances enter only through dot products, in both training and testing.
Training: maximize $L = \sum_{i=1}^{R} \alpha_i - \frac{1}{2} \sum_{i=1}^{R} \sum_{j=1}^{R} \alpha_i \alpha_j G_{ij}$ with $G_{ij} = y_i y_j \,(x_i \cdot x_j)$.
Testing: $y = \mathrm{sign}(w \cdot x + b) = \mathrm{sign}\big(\sum_{i \in \mathrm{SV}} \alpha_i y_i \,(x_i \cdot x) + b\big)$.

Example: suppose we are in 1 dimension
What would SVMs do with this data? Not a big surprise: the maximum-margin threshold sits midway between the closest positive and negative points.

Harder 1-dimensional dataset
What can be done about this? Idea 1: use a soft margin and allow some errors.
Alternatively, project the data points into a 2-D space using non-linear basis functions; now the data is linearly separable.

Learning in the Feature Space
Map the data into a feature space where they are linearly separable: $x \rightarrow \phi(x)$.

Non-Linear SVMs: Feature Spaces
General idea: the original feature space can be mapped to another, likely higher-dimensional, feature space where the training set is separable.

QP with Basis Functions
$w = \sum_{k:\,\alpha_k > 0} \alpha_k y_k \,\Phi(x_k)$, and $b = y_K (1 - \varepsilon_K) - w \cdot \Phi(x_K)$ for a support vector $x_K$.
Classify with $f(x; w, b) = \mathrm{sign}(w \cdot \Phi(x) + b)$.
This is sensible. Is that the end of the story? No, there's one more trick.

Quadratic Basis Functions
The number of terms, assuming $m$ input dimensions, is roughly $m^2/2$. What are those $\sqrt{2}$'s doing? You should be happy: they do no harm. You'll find out why they're there soon.

QP with Basis Functions: the cost
We must do $R^2/2$ dot products to get this matrix ready, and each dot product requires about $m^2/2$ additions and multiplications. The whole thing costs about $R^2 m^2 / 4$, which is computationally expensive.

Quadratic Dot Products
Write out $\Phi(a) \cdot \Phi(b)$ term by term, then look at another function of $a$ and $b$: $(a \cdot b + 1)^2$. Expanded, they're the same. What's the complexity of each? $m$ versus $m^2$.

QP with Basis Functions: the kernel trick
We still do $R^2/2$ dot products to get this matrix ready, but each dot product now requires only $m$ additions and multiplications. The matrix of values $\Phi(x_k) \cdot \Phi(x_l) = K(x_k, x_l)$ is called the kernel (Gram) matrix.

Higher Order Polynomials
The same trick extends to higher-degree polynomial basis functions.

QP with Basis Functions: what about testing?
Testing is expensive in the new feature space: $f(x; w, b) = \mathrm{sign}(w \cdot \Phi(x) + b)$. Example: a degree-5 polynomial.

Overfitting?
Curse of dimensionality: we map to a high-dimensional space, $x \rightarrow \Phi(x)$. Are we overfitting the training data? The use of maximum margin "magically" makes this not a problem: no matter what the basis function is, there are only up to R parameters $\alpha_1, \alpha_2, \ldots, \alpha_R$, and usually most of them are set to zero by the maximum-margin solution.

SVM Kernel Functions
$K(a, b) = (a \cdot b + 1)^d$ is an example of an SVM kernel function. Beyond polynomials, there are other very high-dimensional basis functions that can be made practical by finding the right kernel function:
- radial-basis-style kernel functions,
- neural-net-style kernel functions,
- string kernels, tree kernels.

Kernels
A kernel is a function that returns the value of the dot product between the images of its two arguments. Given a function K, it is possible to verify that it is a kernel, and kernels can be constructed from simple ones.

SVM Performance
Anecdotally they work very, very well indeed, and there is a lot of empirical and theoretical work studying them. Ensemble ideas can be used to do multi-class classification.

What you should know
- Linear SVMs.
- The definition of a maximum margin classifier.
- What QP can do for you (for this class you don't need to know how it does it).
- How maximum margin can be turned into a QP problem.
- How we deal with noisy (non-separable) data.
- How we permit non-linear boundaries.
- How SVM kernel functions permit us to pretend we're working with ultra-high-dimensional basis-function terms.
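The quadratic dot-product identity above, and the reason the $\sqrt{2}$ factors do no harm, can be checked numerically. This is a minimal sketch assuming NumPy; the ordering of features in phi_quadratic is one arbitrary but consistent choice, not something fixed by the slides.

```python
# Check that phi(a) . phi(b) = (a . b + 1)^2 for the quadratic basis expansion
# with sqrt(2) factors on the linear and cross terms.
import numpy as np
from itertools import combinations

def phi_quadratic(x):
    """Explicit quadratic expansion: 1, sqrt(2)*x_i, x_i^2, sqrt(2)*x_i*x_j (i < j)."""
    feats = [1.0]
    feats += [np.sqrt(2) * xi for xi in x]                 # linear terms
    feats += [xi ** 2 for xi in x]                          # pure squares
    feats += [np.sqrt(2) * x[i] * x[j]                      # cross terms
              for i, j in combinations(range(len(x)), 2)]
    return np.array(feats)

def quadratic_kernel(a, b):
    """Kernel form: O(m) work instead of the roughly m^2/2-term explicit expansion."""
    return (np.dot(a, b) + 1.0) ** 2

rng = np.random.default_rng(0)
a, b = rng.normal(size=5), rng.normal(size=5)

print(np.dot(phi_quadratic(a), phi_quadratic(b)))   # explicit route
print(quadratic_kernel(a, b))                        # kernel-trick route, same value
```

The two printed numbers agree up to floating-point rounding, which is exactly why the QP only ever needs $K(x_k, x_l)$ and never the expanded feature vectors.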


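The testing formula $\mathrm{sign}\big(\sum_{i \in \mathrm{SV}} \alpha_i y_i K(x_i, x) + b\big)$ can also be exercised end to end. This is a minimal sketch assuming scikit-learn is available; the synthetic dataset is an illustrative assumption, and the point is only that prediction touches the support vectors through kernel evaluations, never the explicit high-dimensional features.

```python
# Reconstruct an SVM decision value by hand from the fitted support vectors,
# dual coefficients (alpha_i * y_i) and intercept, using the quadratic kernel
# K(a, b) = (a . b + 1)^2.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)      # not linearly separable in 2-D

# kernel='poly' with degree=2, gamma=1, coef0=1 is exactly (x . x' + 1)^2.
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1.0).fit(X, y)

x_new = np.array([0.2, 0.1])
K = (clf.support_vectors_ @ x_new + 1.0) ** 2                 # kernel against the SVs only
f = clf.dual_coef_[0] @ K + clf.intercept_[0]                 # sum_i alpha_i y_i K(x_i, x) + b
print(int(np.sign(f)), clf.predict(x_new.reshape(1, -1))[0])  # the two agree
print("support vectors:", clf.support_vectors_.shape[0], "out of", X.shape[0])
```

Typically only a fraction of the $\alpha_i$ are non-zero, so prediction involves only the support vectors: the "up to R parameters, usually most set to zero" point from the overfitting slide.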