UT Dallas CS 6375 - 16.svm-2


Slide 1: Machine Learning, CS 6375 (Spring 2015): Support Vector Machines (II)

Slide 2: The Dual Problem
Maximize
  L(\alpha) = \sum_{i=1}^{R} \alpha_i - \frac{1}{2} \sum_{i=1}^{R} \sum_{j=1}^{R} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
subject to
  \alpha_i \ge 0 for all i, and \sum_{i=1}^{R} \alpha_i y_i = 0.
The weight vector is recovered as w = \sum_i \alpha_i y_i x_i, where \alpha_i \ne 0 only for the support vectors.

Slide 3: But what if the data is not linearly separable?
Idea 1: Find the minimum w \cdot w while also minimizing the number of training-set errors.
Problematic: two things to minimize makes for an ill-defined optimization.

Slide 4: But what if the data is not linearly separable?
Idea 1.1: Minimize w \cdot w + C (number of training errors).
There is a practical problem with this approach: it does not distinguish between disastrous errors and near misses.

Slide 5: But what if the data is not linearly separable?
Idea 1.2: Minimize w \cdot w + C (distance of error points to their correct place).

Slide 6: Learning the Maximum Margin Classifier
Given a guess of w and b we can
• compute the sum of distances of points to their correct zones, and
• compute the margin width.
Assume R data points, each (x_k, y_k) with y_k = +1 or -1.
What should our quadratic optimization criterion be? How many constraints will we have, and what should they be?

Slide 7: Learning the Soft Margin Classifier
Given a guess of w and b we can
• compute the sum of distances of points to their correct zones, and
• compute the margin width.
Assume R data points, each (x_k, y_k) with y_k = +1 or -1.
Quadratic optimization criterion: minimize \frac{1}{2} w \cdot w + C \sum_{k=1}^{R} \varepsilon_k.
How many constraints will we have? 2R. What should they be?
  w \cdot x_k + b \ge 1 - \varepsilon_k   if y_k = +1
  w \cdot x_k + b \le -1 + \varepsilon_k  if y_k = -1
  \varepsilon_k \ge 0 for all k

Slide 8: Learning the Soft Margin Classifier (continued)
Our original (noiseless-data) QP had m + 1 variables: w_1, w_2, ..., w_m, and b.
Our new (noisy-data) QP has m + 1 + R variables: w_1, w_2, ..., w_m, b, \varepsilon_1, ..., \varepsilon_R.

Slide 9: Controlling Soft Margin Separation
C controls the trade-off between margin and error.

Slide 10: The Dual QP
Data points with \alpha_k > 0 are the support vectors, so this sum only needs to be taken over the support vectors.

Slide 11: Soft Margin Dual Problem
There is one factor \alpha_i for each training example:
  0 < \alpha_i < C: a support vector with \varepsilon_i = 0
  \alpha_i = C: a support vector with \varepsilon_i > 0
  \alpha_i = 0: otherwise (not a support vector)

Slide 12: SVM Training and Testing
Remember: only dot products of instances appear, in both training and testing (see the solver sketch after slide 19 below).
Training: maximize
  L(\alpha) = \sum_{i=1}^{R} \alpha_i - \frac{1}{2} \sum_{i=1}^{R} \sum_{j=1}^{R} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
Testing: classify a new point x^* using
  w \cdot x^* + b = \sum_{x_i \in SV} \alpha_i y_i (x_i \cdot x^*) + b

Slide 13: Example: suppose we are in 1 dimension
What would SVMs do with this data? Not a big surprise.

Slide 14: Harder 1-dimensional Dataset
What can be done about this? Idea 1: use the soft margin and allow some errors.

Slides 15-16: Harder 1-dimensional Dataset
Let's project the data points to a 2-D space using non-linear basis functions. Now the data is linearly separable.

Slide 17: Learning in the Feature Space
Map the data into a feature space where it is linearly separable: x -> \Phi(x).

Slide 18: Non-Linear SVMs: Feature Spaces
General idea: the original feature space can be mapped to another (likely higher-dimensional) feature space where the training set is separable, via x -> \Phi(x).

Slide 19: QP with Basis Functions
  w = \sum_{k:\, \alpha_k > 0} \alpha_k y_k \Phi(x_k)   (sum over the support vectors)
  b = y_K (1 - \varepsilon_K) - w \cdot \Phi(x_K)   for a support vector K
Classify with f(x, w, b) = \mathrm{sign}(w \cdot \Phi(x) + b).
This is sensible. Is that the end of the story? No... there is one more trick!
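The soft-margin dual above (slides 2, 7, 11 and 12) maps directly onto a standard quadratic program. Below is a minimal sketch of that mapping in Python, assuming the generic cvxopt QP solver is available; the function name train_soft_margin_svm, the 1e-6 tolerance, and the bias-averaging step are illustrative choices, not from the slides.

```python
import numpy as np
from cvxopt import matrix, solvers  # generic QP solver, assumed available

def train_soft_margin_svm(X, y, C=1.0):
    """Sketch: solve the soft-margin SVM dual (slides 2, 11, 12) as a QP.

    maximize  sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)
    s.t.      0 <= alpha_i <= C,   sum_i alpha_i y_i = 0
    """
    R, m = X.shape
    K = X @ X.T                                         # Gram matrix of dot products x_i . x_j
    P = matrix((np.outer(y, y) * K).astype(float))      # quadratic term y_i y_j (x_i . x_j)
    q = matrix(-np.ones(R))                             # minimize -sum(alpha) == maximize sum(alpha)
    G = matrix(np.vstack([-np.eye(R), np.eye(R)]))      # box constraints 0 <= alpha_i <= C
    h = matrix(np.hstack([np.zeros(R), C * np.ones(R)]))
    A = matrix(y.reshape(1, -1).astype(float))          # equality constraint sum_i alpha_i y_i = 0
    b = matrix(0.0)
    solvers.options['show_progress'] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

    # w = sum_i alpha_i y_i x_i, effectively summed only over support vectors (alpha_i > 0)
    w = ((alpha * y)[:, None] * X).sum(axis=0)
    # bias from margin support vectors: 0 < alpha_i < C means eps_i = 0 and y_i (w.x_i + b) = 1
    on_margin = (alpha > 1e-6) & (alpha < C - 1e-6)
    bias = float(np.mean(y[on_margin] - X[on_margin] @ w))
    return w, bias, alpha
```

This linear version recovers w explicitly; with the basis-function version of slides 19-24 you would keep the alphas and support vectors instead and classify through the kernel, as in the sketch at the end of the notes.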
Slide 20: Quadratic Basis Functions
The quadratic basis expansion maps x to
  \Phi(x) = (1, \sqrt{2}\,x_1, ..., \sqrt{2}\,x_m, x_1^2, ..., x_m^2, \sqrt{2}\,x_1 x_2, ..., \sqrt{2}\,x_{m-1} x_m),
so the number of terms (assuming m input dimensions) is roughly m^2/2.
What are those \sqrt{2}'s doing?
• You should be happy that they do no harm.
• You will find out why they are there soon.

Slide 21: QP with Basis Functions
  w = \sum_{k:\, \alpha_k > 0} \alpha_k y_k \Phi(x_k),   b = y_K (1 - \varepsilon_K) - w \cdot \Phi(x_K) for a support vector K.
Classify with f(x, w, b) = \mathrm{sign}(w \cdot \Phi(x) + b).
We must do R^2/2 dot products to get this matrix ready, and each dot product requires m^2/2 additions and multiplications, so the whole thing costs about R^2 m^2 / 4. This is computationally expensive!

Slide 22: Quadratic Dot Products
Expanding the explicit feature map gives
  \Phi(a) \cdot \Phi(b) = 1 + 2\sum_i a_i b_i + \sum_i a_i^2 b_i^2 + 2\sum_{i<j} a_i a_j b_i b_j.

Slide 23: Quadratic Dot Products
Let's look at another function of a and b: (a \cdot b + 1)^2. They are the same!
What is the complexity of computing each? m versus m^2.

Slide 24: QP with Basis Functions
We still do R^2/2 dot products to get this matrix ready, but each dot product now only requires m additions and multiplications. (This is the kernel trick.)
  \Phi(x_k) \cdot \Phi(x_l) = K(x_k, x_l)
The matrix of these values is called the kernel (Gram) matrix.

Slide 25: Higher Order Polynomials
The same trick extends to higher-order polynomials: K(a, b) = (a \cdot b + 1)^d computes the dot product of the degree-d polynomial basis functions in O(m) time.

Slide 26: QP with Basis Functions
What about testing? Is it expensive in the new feature space?
Classify with f(x, w, b) = \mathrm{sign}(w \cdot \Phi(x) + b). Since w \cdot \Phi(x) = \sum_{k \in SV} \alpha_k y_k K(x_k, x), testing also needs only kernel evaluations. Example: a 5th-degree polynomial.

Slide 27: Overfitting? Curse of dimensionality?
• We map to a high-dimensional space x -> \Phi(x). Are we overfitting the training data?
• The use of the maximum margin "magically" makes this not a problem.
• No matter what the basis function is, there are only up to R parameters: \alpha_1, \alpha_2, ..., \alpha_R, and usually most of them are set to zero by the maximum-margin solution.

Slide 28: SVM Kernel Functions
• K(a, b) = (a \cdot b + 1)^d is an example of an SVM kernel function.
• Beyond polynomials there are other very-high-dimensional basis functions that can be made practical by finding the right kernel function:
  - radial-basis-style kernel functions, e.g. K(a, b) = \exp(-\|a - b\|^2 / (2\sigma^2));
  - neural-net-style kernel functions, e.g. K(a, b) = \tanh(\kappa \, a \cdot b - \delta);
  - string kernels, tree kernels, and so on.

Slide 29: Kernels
A kernel is a function that returns the value of the dot product between the images of its two arguments: K(x_1, x_2) = \Phi(x_1) \cdot \Phi(x_2).
Given a function K, it is possible to verify that it is a valid kernel, and new kernels can be constructed from simple ones.

Slide 30: SVM Performance
• Anecdotally, SVMs work very well indeed.
• There is a large body of empirical and theoretical studies.
• Ensemble ideas can be used for multi-class classification.

Slide 31: What you should know
• Linear SVMs.
• The definition of a maximum margin classifier.
• What QP can do for you (for this class, you do not need to know how it does it).
• How maximum margin can be turned into a QP problem.
• How we deal with noisy (non-separable) data.
• How we permit non-linear boundaries.
• How SVM kernel functions let us pretend we are working with an ultra-high-dimensional basis function.
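To make the kernel trick of slides 20-24 concrete, here is a small numpy sketch; the names quadratic_features, quadratic_kernel, and kernel_decision are mine, not from the slides. It checks that the explicit quadratic basis expansion with the \sqrt{2} factors gives exactly the same dot product as K(a, b) = (a \cdot b + 1)^2, and shows a kernelized classifier that never forms \Phi(x) explicitly.

```python
import numpy as np

def quadratic_features(x):
    """Explicit quadratic basis expansion Phi(x) from slide 20 (about m^2/2 terms).

    The sqrt(2) factors are exactly what makes Phi(a).Phi(b) equal (a.b + 1)^2.
    """
    m = len(x)
    feats = [1.0]
    feats += [np.sqrt(2) * xi for xi in x]                    # linear terms
    feats += [xi * xi for xi in x]                            # pure quadratic terms
    feats += [np.sqrt(2) * x[i] * x[j]                        # cross terms, i < j
              for i in range(m) for j in range(i + 1, m)]
    return np.array(feats)

def quadratic_kernel(a, b):
    """Same value as quadratic_features(a) . quadratic_features(b), but in O(m) instead of O(m^2)."""
    return (np.dot(a, b) + 1.0) ** 2

def kernel_decision(x, sv_X, sv_y, sv_alpha, bias, kernel=quadratic_kernel):
    """Classify with f(x) = sign(sum_{i in SV} alpha_i y_i K(x_i, x) + b); no explicit Phi is needed."""
    s = sum(a_i * y_i * kernel(x_i, x) for x_i, y_i, a_i in zip(sv_X, sv_y, sv_alpha))
    return np.sign(s + bias)

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(quadratic_features(a) @ quadratic_features(b))  # 1089.0
print(quadratic_kernel(a, b))                         # (32 + 1)^2 = 1089.0
```

The two printed values agree, which is the "They are the same!" claim of slide 23: the kernel evaluates the high-dimensional dot product without ever building the feature vectors.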

