
SVMs with Slack
Nicholas Ruozzi
University of Texas at Dallas
Based roughly on the slides of David Sontag

Dual SVM

    max_α  Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j)

    such that α_i ≥ 0 for all i, and Σ_i α_i y_i = 0

The dual formulation depends on the data only through inner products between the data points. The same is true if we replace the data points with feature vectors φ(x_i).

The Kernel Trick

More generally, a kernel is a function k(x, x') = φ(x) · φ(x') for some feature map φ. Rewriting the dual objective in terms of the kernel:

    max_α  Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j k(x_i, x_j)

Kernels

A bigger feature space increases the possibility of overfitting, though large-margin solutions may still generalize reasonably well. An alternative is to add penalties to the objective to disincentivize complicated solutions, e.g., penalize the number of misclassifications:

    min_{w,b}  (1/2) ||w||² + c · (number of misclassifications)

This is not a quadratic program anymore; in fact, it is NP-hard. It also has a problem similar to counting misclassifications directly: there is no notion of how badly the data is misclassified.

SVMs with Slack

Instead, allow misclassification but penalize it linearly, just as in the perceptron algorithm. This is easier to work with than counting misclassifications, the objective stays convex, and it will let us handle data that isn't linearly separable:

    min_{w,b,ξ}  (1/2) ||w||² + c Σ_i ξ_i

    such that y_i (w · x_i + b) ≥ 1 − ξ_i for all i, and ξ_i ≥ 0 for all i

The slack variables ξ_i potentially allow some points to be misclassified or to fall inside the margin. The constant c determines the degree to which slack is penalized.

How does this objective change with c? As c → ∞, the objective requires a perfect classifier. As c → 0, it allows arbitrary classifiers, i.e., it ignores the data.

How should we pick c? Divide the data into three pieces: training, testing, and validation. Use the validation set to tune the value of the hyperparameter c.

Evaluation Methodology

General learning strategy: build a classifier using the training data, select hyperparameters using the validation data, then evaluate the chosen model with the selected hyperparameters on the test data. This is how we can tell whether we have overfit the training data.

ML in Practice

Gather data and labels; select feature vectors. Randomly split into three groups: training set, validation set, and test set. The experimentation cycle: select a good hypothesis from the hypothesis space, tune hyperparameters using the validation set, and compute accuracy on the test set (the fraction of correctly classified instances).

Optimal Slack

What is the optimal value of ξ_i for fixed w and b? If y_i (w · x_i + b) ≥ 1, then ξ_i = 0. Otherwise, ξ_i = 1 − y_i (w · x_i + b). Equivalently,

    ξ_i = max(0, 1 − y_i (w · x_i + b))

Does this look familiar?

Hinge Loss Formulation

Substituting this expression for ξ_i yields a new objective:

    min_{w,b}  c Σ_i max(0, 1 − y_i (w · x_i + b)) + (1/2) ||w||²

The first term is the hinge loss; the second is a penalty to prevent overfitting. The objective is now unconstrained and can be minimized with (sub)gradient descent.

Imbalanced Data

If the data is imbalanced, i.e., there are more positive examples than negative examples, we may want to distribute the error evenly between the two classes:

    min_{w,b,ξ}  (1/2) ||w||² + (c / N₊) Σ_{i : y_i = +1} ξ_i + (c / N₋) Σ_{i : y_i = −1} ξ_i

    such that y_i (w · x_i + b) ≥ 1 − ξ_i for all i, and ξ_i ≥ 0 for all i

Dual of the Slack Formulation

The Lagrangian is convex in w, b, and ξ, so take derivatives to form the dual:

    ∂/∂w = 0  ⇒  w = Σ_i α_i y_i x_i
    ∂/∂b = 0  ⇒  Σ_i α_i y_i = 0
    ∂/∂ξ_i = 0  ⇒  c − α_i − μ_i = 0,  where μ_i ≥ 0 is the multiplier for the constraint ξ_i ≥ 0

Substituting back gives

    max_α  Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j)

    such that 0 ≤ α_i ≤ c for all i, and Σ_i α_i y_i = 0

The only change from the separable dual is the upper bound c on each α_i.

Generalization

We argued intuitively that SVMs generalize better than the perceptron algorithm. How can we make this precise? Coming soon, but first: other simple hypothesis spaces for supervised learning.

Roadmap

Where are we headed? Nearest neighbor; decision trees; learning theory (generalization and PAC bounds, VC dimension, bias-variance tradeoff).
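The claim that the dual only needs inner products can be checked numerically. A minimal sketch, assuming a degree-2 polynomial kernel k(x, z) = (x · z)², whose feature map φ contains all degree-2 monomials; the function names here are illustrative, not from the slides.

```python
import numpy as np

def poly2_kernel(x, z):
    # k(x, z) = (x . z)^2, computed without ever building the feature map
    return np.dot(x, z) ** 2

def phi(x):
    # Explicit degree-2 feature map: all pairwise products x_i * x_j
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

# The kernel equals the inner product in the lifted feature space
assert np.isclose(poly2_kernel(x, z), np.dot(phi(x), phi(z)))

# Gram matrix for a small dataset -- this is all the dual objective needs
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K = (X @ X.T) ** 2
```

The advantage is that φ maps d dimensions to d² dimensions, while the kernel evaluation stays O(d).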
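The hinge-loss objective above can be minimized by subgradient descent, as the slides note. A minimal sketch in NumPy on a tiny toy dataset; the learning rate, epoch count, and function name are illustrative choices, not part of the slides.

```python
import numpy as np

def train_svm(X, y, c=1.0, lr=0.01, epochs=200):
    """Subgradient descent on  c * sum_i max(0, 1 - y_i (w.x_i + b)) + 0.5 ||w||^2."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1            # points with nonzero hinge loss (slack > 0)
        # Subgradient of the objective with respect to w and b
        grad_w = w - c * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -c * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Linearly separable toy data
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_svm(X, y)
preds = np.sign(X @ w + b)
# Optimal slack for each point: xi_i = max(0, 1 - y_i (w.x_i + b))
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
```

Note that the regularizer's subgradient is just w, while each violating point contributes −c y_i x_i, matching the closed-form slack expression from the slides.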
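The imbalanced-data formulation replaces the single penalty c with per-class penalties c/N₊ and c/N₋, so each class contributes equally to the total slack term. A sketch of just the weighting (the helper name is hypothetical):

```python
import numpy as np

def class_weights(y, c=1.0):
    # Per-example penalty: c / N+ for positives, c / N- for negatives,
    # so each class carries the same total weight in the objective.
    n_pos = (y == 1).sum()
    n_neg = (y == -1).sum()
    return np.where(y == 1, c / n_pos, c / n_neg)

y = np.array([1, 1, 1, 1, -1])        # 4 positives, 1 negative
w_i = class_weights(y, c=1.0)
# Total weight on each class is the same:
assert np.isclose(w_i[y == 1].sum(), w_i[y == -1].sum())
```

These per-example weights would simply multiply the corresponding ξ_i (or hinge-loss) terms in the objective.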


UT Dallas CS 6375 - SVMs with Slack
