Machine Learning, CS6375, Spring 2015
Support Vector Machines

Linear Classifiers
- How would you classify this data?
- Any of these candidate boundaries would be fine, but which is the best?
- Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a data point.
- The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called a linear SVM.

Why Maximum Margin?
1. Intuitively, this feels safest.
2. If we've made a small error in the location of the boundary, this gives us the least chance of causing a misclassification.
3. LOOCV is easy, since the model is immune to removal of any non-support-vector data points.
4. There is related theory (using VC dimension) showing that this is a good thing.
5. Empirically, it works very, very well.

Specifying a Line and a Margin
- How do we represent this mathematically, in $m$ input dimensions?
- Plus-plane: $\{x : w \cdot x + b = +1\}$
- Minus-plane: $\{x : w \cdot x + b = -1\}$
- Classify as $+1$ if $w \cdot x + b \ge 1$, and as $-1$ if $w \cdot x + b \le -1$.

Computing the Margin Width
- Claim: the vector $w$ is perpendicular to the plus-plane. Why?
- Let $x^-$ be any point on the minus-plane, and let $x^+$ be the closest plus-plane point to $x^-$.
- Claim: $x^+ = x^- + \lambda w$ for some value of $\lambda$. Why?
- What we know: $w \cdot x^+ + b = +1$, $w \cdot x^- + b = -1$, $x^+ = x^- + \lambda w$, and $|x^+ - x^-| = M$, the margin width.
- It is now easy to get $M$ in terms of $w$ and $b$:
  $w \cdot (x^- + \lambda w) + b = 1 \;\Rightarrow\; w \cdot x^- + b + \lambda\, w \cdot w = 1 \;\Rightarrow\; -1 + \lambda\, w \cdot w = 1 \;\Rightarrow\; \lambda = \frac{2}{w \cdot w}$,
  so $M = |x^+ - x^-| = |\lambda w| = \lambda \sqrt{w \cdot w} = \frac{2}{\sqrt{w \cdot w}}$.
- Given a guess of $w$ and $b$, we can compute whether all data points are in the correct half-planes, and compute the width of the margin.
- So now we just need to write a program to search the space of $w$'s and $b$'s to find the widest margin (equivalently, the smallest $w \cdot w$) that matches all the data points. How?
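Before moving on, here is a minimal numpy sketch of the two checks just described (the data, weights, and function names are hypothetical, not from the slides): given a guess of $w$ and $b$, it tests whether every point lies in its correct half-plane and evaluates the margin width $M = 2/\sqrt{w \cdot w}$.

```python
import numpy as np

def margin_width(w):
    """Margin width for a candidate w: M = 2 / sqrt(w . w)."""
    return 2.0 / np.sqrt(w @ w)

def satisfies_constraints(w, b, X, y):
    """True iff every point is in its correct half-plane:
    y_k * (w . x_k + b) >= 1 for all k."""
    return bool(np.all(y * (X @ w + b) >= 1.0))

# Toy linearly separable 2-D data (made up for illustration).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-2.0, -1.0], [-1.0, -2.5]])
y = np.array([1, 1, -1, -1])

w, b = np.array([0.5, 0.5]), 0.0          # a guess of w and b
print(satisfies_constraints(w, b, X, y))  # True
print(margin_width(w))                    # 2 / sqrt(0.5) ~ 2.83
```

Searching over all $(w, b)$ by brute force is hopeless; the quadratic program developed next does the search in one shot.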
Learning the Maximum Margin Classifier
- What should our quadratic optimization criterion be? Minimize $w \cdot w$.
- How many constraints will we have? $R$, one for each training point.
- What should they be? $w \cdot x_k + b \ge +1$ if $y_k = +1$, and $w \cdot x_k + b \le -1$ if $y_k = -1$.

Learning via Quadratic Programming
- Quadratic programming (QP) is a well-studied class of optimization algorithms for maximizing a quadratic function of some real-valued variables subject to linear constraints.
- There exist algorithms for finding such constrained quadratic optima much more efficiently and reliably than gradient ascent.
- But they are very fiddly; you probably don't want to write one yourself.

Quadratic Optimization
We'll review:
- unconstrained optimization,
- optimization with linear equality constraints,
- optimization with linear equality and inequality constraints.

Unconstrained Optimization
- Goal: compute the minimum of a quadratic function $f(x)$. Without loss of generality, let $f(x) = x^\top A x - b^\top x + c$.
- $f$ has a minimum if and only if $A$ is positive semi-definite; $f$ has a unique minimum if and only if $A$ is positive definite.
- In both cases, the minimum is attained at the values of $x$ that zero out the gradient, that is, the solutions of $Ax = b/2$.

Example 1
- Find the minimum of a particular quadratic $f$ (given on the slide) whose matrix is $A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$. With $x = (x_1, x_2)$, setting the gradient to zero gives a $2 \times 2$ linear system. Its solution is $(7, 2)$, so if a minimum exists it is at $x_1 = 7$ and $x_2 = 2$.
- To argue that this is indeed a minimum, we still need to show that the matrix $A$ is positive semi-definite. One way of showing that $A$ is positive semi-definite is to show that there is a matrix $G$ such that $A = G^\top G$. In our case we can take $G = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, since $G^\top G = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = A$.

Optimization with Linear Equality Constraints
- Goal: compute the minimum of $f(x)$ where $x$ is also subject to linear equality constraints.
- A linear equality constraint can be written as $c \cdot x = d$, where $c$ is a vector and $d$ is a scalar. When there are $k$ constraints, we write them as $c_i \cdot x = d_i$.
- Trick: reduce this to the case of unconstrained optimization using the method of Lagrange multipliers.
- To minimize $f(x)$ under these equality constraints, we define a Lagrange multiplier $\lambda_i$ for the $i$th constraint and form the Lagrangian function
  $L(x, \lambda_1, \ldots, \lambda_k) = f(x) + \sum_{i=1}^{k} \lambda_i (c_i \cdot x - d_i)$.
- We can now treat $L$ as a function of $n + k$ variables and use the method of unconstrained minimization to find the unconstrained minimum of $L$.

Example 2
Minimize $f(x) = x_1^2 + x_2^2 + x_3^2$ under the two constraints $x_2 - x_1 = 1$ and $x_3 = 1$.
- Method 1: By direct substitution ($x_2 = x_1 + 1$, $x_3 = 1$) we see that $f = x_1^2 + (x_1 + 1)^2 + 1$. Then, by taking the derivative and equating it to 0, we see that the minimum is at $x_1 = -1/2$.
- Method 2: Using the Lagrange technique, we have
  $L = x_1^2 + x_2^2 + x_3^2 + \lambda_1 (x_2 - x_1 - 1) + \lambda_2 (x_3 - 1)$.
  - The derivative w.r.t. $x_1$ gives $2x_1 - \lambda_1 = 0$.
  - The derivative w.r.t. $x_2$ gives $2x_2 + \lambda_1 = 0$.
  - The derivative w.r.t. $x_3$ gives $2x_3 + \lambda_2 = 0$.
  - The derivative w.r.t. $\lambda_1$ gives $x_2 - x_1 - 1 = 0$.
  - The derivative w.r.t. $\lambda_2$ gives $x_3 - 1 = 0$.
  Treating this as a system of 5 equations in 5 unknowns, we get $x_1 = -1/2$, $x_2 = 1/2$, $x_3 = 1$, $\lambda_1 = -1$, $\lambda_2 = -2$.

The Dual Problem
There is another way of using the Lagrangian to solve the constrained optimization problem. Treating the Lagrange multipliers as constants, we:
1. optimize the Lagrangian with respect to $x$, by taking the derivatives only with respect to $x$;
2. solve the resulting system of linear equations for $x$ in terms of the Lagrange multipliers;
3. substitute back into the Lagrangian, transforming it into a function of only the Lagrange multipliers.
This dual problem can then be solved using unconstrained techniques.

Example 3
To convert Example 2 into dual form, we only need to take the derivatives of the Lagrangian with respect to $x_1, x_2, x_3$. As above:
- the derivative w.r.t. $x_1$ gives $2x_1 - \lambda_1 = 0$;
- the derivative w.r.t. $x_2$ gives $2x_2 + \lambda_1 = 0$;
- the derivative w.r.t. $x_3$ gives $2x_3 + \lambda_2 = 0$ …
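Example 2 lends itself to a quick numerical check. The sketch below (numpy only, purely illustrative) encodes the five stationarity equations of the Lagrangian as a linear system and solves it directly, recovering the values derived above.

```python
import numpy as np

# Stationarity conditions of L = x1^2 + x2^2 + x3^2
#                              + l1*(x2 - x1 - 1) + l2*(x3 - 1)
# written as a linear system in (x1, x2, x3, l1, l2).
A = np.array([
    [ 2.0, 0.0, 0.0, -1.0, 0.0],  # dL/dx1: 2*x1 - l1   = 0
    [ 0.0, 2.0, 0.0,  1.0, 0.0],  # dL/dx2: 2*x2 + l1   = 0
    [ 0.0, 0.0, 2.0,  0.0, 1.0],  # dL/dx3: 2*x3 + l2   = 0
    [-1.0, 1.0, 0.0,  0.0, 0.0],  # dL/dl1: x2 - x1 - 1 = 0
    [ 0.0, 0.0, 1.0,  0.0, 0.0],  # dL/dl2: x3 - 1      = 0
])
rhs = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

x1, x2, x3, l1, l2 = np.linalg.solve(A, rhs)
print(x1, x2, x3, l1, l2)  # -0.5 0.5 1.0 -1.0 -2.0, matching the slide
```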

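To connect the optimization review back to SVMs: the maximum margin classifier itself is a QP, with the criterion and constraints given earlier (minimize $w \cdot w$ subject to $y_k(w \cdot x_k + b) \ge 1$). The following is a sketch using the cvxopt QP solver, not the course's reference implementation; it assumes cvxopt is installed and reuses the same made-up toy data as before.

```python
import numpy as np
from cvxopt import matrix, solvers

# Toy linearly separable 2-D data (made up for illustration).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-2.0, -1.0], [-1.0, -2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
R, m = X.shape

# Variables z = (w_1, ..., w_m, b).  cvxopt solves
#   minimize (1/2) z'Pz + q'z   subject to   Gz <= h,
# so minimizing w.w means P = diag(2, ..., 2, 0) and q = 0, and each
# constraint y_k (w . x_k + b) >= 1 becomes -y_k (w . x_k + b) <= -1.
P = matrix(np.diag([2.0] * m + [0.0]))
q = matrix(np.zeros(m + 1))
G = matrix(-y[:, None] * np.hstack([X, np.ones((R, 1))]))
h = matrix(-np.ones(R))

solvers.options["show_progress"] = False
sol = solvers.qp(P, q, G, h)
z = np.array(sol["x"]).ravel()
w, b = z[:m], z[m]
print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))
```

The training points that satisfy their constraint with equality at the optimum are the support vectors.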
