UT Dallas CS 6375 - Lagrange Multipliers & the Kernel Trick

Lagrange Multipliers & the Kernel Trick
Nicholas Ruozzi, University of Texas at Dallas

The Strategy So Far

- Choose a hypothesis space.
- Construct a loss function (ideally convex).
- Minimize the loss to learn the correct parameters.

General Optimization

A mathematical detour; we'll come back to SVMs soon. Consider problems of the form

    min_x f(x)
    subject to g_i(x) ≤ 0 for i = 1, ..., m
               h_j(x) = 0 for j = 1, ..., n

Note that f is not necessarily convex, and the constraints do not need to be linear.

Example

    min_{x1, x2} x1 log x1 + x2 log x2
    subject to x1 + x2 ≥ 1, x1 ≥ 0, x2 ≥ 0

Rewritten in the standard form above, the constraints become

    1 − x1 − x2 ≤ 0,  −x1 ≤ 0,  −x2 ≤ 0

Lagrangian

Incorporate the constraints into a new objective function:

    L(x, λ, μ) = f(x) + Σ_i λ_i g_i(x) + Σ_j μ_j h_j(x),  with λ ≥ 0

λ and μ are vectors of Lagrange multipliers; they can be thought of as enforcing the constraints "softly." For the example, with multipliers λ, μ1, μ2 ≥ 0 for the three inequality constraints:

    L(x1, x2, λ, μ1, μ2) = x1 log x1 + x2 log x2 + λ(1 − x1 − x2) − μ1 x1 − μ2 x2

Duality

Construct a dual function by minimizing the Lagrangian over the primal variables:

    q(λ, μ) = inf_x L(x, λ, μ)

with q(λ, μ) = −∞ whenever the Lagrangian is not bounded from below for the fixed λ and μ.

For the example, set the partial derivatives with respect to x1 and x2 to zero:

    ∂L/∂x1 = log x1 + 1 − λ − μ1 = 0  ⟹  x1 = exp(λ + μ1 − 1)
    ∂L/∂x2 = log x2 + 1 − λ − μ2 = 0  ⟹  x2 = exp(λ + μ2 − 1)

Substituting these back into the Lagrangian yields

    q(λ, μ1, μ2) = λ − exp(λ + μ1 − 1) − exp(λ + μ2 − 1)

The Primal Problem

The constrained problem is equivalent to

    p* = inf_x sup_{λ≥0, μ} L(x, λ, μ)

Why are these equivalent? Because

    sup_{λ≥0, μ} L(x, λ, μ) = f(x)  whenever x satisfies all of the constraints
                            = +∞   whenever x violates the constraints

The Dual Problem

    d* = sup_{λ≥0, μ} inf_x L(x, λ, μ) = sup_{λ≥0, μ} q(λ, μ)

The dual problem is always concave, even if the primal problem is not convex: for each fixed x, L(x, λ, μ) is an affine function of λ and μ, and the pointwise infimum of affine functions is concave.

Primal vs. Dual

Weak duality always holds:

    d* = sup_{λ≥0, μ} inf_x L(x, λ, μ) ≤ inf_x sup_{λ≥0, μ} L(x, λ, μ) = p*

Why? A point x̄ is feasible if it satisfies all of the constraints. For any feasible x̄ and any λ ≥ 0, μ,

    q(λ, μ) = inf_x L(x, λ, μ) ≤ L(x̄, λ, μ) ≤ f(x̄)

In particular, letting x* be the optimal solution to the primal problem, q(λ, μ) ≤ f(x*) for all λ ≥ 0 and all μ.
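The entropy example above has a closed-form dual, which makes it easy to sanity-check numerically. The sketch below (illustrative Python, not part of the original slides; all function names are mine) verifies that the closed-form dual equals the Lagrangian evaluated at its minimizer, and that weak duality holds at a feasible point.

```python
import math

# Example from the slides: minimize x1*log(x1) + x2*log(x2)
# subject to x1 + x2 >= 1, x1 >= 0, x2 >= 0.

def f(x1, x2):
    return x1 * math.log(x1) + x2 * math.log(x2)

def lagrangian(x1, x2, lam, mu1, mu2):
    # L = f(x) + lam*(1 - x1 - x2) - mu1*x1 - mu2*x2
    return f(x1, x2) + lam * (1 - x1 - x2) - mu1 * x1 - mu2 * x2

def q(lam, mu1, mu2):
    # Closed-form dual derived above:
    # q = lam - exp(lam + mu1 - 1) - exp(lam + mu2 - 1)
    return lam - math.exp(lam + mu1 - 1) - math.exp(lam + mu2 - 1)

# Pick arbitrary multipliers and recover the minimizing x in closed form.
lam, mu1, mu2 = 0.3, 0.1, 0.2
x1 = math.exp(lam + mu1 - 1)
x2 = math.exp(lam + mu2 - 1)

# The closed-form dual agrees with the Lagrangian at its minimizer.
assert abs(q(lam, mu1, mu2) - lagrangian(x1, x2, lam, mu1, mu2)) < 1e-12

# Weak duality: q(lam, mu) lower-bounds f at any feasible point,
# e.g. the feasible point (0.5, 0.5).
assert q(lam, mu1, mu2) <= f(0.5, 0.5)
```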
Example (continued)

    q(λ, μ1, μ2) = λ − exp(λ + μ1 − 1) − exp(λ + μ2 − 1)

Since ∂q/∂μ1 = −exp(λ + μ1 − 1) < 0 (and similarly for μ2), q is a decreasing function of μ1 and μ2, and so the optimum over μ1, μ2 ≥ 0 is achieved at the boundary μ1 = μ2 = 0. Setting μ1 = μ2 = 0:

    q(λ) = λ − 2 exp(λ − 1)
    ∂q/∂λ = 1 − 2 exp(λ − 1) = 0  ⟹  exp(λ − 1) = .5  ⟹  λ = log .5 + 1

More Examples

- Minimize ‖x − y‖² subject to wᵀx + b = 0: given a point y and a hyperplane, find the projection of the point onto the hyperplane.

Duality

Under certain conditions, the two optimization problems are equivalent:

    sup_{λ≥0, μ} inf_x L(x, λ, μ) = inf_x sup_{λ≥0, μ} L(x, λ, μ)

This is called strong duality. If the inequality is strict, then we say that there is a duality gap; the size of the gap is measured by the difference between the two sides of the inequality.

Slater's Condition

For any optimization problem of the form

    min_x f(x) subject to g_i(x) ≤ 0 for i = 1, ..., m

where f, g_1, ..., g_m are convex functions, strong duality holds if there exists an x̄ such that g_i(x̄) < 0 for all i = 1, ..., m.

Dual SVM

    min_{w, b} (1/2)‖w‖²  such that  y_i(wᵀx_i + b) ≥ 1 for all i

Note that Slater's condition holds as long as the data is linearly separable.

The Lagrangian is

    L(w, b, λ) = (1/2)‖w‖² + Σ_i λ_i (1 − y_i(wᵀx_i + b)),  with λ ≥ 0

L is convex in w and b, so take derivatives to form the dual:

    ∂L/∂w = w − Σ_i λ_i y_i x_i = 0  ⟹  w = Σ_i λ_i y_i x_i
    ∂L/∂b = −Σ_i λ_i y_i = 0

Substituting back into the Lagrangian gives the dual problem

    max_{λ≥0} Σ_i λ_i − (1/2) Σ_i Σ_j λ_i λ_j y_i y_j (x_i · x_j)
    such that Σ_i λ_i y_i = 0

By strong duality, solving this problem is equivalent to solving the primal problem. Given the optimal λ, we can easily construct w; b can be found by complementary slackness.

Complementary Slackness

Suppose that there is zero duality gap. Let x* be an optimum of the primal and (λ*, μ*) an optimum of the dual. Then

    f(x*) = q(λ*, μ*) = inf_x L(x, λ*, μ*) ≤ L(x*, λ*, μ*)
          = f(x*) + Σ_i λ*_i g_i(x*) + Σ_j μ*_j h_j(x*) ≤ f(x*)

where the last inequality holds because λ*_i ≥ 0, g_i(x*) ≤ 0, and h_j(x*) = 0. All of these inequalities must therefore be equalities, and since each term λ*_i g_i(x*) ≤ 0, this can only happen if λ*_i g_i(x*) = 0 for all i. Put another way:

- If g_i(x*) < 0, i.e., the constraint is not tight, then λ*_i = 0.
- If λ*_i > 0, then g_i(x*) = 0.

This ONLY applies when there is no duality gap.

For the dual SVM, complementary slackness means that λ_i > 0 implies y_i(wᵀx_i + b) = 1, i.e., x_i is a support vector; b can then be solved for using any support vector: b = y_i − wᵀx_i.
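To make the dual SVM and complementary slackness concrete, here is a tiny worked instance (illustrative Python, not from the slides; the two-point dataset and all variable names are my own). With points x = ∓1 labeled y = ∓1, the constraint Σ_i λ_i y_i = 0 forces λ1 = λ2, the dual optimum is λ = 1/2, and both points come out as support vectors.

```python
# Toy 1-D dataset with two points; both end up as support vectors.
X = [-1.0, 1.0]
Y = [-1.0, 1.0]

def dual_objective(lam):
    # SVM dual with lam1 = lam2 = lam (forced by sum_i lam_i*y_i = 0):
    # sum_i lam_i - (1/2) sum_{i,j} lam_i*lam_j*y_i*y_j*(x_i . x_j)
    lams = [lam, lam]
    total = sum(lams)
    total -= 0.5 * sum(
        lams[i] * lams[j] * Y[i] * Y[j] * X[i] * X[j]
        for i in range(2) for j in range(2)
    )
    return total

# Maximize over a grid on [0, 2]; the maximizer is lam = 0.5.
best_lam = max((l / 1000.0 for l in range(0, 2001)), key=dual_objective)
assert abs(best_lam - 0.5) < 1e-9

# Recover the primal solution: w = sum_i lam_i*y_i*x_i,
# and b from any support vector via b = y_i - w*x_i.
w = sum(best_lam * Y[i] * X[i] for i in range(2))
b = Y[0] - w * X[0]
assert abs(w - 1.0) < 1e-9 and abs(b) < 1e-9

# Strong duality: primal objective (1/2)*w^2 equals the dual optimum.
assert abs(0.5 * w * w - dual_objective(best_lam)) < 1e-9

# Complementary slackness: lam_i > 0 => margin constraint is tight.
for i in range(2):
    assert abs(Y[i] * (w * X[i] + b) - 1.0) < 1e-9
```

Real solvers replace the grid search with a quadratic-programming routine, but the recovery of w and b from the optimal multipliers is exactly the construction described above.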
Dual SVM: Complexity

Evaluating the dual objective alone takes O(n²) time in the number of data points because of the double sum over pairs; speeding this up is an active area of research.

Dual SVM: Inner Products

The dual formulation only depends on inner products between the data points. The same is true if we replace the data points with feature vectors φ(x_i):

    max_{λ≥0} Σ_i λ_i − (1/2) Σ_i Σ_j λ_i λ_j y_i y_j (φ(x_i) · φ(x_j))
    such that Σ_i λ_i y_i = 0

The Kernel Trick

For some feature vectors, we can compute the inner products quickly even if the feature vectors are very large. This is best illustrated by example. For x ∈ ℝ², let

    φ(x) = [x1², x1 x2, x2 x1, x2²]

Then

    φ(x) · φ(z) = x1² z1² + 2 x1 x2 z1 z2 + x2² z2² = (x1 z1 + x2 z2)² = (x · z)²

so the inner product in the four-dimensional feature space can be computed with a single two-dimensional inner product, squared.
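The quadratic-kernel identity above is easy to verify numerically. This sketch (illustrative Python, not from the slides; function names are my own) checks that the explicit four-dimensional feature map and the kernel (x · z)² agree on random points.

```python
import random

def phi(x):
    # Explicit 4-dimensional feature map from the example:
    # phi(x) = [x1^2, x1*x2, x2*x1, x2^2]
    x1, x2 = x
    return [x1 * x1, x1 * x2, x2 * x1, x2 * x2]

def kernel(x, z):
    # Quadratic kernel k(x, z) = (x . z)^2, computed entirely
    # in the original 2-dimensional space.
    dot = x[0] * z[0] + x[1] * z[1]
    return dot * dot

random.seed(0)
for _ in range(100):
    x = (random.uniform(-3, 3), random.uniform(-3, 3))
    z = (random.uniform(-3, 3), random.uniform(-3, 3))
    explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
    assert abs(explicit - kernel(x, z)) < 1e-9
```

The same pattern scales: a degree-d polynomial kernel (x · z)^d stands in for a feature space whose dimension grows combinatorially in d, while the kernel evaluation stays a single inner product.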

