
Lagrange Multipliers and the Kernel Trick
Nicholas Ruozzi
University of Texas at Dallas

The Strategy So Far

- Choose a hypothesis space.
- Construct a loss function (ideally convex).
- Minimize the loss to learn the correct parameters.

General Optimization

A mathematical detour; we'll come back to SVMs soon.

    min_x f(x)
    subject to g_i(x) ≤ 0, i = 1, ..., m
               h_j(x) = 0, j = 1, ..., l

f is not necessarily convex, and the constraints do not need to be linear.

Example

    min_{x1, x2} x1 log x1 + x2 log x2
    subject to x1 + x2 = 1, x1 ≥ 0, x2 ≥ 0

In the standard form above, the constraints are h(x) = x1 + x2 − 1 = 0, g_1(x) = −x1 ≤ 0, and g_2(x) = −x2 ≤ 0.

Lagrangian

Incorporate the constraints into a new objective function:

    L(x, λ, μ) = f(x) + Σ_i λ_i g_i(x) + Σ_j μ_j h_j(x),  λ ≥ 0,

where λ and μ are vectors of Lagrange multipliers. The Lagrange multipliers can be thought of as enforcing soft constraints.

Example

For the problem above,

    L(x, λ, μ) = x1 log x1 + x2 log x2 − λ1 x1 − λ2 x2 + μ(x1 + x2 − 1),

with λ1, λ2 ≥ 0.

Duality

Construct a dual function by minimizing the Lagrangian over the primal variables:

    q(λ, μ) = inf_x L(x, λ, μ),

with q(λ, μ) = −∞ whenever the Lagrangian is not bounded from below for the fixed λ and μ.

Example

Setting ∂L/∂x1 = log x1 + 1 − λ1 + μ = 0 gives x1 = exp(λ1 − μ − 1), and similarly x2 = exp(λ2 − μ − 1). Substituting these back into the Lagrangian yields

    q(λ, μ) = −exp(λ1 − μ − 1) − exp(λ2 − μ − 1) − μ.

The Primal Problem

The constrained problem can be written equivalently as

    p* = inf_x sup_{λ ≥ 0, μ} L(x, λ, μ).

Why are these equivalent? Because sup_{λ ≥ 0, μ} L(x, λ, μ) = +∞ whenever x violates the constraints, and equals f(x) otherwise.

The Dual Problem

    d* = sup_{λ ≥ 0, μ} q(λ, μ) = sup_{λ ≥ 0, μ} inf_x L(x, λ, μ).

The dual problem is always concave, even if the primal problem is not convex: for each fixed x, L(x, λ, μ) is an affine function of λ and μ, and the pointwise infimum of affine functions is concave.

Primal vs. Dual

    d* = sup_{λ ≥ 0, μ} inf_x L(x, λ, μ) ≤ inf_x sup_{λ ≥ 0, μ} L(x, λ, μ) = p*.

Why? For any λ ≥ 0, any μ, and any feasible x̄, we have inf_x L(x, λ, μ) ≤ L(x̄, λ, μ) ≤ f(x̄).
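As a sanity check on the entropy example worked above, the sketch below (a minimal illustration, assuming NumPy) evaluates the dual function q at random multipliers and confirms numerically that it never exceeds the primal optimum, as weak duality requires:

```python
import numpy as np

# Primal problem from the example:
#   min x1*log(x1) + x2*log(x2)  s.t.  x1 + x2 = 1, x1, x2 >= 0.
# The optimum is x1 = x2 = 1/2 with value log(1/2) = -log(2).
p_star = 0.5 * np.log(0.5) + 0.5 * np.log(0.5)

def q(lam1, lam2, mu):
    """Dual function derived in the text: -e^{l1-mu-1} - e^{l2-mu-1} - mu."""
    return -np.exp(lam1 - mu - 1) - np.exp(lam2 - mu - 1) - mu

# Weak duality: q(lam, mu) <= p* for every lam >= 0 and every mu.
rng = np.random.default_rng(0)
for _ in range(1000):
    lam1, lam2 = rng.uniform(0, 5, size=2)   # multipliers for -x_i <= 0
    mu = rng.uniform(-5, 5)                  # multiplier for x1 + x2 - 1 = 0
    assert q(lam1, lam2, mu) <= p_star + 1e-12

print("weak duality holds on all samples; p* =", p_star)
```

The random sampling ranges are arbitrary choices for illustration; any λ ≥ 0 and any μ must satisfy the bound.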
A point x̄ is feasible if it satisfies all of the constraints. In particular, letting x* be the optimal solution to the primal problem, q(λ, μ) ≤ f(x*) = p* for all λ ≥ 0 and all μ.

Example

Maximizing the dual from before:

    ∂q/∂λ1 = −exp(λ1 − μ − 1) < 0,

so q is a decreasing function of λ1, and the maximum over λ1 ≥ 0 is achieved at the boundary λ1 = 0; the same argument gives λ2 = 0. Then

    q(0, 0, μ) = −2 exp(−μ − 1) − μ,
    ∂q/∂μ = 2 exp(−μ − 1) − 1 = 0  ⟹  exp(−μ − 1) = .5  ⟹  μ = −log(.5) − 1.

This gives d* = −1 − (log 2 − 1) = −log 2, which matches the primal optimum p* = log(1/2) attained at x1 = x2 = 1/2.

More Examples

Given a point y and a hyperplane w·x + b = 0, find the projection of the point onto the hyperplane: minimize ‖x − y‖² subject to w·x + b = 0.

Duality

Under certain conditions the two optimization problems are equivalent:

    sup_{λ ≥ 0, μ} inf_x L(x, λ, μ) = inf_x sup_{λ ≥ 0, μ} L(x, λ, μ).

This is called strong duality. If the inequality is strict, then we say that there is a duality gap; the size of the gap is measured by the difference between the two sides of the inequality.

Slater's Condition

For any optimization problem of the form

    min_x f(x)
    subject to g_i(x) ≤ 0, i = 1, ..., m,

where f, g_1, ..., g_m are convex functions, strong duality holds if there exists an x such that g_i(x) < 0 for all i.

Dual SVM

    min_{w, b} (1/2) ‖w‖²
    such that y_i (w·x_i + b) ≥ 1 for all i.

Note that Slater's condition holds as long as the data is linearly separable. The Lagrangian is

    L(w, b, λ) = (1/2) ‖w‖² + Σ_i λ_i (1 − y_i (w·x_i + b)),

which is convex in w and b, so take derivatives to form the dual:

    ∂L/∂w = w − Σ_i λ_i y_i x_i = 0  ⟹  w = Σ_i λ_i y_i x_i,
    ∂L/∂b = −Σ_i λ_i y_i = 0.

This yields the dual problem

    max_{λ ≥ 0} Σ_i λ_i − (1/2) Σ_i Σ_j λ_i λ_j y_i y_j (x_i · x_j)
    such that Σ_i λ_i y_i = 0.

By strong duality, solving this problem is equivalent to solving the primal problem. Given the optimal λ, we can easily construct w; b can be found by complementary slackness.

Complementary Slackness

Suppose that there is zero duality gap. Let x* be an optimum of the primal and (λ*, μ*) an optimum of the dual. Then

    f(x*) = q(λ*, μ*) = inf_x L(x, λ*, μ*)
          ≤ L(x*, λ*, μ*)
          = f(x*) + Σ_i λ_i* g_i(x*) + Σ_j μ_j* h_j(x*)
          ≤ f(x*).

Since h_j(x*) = 0 and λ_i* g_i(x*) ≤ 0 for all i, this can only happen if λ_i* g_i(x*) = 0 for each i. Put another way: if λ_i* > 0, then g_i(x*) = 0 (the constraint is tight); if g_i(x*) < 0 (the constraint is not tight), then λ_i* = 0. This ONLY applies when there is no duality gap.
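The dual SVM and complementary-slackness machinery above can be exercised end to end on a tiny example. The sketch below (a minimal illustration with a hypothetical two-point dataset, assuming NumPy) maximizes the dual by a one-dimensional search, recovers w from the optimal multipliers, and recovers b from a support vector:

```python
import numpy as np

# Hypothetical two-point, linearly separable dataset (an assumption made
# purely for illustration): x1 = (1, 1) with label +1, x2 = (-1, -1) with -1.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])

# Dual objective: sum(lam) - 0.5 * sum_ij lam_i lam_j y_i y_j <x_i, x_j>,
# subject to lam >= 0 and sum_i lam_i y_i = 0.  Here the equality constraint
# forces lam1 = lam2 = t, so the dual reduces to a concave 1-D quadratic.
G = (y[:, None] * X) @ (y[:, None] * X).T   # G_ij = y_i y_j <x_i, x_j>
t = np.linspace(0, 1, 100001)
vals = 2 * t - 0.5 * (G[0, 0] + 2 * G[0, 1] + G[1, 1]) * t**2
lam = np.full(2, t[np.argmax(vals)])

# Recover the primal solution: w = sum_i lam_i y_i x_i.
w = (lam * y) @ X

# Complementary slackness: lam_i > 0 means the constraint is tight,
# y_i (w.x_i + b) = 1, so b = y_i - w.x_i for any support vector i.
i = int(np.argmax(lam > 1e-8))
b = y[i] - w @ X[i]

print("lam =", lam, "w =", w, "b =", b)
print("margins:", y * (X @ w + b))   # both points are support vectors here
```

For this dataset the dual is 2t − 4t², maximized at t = 1/4, giving w = (1/2, 1/2) and b = 0; both margins equal 1, so both points are support vectors, consistent with complementary slackness.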
Dual SVM

    max_{λ ≥ 0} Σ_i λ_i − (1/2) Σ_i Σ_j λ_i λ_j y_i y_j (x_i · x_j)
    such that Σ_i λ_i y_i = 0.

By complementary slackness, λ_i > 0 means that y_i (w·x_i + b) = 1, i.e., x_i is a support vector; we can then solve for b using any support vector: b = y_i − w·x_i.

It takes O(n²) time just to evaluate the objective function; it is an active area of research to try to speed this up.

The dual formulation only depends on inner products between the data points. The same thing is true if we use feature vectors φ(x) instead.

The Kernel Trick

For some feature vectors, we can compute the inner products quickly, even if the feature vectors are very large. This is best illustrated by example. Let x = (x1, x2) and

    φ(x) = (x1², √2 x1 x2, x2²).

Then

    φ(x) · φ(z) = x1² z1² + 2 x1 x2 z1 z2 + x2² z2² = (x1 z1 + x2 z2)² = (x · z)²,

so the inner product in the three-dimensional feature space can be computed directly from the two-dimensional inner product.
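The identity φ(x)·φ(z) = (x·z)² from the example can be verified directly. The sketch below (assuming NumPy) builds the explicit three-dimensional feature map and checks that the quadratic kernel computes the same inner product without ever forming the feature vectors:

```python
import numpy as np

def phi(x):
    """Explicit feature map from the example: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def quadratic_kernel(x, z):
    """The kernel trick: <phi(x), phi(z)> equals (x.z)^2, computed in O(d)."""
    return (x @ z) ** 2

rng = np.random.default_rng(1)
for _ in range(100):
    x, z = rng.standard_normal(2), rng.standard_normal(2)
    assert np.isclose(phi(x) @ phi(z), quadratic_kernel(x, z))

print("explicit feature map and kernel agree")
```

The same pattern extends to higher-degree polynomial kernels, where the explicit feature space grows combinatorially while the kernel evaluation stays cheap.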
