UT Dallas CS 6375 - SVM-Optimization-review

Quadratic Optimization

This handout covers some results on constrained nonlinear optimization. Suppose x is a variable vector of n dimensions: x = (x_1, ..., x_n). We consider simple polynomial functions of x that involve at most second-order terms. Matrix and vector notation considerably simplify the expressions. Here is a reminder of several formulas, in terms of the vectors x, y and the matrices A, B:

The inner product of x and y:     x'y = x_1 y_1 + ... + x_n y_n
The squared norm of x:            |x|^2 = x'x = x_1^2 + ... + x_n^2
The most general quadratic form:  x'Ax + b'x + c
Reminder:                         x'Ax = x'(Ax) = \sum_{ij} a_{ij} x_i x_j
Positive semidefinite matrix B:   x'Bx \ge 0 for every x
Gradients (derivatives):          if f(x) = x'Bx + b'x + c with B positive semidefinite, then \nabla f(x) = 2Bx + b

Unconstrained optimization

Here we would like to compute the minimum of f(x). It can be shown that a quadratic f(x) = x'Bx + b'x + c has a minimum if and only if B is positive semidefinite, and a unique minimum if and only if B is positive definite. In both cases the minimum is attained at the values of x that zero out the gradient, that is, at the solutions of:

    Bx = -b/2

Example 1

Find the minimum of

    f(x_1, x_2) = x_1^2 + 2x_2^2 + 2x_1x_2 + 2x_1 - 6x_2 + 4

With x = (x_1, x_2) we have:

    f(x) = x' \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} x + (2, -6)\,x + 4

Therefore, if there is a minimum it is the solution of:

    \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} x = -(2, -6)/2 = (-1, 3)

The solution is (-5, 4), so that if a minimum exists it is at x_1 = -5, x_2 = 4. To argue that it is indeed a minimum we still need to show that the matrix is positive definite. One way of showing that a matrix B is positive semidefinite is to show that there is a matrix G such that:

    B = G'G

In our case, with B = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, we can take G = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} (and since this G is invertible, B is in fact positive definite).

Equality linear constraints

Here we would like to compute the minimum of f(x) where x is also subject to linear equality constraints. A linear equality constraint can be written as c'x = d, where c is a vector and d is a scalar. When there are k constraints we write them as c_i'x = d_i.

There are several methods to reduce this to the case of unconstrained optimization. The most basic one is the method of Lagrange multipliers. In order to minimize f(x) under the above equality constraints, we define a Lagrange multiplier \alpha_i for the i-th constraint and form the Lagrangian function L(x, \alpha_1, ..., \alpha_k) as follows:

    L(x, \alpha_1, ..., \alpha_k) = f(x) + \sum_{i=1}^{k} \alpha_i (c_i'x - d_i)

We can now treat L as a function of n + k variables and use the method of unconstrained minimization to find the unconstrained minimum of L. (Observe that L is quadratic.)

Example 2

Minimize f(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 under the two constraints x_2 - x_1 = 1 and x_3 = 1.

This is an easy case to solve without the Lagrange technique: by direct substitution f = x_1^2 + (x_1 + 1)^2 + 1, and by taking the derivative and equating it to 0 the minimum is at x_1 = -1/2. Using the Lagrange technique:

    L(x_1, x_2, x_3, \alpha_1, \alpha_2) = x_1^2 + x_2^2 + x_3^2 + \alpha_1 (x_2 - x_1 - 1) + \alpha_2 (x_3 - 1)

The derivative with respect to x_1 gives: 2x_1 - \alpha_1 = 0
The derivative with respect to x_2 gives: 2x_2 + \alpha_1 = 0
The derivative with respect to x_3 gives: 2x_3 + \alpha_2 = 0
The derivative with respect to \alpha_1 gives: x_2 - x_1 - 1 = 0
The derivative with respect to \alpha_2 gives: x_3 - 1 = 0

Treating this as a system of 5 equations in 5 unknowns, we get the following solution:

    x_1 = -1/2,  x_2 = 1/2,  x_3 = 1,  \alpha_1 = -1,  \alpha_2 = -2
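As a quick numerical sanity check of Examples 1 and 2 (not part of the original handout), the following Python/NumPy sketch solves Bx = -b/2 for Example 1, verifies the factorization B = G'G, and solves the 5x5 stationarity system of Example 2; the variable names mirror the handout's notation.

```python
import numpy as np

# Example 1: f(x) = x'Bx + b'x + c with B = [[1, 1], [1, 2]], b = (2, -6), c = 4.
# If a minimum exists it solves B x = -b/2.
B = np.array([[1.0, 1.0],
              [1.0, 2.0]])
b = np.array([2.0, -6.0])
print(np.linalg.solve(B, -b / 2))   # [-5.  4.]

# B = G'G with G invertible, so B is positive definite and the point above is the minimum.
G = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(np.allclose(G.T @ G, B))      # True

# Example 2: setting all partial derivatives of the Lagrangian to zero gives a
# linear system in (x1, x2, x3, alpha1, alpha2):
#   2*x1 - a1 = 0,  2*x2 + a1 = 0,  2*x3 + a2 = 0,  x2 - x1 = 1,  x3 = 1
A = np.array([[ 2.0, 0.0, 0.0, -1.0, 0.0],
              [ 0.0, 2.0, 0.0,  1.0, 0.0],
              [ 0.0, 0.0, 2.0,  0.0, 1.0],
              [-1.0, 1.0, 0.0,  0.0, 0.0],
              [ 0.0, 0.0, 1.0,  0.0, 0.0]])
rhs = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
print(np.linalg.solve(A, rhs))      # [-0.5  0.5  1.  -1.  -2. ]
```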
The dual problem

There is another way of using the Lagrangian to solve the constrained optimization problem. The following result can be proved:

    \min_{x \text{ subject to the equality constraints}} f(x) = \max_{\alpha_1, ..., \alpha_k} \min_x L(x, \alpha_1, ..., \alpha_k)

To use this, we begin by treating the Lagrange multipliers as constants and minimizing the Lagrangian with respect to x. This is done by taking the derivatives only with respect to x. The resulting system of linear equations can be used to solve for the variables x in terms of the Lagrange multipliers. The result is then substituted back into the Lagrangian, transforming it into a function of the Lagrange multipliers only. This function is called the dual problem, and it can then be solved using unconstrained techniques.

Example 3

To convert the problem of Example 2 into dual form we only need to take the derivatives of the Lagrangian with respect to x_1, x_2, x_3. As above:

The derivative with respect to x_1 gives: 2x_1 - \alpha_1 = 0
The derivative with respect to x_2 gives: 2x_2 + \alpha_1 = 0
The derivative with respect to x_3 gives: 2x_3 + \alpha_2 = 0

Solving for x_1, x_2, x_3 in terms of \alpha_1, \alpha_2 we have:

    x_1 = \alpha_1/2,  x_2 = -\alpha_1/2,  x_3 = -\alpha_2/2    (1)

Substituting back into the Lagrangian gives:

    L(\alpha_1, \alpha_2) = -\frac{1}{2}\alpha_1^2 - \frac{1}{4}\alpha_2^2 - \alpha_1 - \alpha_2

The dual problem requires that we maximize this expression. The maximizing values are \alpha_1 = -1, \alpha_2 = -2, and the minimizing x_1, x_2, x_3 of the original (primal) problem are computed from (1).

Example 4

Consider the n-dimensional hyperplane (the generalization of a line in 2D) w'x = s. In 2D it is the line w_1 x_1 + w_2 x_2 = s. Compute the distance of this hyperplane from the origin.

A point x on the hyperplane has distance |x| from the origin. We are looking for the point x that minimizes |x|^2 subject to the constraint w'x = s. The Lagrangian is:

    L(x, \alpha) = |x|^2 + \alpha (w'x - s)

Computing the derivative with respect to x and equating it to 0 we have:

    2x + \alpha w = 0  \Rightarrow  x = -\frac{\alpha}{2} w

Substituting into the Lagrangian we get the dual problem:

    L(\alpha) = \frac{\alpha^2}{4}|w|^2 + \alpha\left(-\frac{\alpha}{2} w'w - s\right) = -\frac{\alpha^2}{4}|w|^2 - \alpha s

In terms of \alpha this is a parabola with maximum at \alpha = -2s/|w|^2. This gives x = sw/|w|^2, so that the distance of the hyperplane from the origin is |s|/|w|.

Linear inequality constraints

Here we would like to compute the minimum of f(x) where x is subject to linear inequality constraints. A linear inequality constraint can be written as c'x \le d, where c is a vector and d is a scalar. When there are k constraints we write them as c_i'x \le d_i.

The Lagrangian in this case is the same as in the previous case:

    L(x, \alpha_1, ..., \alpha_k) = f(x) + \sum_{i=1}^{k} \alpha_i (c_i'x - d_i)

And the following dual theorem holds:

    \min_{x \text{ subject to the inequality constraints}} f(x) = \max_{\alpha_1, ..., \alpha_k,\ \alpha_j \ge 0} \min_x L(x, \alpha_1, ..., \alpha_k)

Formally, the problem

    \min_{x \text{ subject to the inequality constraints}} f(x)

is called the primal problem, and

    \max_{\alpha_1, ..., \alpha_k,\ \alpha_j \ge 0} \min_x L(x, \alpha_1, ..., \alpha_k)

is called the dual problem. Observe that here the dual problem is no longer trivial.

The following important result can be proved for the solution of the dual problem.

The Karush-Kuhn-Tucker complementarity condition:

    \alpha_i (c_i'x - d_i) = 0   for i = 1, ..., k

This means that either the constraint holds as an equality, or the corresponding multiplier \alpha_i is zero.
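As a further numerical illustration (again not from the handout; the values of w, s, c, and d below are made up), the following sketch checks the closed-form answer of Example 4 and then checks the complementarity condition on an inequality variant of the same problem: minimize |x|^2 subject to c'x <= d.

```python
import numpy as np

# Example 4: distance of the hyperplane w'x = s from the origin.
# From the dual: x* = s*w/|w|^2 and the distance is |s|/|w|.
w = np.array([3.0, 4.0])            # illustrative values, not from the handout
s = 10.0
x_star = s * w / np.dot(w, w)
print(np.dot(w, x_star))            # 10.0  (x* lies on the hyperplane)
print(np.linalg.norm(x_star))       # 2.0   (the distance)
print(abs(s) / np.linalg.norm(w))   # 2.0   (matches |s|/|w|)

# KKT complementarity on the inequality variant: minimize |x|^2 s.t. c'x <= d.
# If d >= 0 the origin is feasible and the constraint is inactive (alpha = 0);
# if d < 0 the constraint is active, x* = d*c/|c|^2 and alpha = -2d/|c|^2 > 0.
# In both cases alpha * (c'x* - d) = 0.
def toy_qp(c, d):
    if d >= 0:
        return np.zeros_like(c), 0.0
    return d * c / np.dot(c, c), -2.0 * d / np.dot(c, c)

c = np.array([1.0, 2.0])
for d in (1.0, -2.0):
    x, alpha = toy_qp(c, d)
    print(d, x, alpha, alpha * (np.dot(c, x) - d))   # last entry is always 0.0
```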

