UT Dallas CS 6375 - SVM-Optimization-review

Quadratic Optimization

This handout covers some results on constrained nonlinear optimization. Suppose x is a variable vector of n dimensions: x = (x_1, ..., x_n). We consider simple polynomial functions of x that involve at most second-order terms. Matrix and vector notation considerably simplify the expressions. Here is a reminder of several formulas, in terms of the vectors x, y and the matrices A, B:

The inner product of x and y:     x'y = x_1 y_1 + ... + x_n y_n
The squared norm of x:            |x|^2 = x'x = x_1^2 + ... + x_n^2
The most general quadratic form:  x'Ax + b'x + c
Reminder:                         x'Ax = x'(Ax) = \sum_{ij} a_{ij} x_i x_j
Positive semidefinite matrix B:   x'Bx \ge 0 for every x
Gradients (derivatives):          if f(x) = x'Bx + b'x + c with B positive semidefinite, then \nabla f(x) = 2Bx + b

Unconstrained optimization

Here we would like to compute the minimum of f(x). It can be shown that a quadratic f(x) = x'Bx + b'x + c has a minimum if and only if B is positive semidefinite, and a unique minimum if and only if B is positive definite. In both cases the minimum is attained at the values of x that zero out the gradient, that is, at the solutions of:

    Bx = -b/2

Example 1

Find the minimum of

    f(x_1, x_2) = x_1^2 + 2x_2^2 + 2x_1x_2 + 2x_1 - 6x_2 + 4

With x = (x_1, x_2) we have:

    f(x) = x' \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} x + (2, -6)\,x + 4

Therefore, if there is a minimum it is the solution of:

    \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} x = -(2, -6)/2 = (-1, 3)

The solution is (-5, 4), so that if a minimum exists it is at x_1 = -5, x_2 = 4. To argue that it is indeed a minimum we still need to show that the matrix is positive definite. One way of showing that a matrix B is positive semidefinite is to show that there is a matrix G such that:

    B = G'G

In our case, with B = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, we can take G = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} (and since this G is invertible, B is in fact positive definite).

Equality linear constraints

Here we would like to compute the minimum of f(x) where x is also subject to linear equality constraints. A linear equality constraint can be written as c'x = d, where c is a vector and d is a scalar. When there are k constraints we write them as c_i'x = d_i.

There are several methods to reduce this to the case of unconstrained optimization. The most basic one is the method of Lagrange multipliers. In order to minimize f(x) under the above equality constraints, we define a Lagrange multiplier \alpha_i for the i-th constraint and form the Lagrangian function L(x, \alpha_1, ..., \alpha_k) as follows:

    L(x, \alpha_1, ..., \alpha_k) = f(x) + \sum_{i=1}^{k} \alpha_i (c_i'x - d_i)

We can now treat L as a function of n + k variables and use the method of unconstrained minimization to find the unconstrained minimum of L. (Observe that L is quadratic.)

Example 2

Minimize f(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 under the two constraints x_2 - x_1 = 1 and x_3 = 1.

This is an easy case to solve without the Lagrange technique: by direct substitution f = x_1^2 + (x_1 + 1)^2 + 1, and by taking the derivative and equating it to 0 the minimum is at x_1 = -1/2. Using the Lagrange technique:

    L(x_1, x_2, x_3, \alpha_1, \alpha_2) = x_1^2 + x_2^2 + x_3^2 + \alpha_1 (x_2 - x_1 - 1) + \alpha_2 (x_3 - 1)

The derivative with respect to x_1 gives: 2x_1 - \alpha_1 = 0
The derivative with respect to x_2 gives: 2x_2 + \alpha_1 = 0
The derivative with respect to x_3 gives: 2x_3 + \alpha_2 = 0
The derivative with respect to \alpha_1 gives: x_2 - x_1 - 1 = 0
The derivative with respect to \alpha_2 gives: x_3 - 1 = 0

Treating this as a system of 5 equations in 5 unknowns, we get the following solution:

    x_1 = -1/2,  x_2 = 1/2,  x_3 = 1,  \alpha_1 = -1,  \alpha_2 = -2
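As a quick numerical sanity check of Examples 1 and 2 (not part of the original handout), the following Python/NumPy sketch solves Bx = -b/2 for Example 1, verifies the factorization B = G'G, and solves the 5x5 stationarity system of Example 2; the variable names mirror the handout's notation.

```python
import numpy as np

# Example 1: f(x) = x'Bx + b'x + c with B = [[1, 1], [1, 2]], b = (2, -6), c = 4.
# If a minimum exists it solves B x = -b/2.
B = np.array([[1.0, 1.0],
              [1.0, 2.0]])
b = np.array([2.0, -6.0])
print(np.linalg.solve(B, -b / 2))   # [-5.  4.]

# B = G'G with G invertible, so B is positive definite and the point above is the minimum.
G = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(np.allclose(G.T @ G, B))      # True

# Example 2: setting all partial derivatives of the Lagrangian to zero gives a
# linear system in (x1, x2, x3, alpha1, alpha2):
#   2*x1 - a1 = 0,  2*x2 + a1 = 0,  2*x3 + a2 = 0,  x2 - x1 = 1,  x3 = 1
A = np.array([[ 2.0, 0.0, 0.0, -1.0, 0.0],
              [ 0.0, 2.0, 0.0,  1.0, 0.0],
              [ 0.0, 0.0, 2.0,  0.0, 1.0],
              [-1.0, 1.0, 0.0,  0.0, 0.0],
              [ 0.0, 0.0, 1.0,  0.0, 0.0]])
rhs = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
print(np.linalg.solve(A, rhs))      # [-0.5  0.5  1.  -1.  -2. ]
```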
The dual problem

There is another way of using the Lagrangian to solve the constrained optimization problem. The following result can be proved:

    \min_{x \text{ subject to the equality constraints}} f(x) = \max_{\alpha_1, ..., \alpha_k} \min_x L(x, \alpha_1, ..., \alpha_k)

To use this, we begin by treating the Lagrange multipliers as constants and minimizing the Lagrangian with respect to x. This is done by taking the derivatives only with respect to x. The resulting system of linear equations can be used to solve for the variables x in terms of the Lagrange multipliers. The result is then substituted back into the Lagrangian, transforming it into a function of the Lagrange multipliers only. This function is called the dual problem, and it can then be solved using unconstrained techniques.

Example 3

To convert the problem of Example 2 into dual form we only need to take the derivatives of the Lagrangian with respect to x_1, x_2, x_3. As above:

The derivative with respect to x_1 gives: 2x_1 - \alpha_1 = 0
The derivative with respect to x_2 gives: 2x_2 + \alpha_1 = 0
The derivative with respect to x_3 gives: 2x_3 + \alpha_2 = 0

Solving for x_1, x_2, x_3 in terms of \alpha_1, \alpha_2 we have:

    x_1 = \alpha_1/2,  x_2 = -\alpha_1/2,  x_3 = -\alpha_2/2    (1)

Substituting back into the Lagrangian gives:

    L(\alpha_1, \alpha_2) = -\frac{1}{2}\alpha_1^2 - \frac{1}{4}\alpha_2^2 - \alpha_1 - \alpha_2

The dual problem requires that we maximize this expression. The maximizing values are \alpha_1 = -1, \alpha_2 = -2, and the minimizing x_1, x_2, x_3 of the original (primal) problem are computed from (1).

Example 4

Consider the n-dimensional hyperplane (the generalization of a line in 2D) w'x = s. In 2D it is the line w_1 x_1 + w_2 x_2 = s. Compute the distance of this hyperplane from the origin.

A point x on the hyperplane has distance |x| from the origin. We are looking for the point x that minimizes |x|^2 subject to the constraint w'x = s. The Lagrangian is:

    L(x, \alpha) = |x|^2 + \alpha (w'x - s)

Computing the derivative with respect to x and equating it to 0 we have:

    2x + \alpha w = 0  \Rightarrow  x = -\frac{\alpha}{2} w

Substituting into the Lagrangian we get the dual problem:

    L(\alpha) = \frac{\alpha^2}{4}|w|^2 + \alpha\left(-\frac{\alpha}{2} w'w - s\right) = -\frac{\alpha^2}{4}|w|^2 - \alpha s

In terms of \alpha this is a parabola with maximum at \alpha = -2s/|w|^2. This gives x = sw/|w|^2, so that the distance of the hyperplane from the origin is |s|/|w|.

Linear inequality constraints

Here we would like to compute the minimum of f(x) where x is subject to linear inequality constraints. A linear inequality constraint can be written as c'x \le d, where c is a vector and d is a scalar. When there are k constraints we write them as c_i'x \le d_i.

The Lagrangian in this case is the same as in the previous case:

    L(x, \alpha_1, ..., \alpha_k) = f(x) + \sum_{i=1}^{k} \alpha_i (c_i'x - d_i)

And the following dual theorem holds:

    \min_{x \text{ subject to the inequality constraints}} f(x) = \max_{\alpha_1, ..., \alpha_k,\ \alpha_j \ge 0} \min_x L(x, \alpha_1, ..., \alpha_k)

Formally, the problem

    \min_{x \text{ subject to the inequality constraints}} f(x)

is called the primal problem, and

    \max_{\alpha_1, ..., \alpha_k,\ \alpha_j \ge 0} \min_x L(x, \alpha_1, ..., \alpha_k)

is called the dual problem. Observe that here the dual problem is no longer trivial.

The following important result can be proved for the solution of the dual problem.

The Karush-Kuhn-Tucker complementarity condition:

    \alpha_i (c_i'x - d_i) = 0   for i = 1, ..., k

This means that either the constraint holds as an equality, or the corresponding multiplier \alpha_i is zero.
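As a further numerical illustration (again not from the handout; the values of w, s, c, and d below are made up), the following sketch checks the closed-form answer of Example 4 and then checks the complementarity condition on an inequality variant of the same problem: minimize |x|^2 subject to c'x <= d.

```python
import numpy as np

# Example 4: distance of the hyperplane w'x = s from the origin.
# From the dual: x* = s*w/|w|^2 and the distance is |s|/|w|.
w = np.array([3.0, 4.0])            # illustrative values, not from the handout
s = 10.0
x_star = s * w / np.dot(w, w)
print(np.dot(w, x_star))            # 10.0  (x* lies on the hyperplane)
print(np.linalg.norm(x_star))       # 2.0   (the distance)
print(abs(s) / np.linalg.norm(w))   # 2.0   (matches |s|/|w|)

# KKT complementarity on the inequality variant: minimize |x|^2 s.t. c'x <= d.
# If d >= 0 the origin is feasible and the constraint is inactive (alpha = 0);
# if d < 0 the constraint is active, x* = d*c/|c|^2 and alpha = -2d/|c|^2 > 0.
# In both cases alpha * (c'x* - d) = 0.
def toy_qp(c, d):
    if d >= 0:
        return np.zeros_like(c), 0.0
    return d * c / np.dot(c, c), -2.0 * d / np.dot(c, c)

c = np.array([1.0, 2.0])
for d in (1.0, -2.0):
    x, alpha = toy_qp(c, d)
    print(d, x, alpha, alpha * (np.dot(c, x) - d))   # last entry is always 0.0
```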

