
15.093 Optimization Methods
Lecture 20: The Conjugate Gradient Algorithm; Optimality Conditions for Constrained Optimization

1 Outline (Slide 1)

1. The Conjugate Gradient Algorithm
2. Necessary Optimality Conditions
3. Sufficient Optimality Conditions
4. Convex Optimization
5. Applications

2 The Conjugate Gradient Algorithm

2.1 Quadratic functions (Slide 2)

min f(x) = (1/2) x′Qx + c′x

Definition: d^1, …, d^n are Q-conjugate if d^i ≠ 0 and d^i′Q d^j = 0 for i ≠ j.

Proposition: If d^1, …, d^n are Q-conjugate, then d^1, …, d^n are linearly independent.

2.2 Motivation (Slide 3)

Given Q-conjugate directions d^1, …, d^n and an iterate x^k, compute

min_α f(x^k + α d^k) = c′x^k + α c′d^k + (1/2)(x^k + α d^k)′Q(x^k + α d^k)
                     = f(x^k) + α ∇f(x^k)′d^k + (1/2) α² d^k′Q d^k

Solution: α̂_k = −∇f(x^k)′d^k / (d^k′Q d^k),  x^{k+1} = x^k + α̂_k d^k.

2.3 Expanding Subspace Theorem (Slide 4)

Let d^1, …, d^n be Q-conjugate. Then x^{k+1} solves

min  f(x)
s.t. x = x^1 + Σ_{j=1}^{k} α_j d^j

Moreover, x^{n+1} = x*.

2.4 The Algorithm (Slide 5)

Step 0  Given x^1, set k := 1, d^1 = −∇f(x^1).
Step 1  For k = 1, …, n do: if ||∇f(x^k)|| ≤ ε, stop; else:
        α̂_k = argmin_α f(x^k + α d^k) = −∇f(x^k)′d^k / (d^k′Q d^k)
        x^{k+1} = x^k + α̂_k d^k
        d^{k+1} = −∇f(x^{k+1}) + λ_k d^k,   λ_k = ∇f(x^{k+1})′Q d^k / (d^k′Q d^k)

2.5 Correctness (Slide 6)

Theorem: The directions d^1, …, d^n generated by the algorithm are Q-conjugate.

2.6 Convergence Properties (Slide 7)

• This is a finite algorithm.
• If Q has only k distinct eigenvalues, the CGA finds an optimal solution in k steps.
• Idea of preconditioning: consider

      min f(Sx) = (1/2)(Sx)′Q(Sx) + c′Sx

  and choose S so that the number of distinct eigenvalues of S′QS is small.

2.7 Example (Slide 8)

f(x) = (1/2) x′Qx − c′x

    [ 35 19 22 28 16  3 16  6  4  4 ]        [ −1 ]
    [ 19 43 33 19  5  2  5  4  0  0 ]        [  0 ]
    [ 22 33 40 29 12  7  6  2  2  4 ]        [  0 ]
    [ 28 19 29 39 16  7 14  6  2  4 ]        [ −3 ]
Q = [ 16  5 12 16 12  4  8  2  4  8 ]    c = [  0 ]
    [  3  2  7  7  4  5  1  0  1  4 ]        [ −2 ]
    [ 16  5  6 14  8  1 12  2  2  4 ]        [  0 ]
    [  6  4  2  6  2  0  2  4  0  0 ]        [ −6 ]
    [  4  0  2  2  4  1  2  0  2  4 ]        [ −7 ]
    [  4  0  4  4  8  4  4  0  4 16 ]        [ −4 ]

κ(Q) ≈ 17,641,   δ = ((κ(Q) − 1)/(κ(Q) + 1))² = 0.999774

(Slide 9)

 k      f(x^k)        f(x^k) − f(x*)   [f(x^k) − f(x*)] / [f(x^{k−1}) − f(x*)]
 1       12.000000      2593.726852
 2        8.758578      2590.485430        0.998750
 3        1.869218      2583.596069        0.997341
 4      −12.777374      2568.949478        0.994331
 5      −30.479483      2551.247369        0.993109
 6     −187.804367      2393.922485        0.938334
 7     −309.836907      2271.889945        0.949024
 8     −408.590428      2173.136424        0.956532
 9     −754.887518      1826.839334        0.840646
10    −2567.158421        14.568431        0.007975
11    −2581.711672         0.015180        0.001042
12    −2581.726852        −0.000000       −0.000000

2.8 General problems (Slide 10)

Step 0  Given x^1, set k := 1, d^1 = −∇f(x^1).
Step 1  If ||∇f(x^k)|| ≤ ε, stop; else:
        α̂_k = argmin_α f(x^k + α d^k)   (by line search; the closed form
              −∇f(x^k)′d^k / (d^k′Q d^k) holds only in the quadratic case)
        x^{k+1} = x^k + α̂_k d^k
        d^{k+1} = −∇f(x^{k+1}) + λ_k d^k,   λ_k = ||∇f(x^{k+1})||² / ||∇f(x^k)||²
Step 2  Set k ← k + 1 and go to Step 1.
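To make the updates of Slides 5 and 10 concrete, here is a minimal NumPy sketch; the function name, tolerance default, and the tiny 2×2 test instance are our own choices, not from the lecture. It implements the quadratic-case algorithm of Slide 5, with the gradient-norm ratio of Slide 10 (the Fletcher-Reeves formula) noted in a comment.

import numpy as np

def conjugate_gradient(Q, c, x, eps=1e-10):
    # Minimize f(x) = 0.5 x'Qx + c'x for symmetric positive definite Q,
    # following the updates of Slide 5.
    g = Q @ x + c                        # gradient: grad f(x) = Qx + c
    d = -g                               # d^1 = -grad f(x^1)
    for _ in range(len(c)):              # finite: at most n steps (Slide 7)
        if np.linalg.norm(g) <= eps:
            break
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)      # exact step length (Slide 5)
        x = x + alpha * d
        g_new = Q @ x + c
        lam = (g_new @ Qd) / (d @ Qd)    # quadratic-case lambda_k (Slide 5)
        # For general f (Slide 10), use the Fletcher-Reeves ratio instead:
        #   lam = (g_new @ g_new) / (g @ g)
        d = -g_new + lam * d
        g = g_new
    return x

# Tiny 2x2 instance (ours, not the lecture's 10x10 example):
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([-1.0, -2.0])
x_star = conjugate_gradient(Q, c, np.zeros(2))
print(np.allclose(Q @ x_star, -c))       # optimality Qx* + c = 0 -> True

On an n-variable quadratic the loop performs at most n line searches, matching the finiteness claim of Slide 7.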
3 Necessary Optimality Conditions

3.1 Nonlinear Optimization (Slide 11)

min  f(x)
s.t. g_j(x) ≤ 0,  j = 1, …, p
     h_i(x) = 0,  i = 1, …, m

P = {x | g_j(x) ≤ 0, j = 1, …, p;  h_i(x) = 0, i = 1, …, m}

3.2 The KKT conditions (Slide 12)

Discovered by Karush (1939) and by Kuhn and Tucker (1951).

Theorem: Suppose that
• x̄ is a local minimum of P,
• I = {j | g_j(x̄) = 0} is the set of tight constraints, and
• the constraint qualification condition (CQC) holds: the vectors ∇g_j(x̄), j ∈ I, and ∇h_i(x̄), i = 1, …, m, are linearly independent.

(Slide 13) Then there exist vectors (u, v) such that:
1. ∇f(x̄) + Σ_{j=1}^{p} u_j ∇g_j(x̄) + Σ_{i=1}^{m} v_i ∇h_i(x̄) = 0
2. u ≥ 0
3. u_j g_j(x̄) = 0,  j = 1, …, p

3.3 Some Intuition from LO (Slide 14)

Linearize the functions in the neighborhood of the solution x̄. The problem becomes:

min  f(x̄) + ∇f(x̄)′(x − x̄)
s.t. g_j(x̄) + ∇g_j(x̄)′(x − x̄) ≤ 0,  j ∈ I
     h_i(x̄) + ∇h_i(x̄)′(x − x̄) = 0,  i = 1, …, m

(Slide 15) This is an LO problem. Dual feasibility reads:

Σ_{j∈I} û_j ∇g_j(x̄) + Σ_{i=1}^{m} v̂_i ∇h_i(x̄) = ∇f(x̄),   û_j ≤ 0

Changing variables to u_j = −û_j and v_i = −v̂_i gives:

∇f(x̄) + Σ_{j=1}^{p} u_j ∇g_j(x̄) + Σ_{i=1}^{m} v_i ∇h_i(x̄) = 0,   u_j ≥ 0

3.4 Example 1 (Slide 16)

min  f(x) = (x_1 − 12)² + (x_2 + 6)²
s.t. h_1(x) = 8x_1 + 4x_2 − 20 = 0
     g_1(x) = x_1² + x_2² + 3x_1 − 4.5x_2 − 6.5 ≤ 0
     g_2(x) = (x_1 − 9)² + x_2² − 64 ≤ 0

At x̄ = (2, 1)′:  g_1(x̄) = 0, g_2(x̄) = −14, h_1(x̄) = 0.

(Slide 17)
• I = {1}
• ∇f(x̄) = (−20, 14)′;  ∇g_1(x̄) = (7, −2.5)′
• ∇g_2(x̄) = (−14, 2)′;  ∇h_1(x̄) = (8, 4)′
• u_1 = 4, u_2 = 0, v_1 = −1
• ∇g_1(x̄), ∇h_1(x̄) are linearly independent
• ∇f(x̄) + u_1 ∇g_1(x̄) + u_2 ∇g_2(x̄) + v_1 ∇h_1(x̄) = 0:

  (−20, 14)′ + 4 (7, −2.5)′ + 0 (−14, 2)′ + (−1) (8, 4)′ = (0, 0)′
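A quick numerical sanity check of Example 1 (the variable names and the NumPy dependency are our choices): plugging the stated multipliers into KKT condition 1 should return the zero vector.

import numpy as np

# KKT condition 1 at x_bar = (2, 1) with the multipliers from Slide 17.
grad_f  = np.array([-20.0, 14.0])   # (2(x1 - 12), 2(x2 + 6)) at (2, 1)
grad_g1 = np.array([  7.0, -2.5])   # (2x1 + 3, 2x2 - 4.5)    at (2, 1)
grad_g2 = np.array([-14.0,  2.0])   # (2(x1 - 9), 2x2)        at (2, 1)
grad_h1 = np.array([  8.0,  4.0])   # (8, 4)

u1, u2, v1 = 4.0, 0.0, -1.0
print(grad_f + u1 * grad_g1 + u2 * grad_g2 + v1 * grad_h1)   # [0. 0.]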
3.5 Example 2 (Slide 18)

max  x′Qx
s.t. x′x ≤ 1

Here Q is an arbitrary symmetric matrix; this is not a convex optimization problem. Equivalently:

min  −x′Qx
s.t. x′x ≤ 1

3.5.1 KKT (Slide 19)

−2Qx + 2ux = 0
x′x ≤ 1
u ≥ 0
u(1 − x′x) = 0

3.5.2 Solutions of KKT (Slide 20)

• x = 0, u = 0, Obj = 0.
• x ≠ 0 ⇒ Qx = ux ⇒ x is an eigenvector of Q with nonnegative eigenvalue u.
• x′Qx = u x′x = u.
• Thus, pick the largest nonnegative eigenvalue û of Q. The solution is the corresponding eigenvector x̂, normalized so that x̂′x̂ = 1. If all eigenvalues are negative, x̂ = 0. (See the numerical sketch after Section 3.6.)

3.6 Are CQC Necessary? (Slide 21)

min  x_1
s.t. g_1(x) = x_1² − x_2 ≤ 0
     h_1(x) = x_2 = 0

The feasible set is the single point (0, 0)′. KKT:

(1, 0)′ + u (2x_1, −1)′ + v (0, 1)′ = (0, 0)′

At (0, 0)′ the first component reads 1 = 0, so KKT multipliers do not exist, even though (0, 0)′ is a local minimum. Check ∇g_1(0, 0) = (0, −1)′ and ∇h_1(0, 0) = (0, 1)′: they are linearly dependent, so the CQC fails. …
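Returning to Example 2: a short sketch of the recipe derived on Slide 20, take the largest eigenvalue if it is nonnegative, else x̂ = 0. The particular symmetric Q below is an arbitrary instance of our own choosing.

import numpy as np

# max x'Qx s.t. x'x <= 1: by the KKT analysis above, the optimum is the
# largest eigenvalue of Q if it is nonnegative (attained at the matching
# unit eigenvector), and 0 (at x_hat = 0) otherwise.
Q = np.array([[1.0,  2.0],           # an arbitrary symmetric instance,
              [2.0, -3.0]])          # not positive semidefinite

w, V = np.linalg.eigh(Q)             # eigenvalues in ascending order
if w[-1] >= 0:
    u_hat, x_hat = w[-1], V[:, -1]   # largest eigenvalue and its eigenvector
else:
    u_hat, x_hat = 0.0, np.zeros(len(w))
print(u_hat, x_hat @ Q @ x_hat)      # objective value equals u_hat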
