15.093 Optimization Methods
Lecture 20: The Conjugate Gradient Algorithm; Optimality Conditions for Constrained Optimization

1 Outline  Slide 1

1. The Conjugate Gradient Algorithm
2. Necessary Optimality Conditions
3. Sufficient Optimality Conditions
4. Convex Optimization
5. Applications

2 The Conjugate Gradient Algorithm

2.1 Quadratic functions  Slide 2

    min f(x) = (1/2) x'Qx + c'x

Definition: d_1, ..., d_n are Q-conjugate if d_i ≠ 0 and d_i'Q d_j = 0 for i ≠ j.

Proposition: If d_1, ..., d_n are Q-conjugate, then d_1, ..., d_n are linearly independent.

2.2 Motivation  Slide 3

Given Q-conjugate directions d_1, ..., d_n and an iterate x^k, compute

    min_α f(x^k + α d_k) = c'(x^k + α d_k) + (1/2)(x^k + α d_k)'Q(x^k + α d_k)
                         = f(x^k) + α ∇f(x^k)'d_k + (α²/2) d_k'Q d_k

Solution:

    α̂_k = −∇f(x^k)'d_k / (d_k'Q d_k),    x^{k+1} = x^k + α̂_k d_k

2.3 Expanding Subspace Theorem  Slide 4

Let d_1, ..., d_n be Q-conjugate. Then x^{k+1} solves

    min  f(x)
    s.t. x = x^1 + Σ_{j=1}^{k} α_j d_j

Moreover, x^{n+1} = x*.

2.4 The Algorithm  Slide 5

Step 0  Given x^1, set k := 1, d_1 = −∇f(x^1).
Step 1  For k = 1, ..., n do:
        If ||∇f(x^k)|| ≤ ε, stop; else:
            α̂_k = argmin_α f(x^k + α d_k) = −∇f(x^k)'d_k / (d_k'Q d_k)
            x^{k+1} = x^k + α̂_k d_k
            d_{k+1} = −∇f(x^{k+1}) + λ_k d_k,    λ_k = ∇f(x^{k+1})'Q d_k / (d_k'Q d_k)

2.5 Correctness  Slide 6

Theorem: The directions d_1, ..., d_n generated by the algorithm are Q-conjugate.

2.6 Convergence Properties  Slide 7

- This is a finite algorithm.
- If Q has only k distinct eigenvalues, the CGA finds an optimal solution in k steps.
- Idea of preconditioning.
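The quadratic CGA of Section 2.4 can be sketched in a few lines of pure Python. This is a minimal illustration, not from the notes; it uses the closed-form step size α̂_k and the conjugacy update λ_k from Slide 5, and assumes Q is symmetric positive definite.

```python
# Minimal sketch of the CGA (Section 2.4) for f(x) = 1/2 x'Qx + c'x,
# where Q is assumed symmetric positive definite.  Pure Python, no
# external libraries; all names here are illustrative.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(Q, v):
    return [dot(row, v) for row in Q]

def conjugate_gradient(Q, c, x, eps=1e-10):
    n = len(c)
    g = [gi + ci for gi, ci in zip(matvec(Q, x), c)]   # gradient = Qx + c
    d = [-gi for gi in g]                              # d_1 = -grad f(x^1)
    for k in range(n):                                 # at most n steps
        if dot(g, g) ** 0.5 <= eps:
            break
        Qd = matvec(Q, d)
        alpha = -dot(g, d) / dot(d, Qd)                # exact line search
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g = [gi + ci for gi, ci in zip(matvec(Q, x), c)]
        lam = dot(g, Qd) / dot(d, Qd)                  # lambda_k of Slide 5
        d = [-gi + lam * di for gi, di in zip(g, d)]
    return x

# 2x2 example: the minimizer solves Qx = -c, i.e. x* = (1, -1)
Q = [[2.0, 0.0], [0.0, 4.0]]
c = [-2.0, 4.0]
print(conjugate_gradient(Q, c, [0.0, 0.0]))
```

Consistent with the finiteness property above, the 2×2 run terminates in at most two iterations.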
For preconditioning, consider

    min f(Sx) = (1/2)(Sx)'Q(Sx) + c'Sx

and choose S so that the number of distinct eigenvalues of S'QS is small.

2.7 Example  Slide 8

    f(x) = (1/2) x'Qx − c'x

with

        ⎛ 35 19 22 28 16  3 16  6  4  4 ⎞          ⎛ −1 ⎞
        ⎜ 19 43 33 19  5  2  5  4  0  0 ⎟          ⎜  0 ⎟
        ⎜ 22 33 40 29 12  7  6  2  2  4 ⎟          ⎜  0 ⎟
        ⎜ 28 19 29 39 16  7 14  6  2  4 ⎟          ⎜ −3 ⎟
        ⎜ 16  5 12 16 12  4  8  2  4  8 ⎟          ⎜  0 ⎟
    Q = ⎜  3  2  7  7  4  5  1  0  1  4 ⎟  and c = ⎜ −2 ⎟
        ⎜ 16  5  6 14  8  1 12  2  2  4 ⎟          ⎜  0 ⎟
        ⎜  6  4  2  6  2  0  2  4  0  0 ⎟          ⎜ −6 ⎟
        ⎜  4  0  2  2  4  1  2  0  2  4 ⎟          ⎜ −7 ⎟
        ⎝  4  0  4  4  8  4  4  0  4 16 ⎠          ⎝ −4 ⎠

    κ(Q) ≈ 17,641,    δ = ((κ(Q) − 1)/(κ(Q) + 1))² = 0.999774

Slide 9

     k       f(x^k)    f(x^k) − f(x*)   (f(x^k) − f(x*)) / (f(x^{k−1}) − f(x*))
     1    12.000000     2593.726852         —
     2     8.758578     2590.485430     0.998750
     3     1.869218     2583.596069     0.997341
     4   −12.777374     2568.949478     0.994331
     5   −30.479483     2551.247369     0.993109
     6  −187.804367     2393.922485     0.938334
     7  −309.836907     2271.889945     0.949024
     8  −408.590428     2173.136424     0.956532
     9  −754.887518     1826.839334     0.840646
    10 −2567.158421       14.568431     0.007975
    11 −2581.711672        0.015180     0.001042
    12 −2581.726852        0.000000     0.000000

2.8 General problems  Slide 10

For a general (not necessarily quadratic) f:

Step 0  Given x^1, set k := 1, d_1 = −∇f(x^1).
Step 1  If ||∇f(x^k)|| ≤ ε, stop; else:
            α̂_k = argmin_α f(x^k + α d_k)
            x^{k+1} = x^k + α̂_k d_k
            d_{k+1} = −∇f(x^{k+1}) + λ_k d_k,    λ_k = ||∇f(x^{k+1})||² / ||∇f(x^k)||²
Step 2  k ← k + 1, go to Step 1.

3 Necessary Optimality Conditions

3.1 Nonlinear Optimization  Slide 11

    min  f(x)
    s.t. g_j(x) ≤ 0,  j = 1, ..., p
         h_i(x) = 0,  i = 1, ..., m

    P = {x | g_j(x) ≤ 0, j = 1, ..., p;  h_i(x) = 0, i = 1, ..., m}

3.2 The KKT conditions  Slide 12

Discovered by Karush, Kuhn, and Tucker in the 1950s.

Theorem: If
- x̄ is a local minimum of P,
- I = {j | g_j(x̄) = 0} is the set of tight constraints,
- the constraint qualification condition (CQC) holds: the vectors ∇g_j(x̄), j ∈ I, and ∇h_i(x̄), i = 1, ..., m, are linearly independent,

Slide 13

then there exist vectors (u, v) such that:

1. ∇f(x̄) + Σ_{j=1}^{p} u_j ∇g_j(x̄) + Σ_{i=1}^{m} v_i ∇h_i(x̄) = 0
2. u ≥ 0
3. u_j g_j(x̄) = 0, j = 1, ..., p
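The three KKT conditions of Slide 13 are easy to verify numerically at a candidate point. The snippet below does this for a hypothetical toy problem (not from the notes): min (x1 − 1)² + (x2 − 1)² subject to h(x) = x1 + x2 − 1 = 0, whose minimizer is x̄ = (1/2, 1/2) with multiplier v = 1.

```python
# Numerical sanity check of KKT condition 1 (Slide 13) on a toy problem
# (hypothetical example, not from the notes):
#   min (x1 - 1)^2 + (x2 - 1)^2   s.t.  h(x) = x1 + x2 - 1 = 0
# At x_bar = (1/2, 1/2):  grad f = (-1, -1), grad h = (1, 1), so v = 1.

def grad_f(x):
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] - 1.0)]

def grad_h(x):
    return [1.0, 1.0]

x_bar = [0.5, 0.5]
v = 1.0

# Condition 1: grad f(x_bar) + v * grad h(x_bar) = 0
# (no inequality constraints here, so conditions 2 and 3 are vacuous)
residual = [gf + v * gh for gf, gh in zip(grad_f(x_bar), grad_h(x_bar))]
print(residual)   # -> [0.0, 0.0]
```

A nonzero residual at a proposed (x̄, u, v) would immediately rule it out as a KKT point.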
3.3 Some Intuition from LO  Slide 14

Linearize the functions in a neighborhood of the solution x̄. The problem becomes:

    min  f(x̄) + ∇f(x̄)'(x − x̄)
    s.t. g_j(x̄) + ∇g_j(x̄)'(x − x̄) ≤ 0,  j ∈ I
         h_i(x̄) + ∇h_i(x̄)'(x − x̄) = 0,  i = 1, ..., m

Slide 15

This is an LO problem. Dual feasibility:

    Σ_{j∈I} û_j ∇g_j(x̄) + Σ_{i=1}^{m} v̂_i ∇h_i(x̄) = ∇f(x̄),    û_j ≤ 0

Change variables to u_j = −û_j, v_i = −v̂_i to obtain:

    ∇f(x̄) + Σ_{j=1}^{p} u_j ∇g_j(x̄) + Σ_{i=1}^{m} v_i ∇h_i(x̄) = 0,    u_j ≥ 0

3.4 Example 1  Slide 16

    min  f(x) = (x_1 − 12)² + (x_2 + 6)²
    s.t. h_1(x) = 8x_1 + 4x_2 − 20 = 0
         g_1(x) = x_1² + x_2² + 3x_1 − 4.5x_2 − 6.5 ≤ 0
         g_2(x) = (x_1 − 9)² + x_2² − 64 ≤ 0

At x̄ = (2, 1)':  g_1(x̄) = 0, g_2(x̄) = −14, h_1(x̄) = 0.

Slide 17

- I = {1}
- ∇f(x̄) = (−20, 14)';  ∇g_1(x̄) = (7, −2.5)'
- ∇g_2(x̄) = (−14, 2)';  ∇h_1(x̄) = (8, 4)'
- u_1 = 4, u_2 = 0, v_1 = −1
- ∇g_1(x̄), ∇h_1(x̄) linearly independent
- ∇f(x̄) + u_1 ∇g_1(x̄) + u_2 ∇g_2(x̄) + v_1 ∇h_1(x̄) = 0:

      (−20, 14)' + 4·(7, −2.5)' + 0·(−14, 2)' + (−1)·(8, 4)' = (0, 0)'

3.5 Example 2  Slide 18

    max  x'Qx
    s.t. x'x ≤ 1

with Q arbitrary; this is not a convex optimization problem. Rewrite as

    min  −x'Qx
    s.t. x'x ≤ 1

3.5.1 KKT  Slide 19

    −2Qx + 2ux = 0
    x'x ≤ 1
    u ≥ 0
    u(1 − x'x) = 0

3.5.2 Solutions of KKT  Slide 20

- x = 0, u = 0, Obj = 0.
- x ≠ 0 ⇒ Qx = ux ⇒ x is an eigenvector of Q with nonnegative eigenvalue u.
- x'Qx = u x'x = u.
- Thus, pick the largest nonnegative eigenvalue û of Q. The solution is the corresponding eigenvector x̂ normalized so that x̂'x̂ = 1. If all eigenvalues are negative, x̂ = 0.

3.6 Are CQC Necessary?  Slide 21

    min  x_1
    s.t. x_1² − x_2 ≤ 0
         x_2 = 0

The feasible set is the single point (0, 0)'. KKT at (0, 0)':

    (1, 0)' + u·(2x_1, −1)' + v·(0, 1)' = (1, 0)' + u·(0, −1)' + v·(0, 1)' = 0

has no solution, since the first component equals 1 ≠ 0 for any u, v. So KKT multipliers do not exist, while (0, 0)' is still a local minimum. Check ∇g_1(0, 0) = (0, −1)' and ∇h_1(0, 0) = (0, 1)': they are linearly dependent, so the CQC fails.
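The eigenvector characterization of Section 3.5.2 can be checked numerically with a simple power iteration. This is a sketch on a hypothetical 2×2 matrix (not from the notes); it assumes the eigenvalue of largest magnitude is the nonnegative one, which is what power iteration converges to. If all eigenvalues of Q were negative, the KKT analysis would give x̂ = 0 instead.

```python
# Power iteration sketch for Section 3.5.2: for symmetric Q, the maximizer
# of x'Qx over x'x <= 1 is the unit eigenvector of the largest nonnegative
# eigenvalue u_hat, and the optimal value is u_hat itself.
# Q below is a hypothetical 2x2 example, not from the notes.

def matvec(Q, v):
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def power_iteration(Q, iters=100):
    v = [1.0] + [0.0] * (len(Q) - 1)               # arbitrary start vector
    for _ in range(iters):
        w = matvec(Q, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]                  # keep v'v = 1
    lam = sum(x * y for x, y in zip(v, matvec(Q, v)))  # Rayleigh quotient
    return lam, v

Q = [[2.0, 1.0], [1.0, 2.0]]                       # eigenvalues 3 and 1
lam, v = power_iteration(Q)
print(lam)                                         # -> approx 3.0
```

Here lam plays the role of û and v the role of x̂: v satisfies v'v = 1 and Qv ≈ lam·v, exactly the KKT system of Slide 19 with u = lam.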