Lecture 16: Lemma 5.6 & Theorem 5.7

Lemma 5.6

Suppose that Algorithm 5.4 is implemented with a step length $\alpha_k$ that satisfies the strong Wolfe conditions with $0 < c_2 < 1/2$. Then the method generates descent directions $p_k$ that satisfy

$$-\frac{1}{1-c_2} \;\le\; \frac{\nabla f_k^T p_k}{\|\nabla f_k\|^2} \;\le\; \frac{2c_2-1}{1-c_2}, \qquad k = 0, 1, \ldots$$

Note first that for $0 < c_2 < 1/2$ the two bounds straddle $-1$:

$$-2 < -\frac{1}{1-c_2} < -1 \quad\text{and}\quad -1 < \frac{2c_2-1}{1-c_2} < 0. \tag{B}$$

Proof (by induction on $k$)

For $k = 0$ we have $p_0 = -\nabla f_0$, so

$$\frac{\nabla f_0^T p_0}{\|\nabla f_0\|^2} = \frac{-\nabla f_0^T \nabla f_0}{\|\nabla f_0\|^2} = -1,$$

so by (B) it is easy to see that both inequalities are satisfied. Assume now that they hold for $k$. Algorithm 5.4 sets

$$p_{k+1} = -\nabla f_{k+1} + \beta_{k+1} p_k.$$

Multiplying by $\nabla f_{k+1}^T$ and dividing by $\|\nabla f_{k+1}\|^2$ gives

$$\frac{\nabla f_{k+1}^T p_{k+1}}{\|\nabla f_{k+1}\|^2} = -1 + \beta_{k+1}\,\frac{\nabla f_{k+1}^T p_k}{\|\nabla f_{k+1}\|^2}.$$

Substituting the Fletcher-Reeves choice $\beta_{k+1}^{FR} = \dfrac{\nabla f_{k+1}^T \nabla f_{k+1}}{\nabla f_k^T \nabla f_k} = \dfrac{\|\nabla f_{k+1}\|^2}{\|\nabla f_k\|^2}$:

$$\frac{\nabla f_{k+1}^T p_{k+1}}{\|\nabla f_{k+1}\|^2} = -1 + \frac{\nabla f_{k+1}^T p_k}{\|\nabla f_k\|^2}.$$

The second (curvature) strong Wolfe condition gives $|\nabla f_{k+1}^T p_k| \le c_2 |\nabla f_k^T p_k| = -c_2\,\nabla f_k^T p_k$ ($p_k$ is a descent direction by the induction hypothesis), i.e.

$$c_2\,\nabla f_k^T p_k \;\le\; \nabla f_{k+1}^T p_k \;\le\; -c_2\,\nabla f_k^T p_k. \tag{C}$$

Combining (C) with the identity above:

$$-1 + c_2\,\frac{\nabla f_k^T p_k}{\|\nabla f_k\|^2} \;\le\; \frac{\nabla f_{k+1}^T p_{k+1}}{\|\nabla f_{k+1}\|^2} \;\le\; -1 - c_2\,\frac{\nabla f_k^T p_k}{\|\nabla f_k\|^2}.$$

Applying the induction hypothesis $-\dfrac{1}{1-c_2} \le \dfrac{\nabla f_k^T p_k}{\|\nabla f_k\|^2}$:

$$-1 - \frac{c_2}{1-c_2} \;\le\; \frac{\nabla f_{k+1}^T p_{k+1}}{\|\nabla f_{k+1}\|^2} \;\le\; -1 + \frac{c_2}{1-c_2},$$

which is exactly

$$-\frac{1}{1-c_2} \;\le\; \frac{\nabla f_{k+1}^T p_{k+1}}{\|\nabla f_{k+1}\|^2} \;\le\; \frac{2c_2-1}{1-c_2}. \qquad\text{QED}$$
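The lemma's bounds are easy to check numerically. The following minimal sketch (not from the lecture; the quadratic test problem, random seed, $c_2 = 0.4$, and tolerances are all illustrative assumptions) runs Fletcher-Reeves CG on a convex quadratic using SciPy's strong-Wolfe line search and asserts the Lemma 5.6 inequalities at every iteration:

```python
# Sketch: numerically checking the Lemma 5.6 bounds for FR-CG.
# The quadratic test problem, seed, c2 = 0.4, and iteration caps are
# illustrative assumptions, not taken from the lecture.
import numpy as np
from scipy.optimize import line_search  # strong-Wolfe line search

rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)           # symmetric positive definite Hessian
b = rng.standard_normal(n)

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

c2 = 0.4                              # must satisfy 0 < c2 < 1/2
lo = -1.0 / (1.0 - c2)                # lower bound from Lemma 5.6
hi = (2.0 * c2 - 1.0) / (1.0 - c2)    # upper bound from Lemma 5.6

x = rng.standard_normal(n)
g = grad(x)
p = -g                                # p_0 = -grad f_0
for k in range(200):
    ratio = (g @ p) / (g @ g)         # grad_k^T p_k / ||grad_k||^2
    assert lo - 1e-10 <= ratio <= hi + 1e-10, (k, ratio)
    alpha = line_search(f, grad, x, p, gfk=g, c1=1e-4, c2=c2)[0]
    if alpha is None:                 # line search failed to converge
        break
    x = x + alpha * p
    g_new = grad(x)
    if np.linalg.norm(g_new) < 1e-10:
        break
    beta_fr = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves beta
    p = -g_new + beta_fr * p
    g = g_new

print(f"bounds [{lo:.3f}, {hi:.3f}] held through iteration {k}")
```

Raising $c_2$ toward $1/2$ pushes the upper bound toward $0$, i.e. the guaranteed descent becomes weaker, which is why the lemma requires $c_2 < 1/2$.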
FR and PR

We can use Lemma 5.6 to explain a weakness of the FR-CG method: if the method generates a bad direction and a tiny step, then subsequent directions will also be bad. The PR-CG method, on the other hand, does not have that problem (a toy numerical illustration of the contrast appears at the end of these notes).

The angle $\theta_k$ between the search direction and the steepest descent direction $-\nabla f_k$ is given by

$$\cos\theta_k = \frac{-\nabla f_k^T p_k}{\|\nabla f_k\|\,\|p_k\|}.$$

Now multiply the Lemma 5.6 bounds by $-\|\nabla f_k\|/\|p_k\| < 0$ (which reverses them):

$$\frac{1-2c_2}{1-c_2}\,\frac{\|\nabla f_k\|}{\|p_k\|} \;\le\; \cos\theta_k \;\le\; \frac{1}{1-c_2}\,\frac{\|\nabla f_k\|}{\|p_k\|}, \qquad k = 0, 1, \ldots$$

that is,

$$\chi_1\,\frac{\|\nabla f_k\|}{\|p_k\|} \;\le\; \cos\theta_k \;\le\; \chi_2\,\frac{\|\nabla f_k\|}{\|p_k\|}, \qquad k = 0, 1, \ldots$$

for constants $\chi_1, \chi_2 > 0$ (positive because $0 < c_2 < 1/2$).

Hence $\cos\theta_k \approx 0$ iff $\|\nabla f_k\| \ll \|p_k\|$. Since $p_k$ is then almost orthogonal to the gradient, the step from $x_k$ to $x_{k+1}$ is tiny (for a quadratic the exact step is $\alpha_k = -\nabla f_k^T p_k / (p_k^T A p_k)$ (G)), i.e.

$$x_{k+1} \approx x_k, \qquad \nabla f_{k+1} \approx \nabla f_k.$$

FR: since $\nabla f_{k+1} \approx \nabla f_k$,

$$\beta_{k+1}^{FR} = \frac{\nabla f_{k+1}^T \nabla f_{k+1}}{\nabla f_k^T \nabla f_k} \approx 1, \qquad p_{k+1} = -\nabla f_{k+1} + \beta_{k+1}^{FR} p_k \approx p_k$$

(because $\|\nabla f_{k+1}\| \ll \|p_k\|$). So the new direction will improve little on the previous one.

PR:

$$\beta_{k+1}^{PR} = \frac{\nabla f_{k+1}^T (\nabla f_{k+1} - \nabla f_k)}{\nabla f_k^T \nabla f_k} \approx 0, \qquad p_{k+1} = -\nabla f_{k+1} + \beta_{k+1}^{PR} p_k \approx -\nabla f_{k+1}.$$

The new search direction will be close to the steepest descent direction: a restart!

Example. For a problem with $n = 100$: $\cos\theta_k$ is of order $10^{-2}$ and the steps $\|x_k - x_{k-1}\|$ are of order $10^{-2}$. FR-CG takes thousands of iterations (it improves with restarts); PR-CG takes 37 iterations.

Convergence of Line Search Methods (Theorem 3.2) (Theorem 5.7)

Let $\theta_k$ again denote the angle between $p_k$ and the steepest descent direction $-\nabla f_k$:

$$\cos\theta_k = \frac{-\nabla f_k^T p_k}{\|\nabla f_k\|\,\|p_k\|}.$$

If the step lengths satisfy the Wolfe conditions, $f$ is continuously differentiable and bounded below, and the gradient is Lipschitz continuous, $\|\nabla f(x) - \nabla f(\tilde x)\| \le L\,\|x - \tilde x\|$, then

$$\sum_{k \ge 0} \cos^2\theta_k\, \|\nabla f_k\|^2 < \infty.$$

Proof. The curvature condition $\nabla f(x_k + \alpha_k p_k)^T p_k \ge c_2\, \nabla f_k^T p_k$, $c_2 \in (c_1, 1)$, together with the iteration scheme $x_{k+1} = x_k + \alpha_k p_k$, gives

$$(\nabla f_{k+1} - \nabla f_k)^T p_k \;\ge\; -(1-c_2)\, \nabla f_k^T p_k. \tag{1}$$

Lipschitz continuity gives

$$(\nabla f_{k+1} - \nabla f_k)^T p_k \;\le\; \|\nabla f_{k+1} - \nabla f_k\|\, \|p_k\| \;\le\; L\, \alpha_k\, \|p_k\|^2. \tag{2}$$

Combining (1) and (2):

$$\alpha_k \;\ge\; \frac{c_2 - 1}{L}\, \frac{\nabla f_k^T p_k}{\|p_k\|^2} \;>\; 0. \tag{3}$$

Substituting (3) into the sufficient decrease condition $f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f_k^T p_k$ (note $\nabla f_k^T p_k < 0$):

$$f_{k+1} \;\le\; f_k - \frac{c_1(1-c_2)}{L}\, \frac{(\nabla f_k^T p_k)^2}{\|p_k\|^2} \;=\; f_k - c\, \cos^2\theta_k\, \|\nabla f_k\|^2, \qquad c = \frac{c_1(1-c_2)}{L}.$$

Summing over the first $k+1$ iterations:

$$f_{k+1} \;\le\; f_0 - c \sum_{j=0}^{k} \cos^2\theta_j\, \|\nabla f_j\|^2.$$

Since $f$ is bounded below, $f_0 - f_{k+1}$ is less than some positive constant for all $k$. Taking the limit $k \to \infty$:

$$\sum_{k \ge 0} \cos^2\theta_k\, \|\nabla f_k\|^2 < \infty. \qquad\text{QED}$$

In particular $\cos^2\theta_k\, \|\nabla f_k\|^2 \to 0$. We can therefore be sure that the gradient norms converge to zero provided the search directions are never too close to orthogonality with the gradient, i.e. provided the angle $\theta_k$ is bounded away from $90^{\circ}$ so that $\cos\theta_k \ge \delta > 0$. In particular, steepest descent ($\cos\theta_k = 1$) produces a gradient sequence that converges to zero, provided that it uses a line search satisfying the Wolfe conditions. We cannot guarantee that the method converges to a minimizer, only that it is attracted by stationary points.
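For steepest descent $\cos\theta_k = 1$, so Zoutendijk's condition reduces to $\sum_k \|\nabla f_k\|^2 < \infty$. A minimal sketch of this conclusion (illustrative only; the Rosenbrock test function, Wolfe constants, and iteration cap are assumptions, not from the lecture):

```python
# Sketch: Zoutendijk's condition for steepest descent with a Wolfe
# line search. The Rosenbrock test function and all tolerances are
# illustrative assumptions, not from the lecture.
import numpy as np
from scipy.optimize import line_search, rosen, rosen_der

x = np.array([-1.2, 1.0])             # classic Rosenbrock starting point
g = rosen_der(x)
zoutendijk_sum = 0.0
for k in range(5000):
    p = -g                            # steepest descent: cos(theta_k) = 1
    zoutendijk_sum += g @ g           # cos^2(theta_k) * ||grad_k||^2 term
    alpha = line_search(rosen, rosen_der, x, p, gfk=g, c1=1e-4, c2=0.9)[0]
    if alpha is None:                 # line search failed; stop the sketch
        break
    x = x + alpha * p
    g = rosen_der(x)
    if np.linalg.norm(g) < 1e-8:      # gradient norm driven toward zero
        break

print(f"iters={k}, ||grad||={np.linalg.norm(g):.2e}, "
      f"partial sum={zoutendijk_sum:.4f}")
```

The partial sums should level off while the gradient norm keeps shrinking, consistent with the theorem. Progress is slow, which matches the theorem's modest guarantee: the gradients vanish, but nothing is promised about speed or about reaching a minimizer rather than a stationary point.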
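Finally, the toy calculation promised in the FR/PR comparison above. All numbers below are made up for illustration: consecutive gradients nearly coincide while the previous direction $p_k$ is long and poor. PR's $\beta \approx 0$ discards $p_k$ (a restart), while FR's $\beta \approx 1$ keeps it almost unchanged.

```python
# Toy illustration of the FR vs. PR restart behaviour when
# grad f_{k+1} ~= grad f_k. All numbers are assumed for illustration.
import numpy as np

g_k  = np.array([1.0, 0.5, -0.3])                 # grad f_k
g_k1 = g_k + 1e-6 * np.array([0.2, -0.1, 0.4])    # grad f_{k+1} ~= grad f_k
p_k  = np.array([100.0, -80.0, 60.0])             # a long, poor direction

beta_fr = (g_k1 @ g_k1) / (g_k @ g_k)             # ~= 1
beta_pr = (g_k1 @ (g_k1 - g_k)) / (g_k @ g_k)     # ~= 0

p_fr = -g_k1 + beta_fr * p_k   # ~= p_k: FR barely changes direction
p_pr = -g_k1 + beta_pr * p_k   # ~= -grad f_{k+1}: PR restarts

rel_change_fr = np.linalg.norm(p_fr - p_k) / np.linalg.norm(p_k)
restart_gap   = np.linalg.norm(p_pr + g_k1)       # ~0: p_PR ~= -g_{k+1}
print(f"beta_FR = {beta_fr:.6f}  (keeps the old direction)")
print(f"beta_PR = {beta_pr:.2e}  (restart)")
print(f"relative change in FR direction: {rel_change_fr:.4f}")
print(f"||p_PR + grad f_k+1||          : {restart_gap:.2e}")
```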