Lecture 9
Conjugate Direction Algorithm
(Solution of a Linear System, or Minimization of a Quadratic Function)

Conjugate Gradient
• Linear conjugate gradient: for solving linear systems Ax = b with a symmetric positive definite (PD) matrix A.
  – Exact solution in n steps (Hestenes & Stiefel, 1950s)
  – Approximate solution in fewer than n steps
• Non-linear conjugate gradient: for solving large-scale non-linear optimization problems.
  – Fletcher and Reeves, 1964
  – Polak and Ribière, 1969

Conjugate Gradient
Solve Ax = b, where A is symmetric PD,   (1)
or equivalently minimize the quadratic function
  φ(x) = (1/2) xᵀ A x − bᵀ x.   (2)
Its gradient is the residual:
  ∇φ(x) = Ax − b = r(x).

Conjugacy
The set S = {p_0, p_1, ..., p_{n−1}} is conjugate with respect to A if
  p_iᵀ A p_j = 0 for all i ≠ j.

Linear independence
S = {p_0, p_1, ..., p_{n−1}} is linearly independent if
  σ_0 p_0 + σ_1 p_1 + ... + σ_{n−1} p_{n−1} = 0 implies σ_0 = σ_1 = ... = σ_{n−1} = 0.
A conjugate set is also linearly independent; therefore A has at most n conjugate directions.

Conjugate Direction Method
Given conjugate directions p_0, ..., p_{n−1}, iterate
  x_{k+1} = x_k + α_k p_k,
where the line search picks α_k as the exact 1-D minimizer of the quadratic along p_k:
  α_k = −(∇φ_kᵀ p_k) / (p_kᵀ A p_k).

Recall from Lecture 5 (convergence rate of steepest descent): with ∇f(x) = Qx − b and the update x_{k+1} = x_k − α_k ∇f_k, setting d/dα f(x_k − α ∇f_k) = 0 yields the exact step length
  α_k = (∇f_kᵀ ∇f_k) / (∇f_kᵀ Q ∇f_k),
so that
  x_{k+1} = x_k − [(∇f_kᵀ ∇f_k) / (∇f_kᵀ Q ∇f_k)] ∇f_k.

Conjugate Direction Method
Substituting ∇φ(x) = Ax − b = r(x) into the step length gives
  α_k = −((A x_k − b)ᵀ p_k) / (p_kᵀ A p_k) = −(r_kᵀ p_k) / (p_kᵀ A p_k).

Theorem 5.1
For any x_0, the sequence {x_k} generated by the conjugate direction algorithm converges to the solution x* of the linear system in at most n steps.

Proof
From x_{k+1} = x_k + α_k p_k,
  x_k = x_0 + α_0 p_0 + α_1 p_1 + ... + α_{k−1} p_{k−1}.
Since S = {p_0, ..., p_{n−1}} is linearly independent, we can write
  x* − x_0 = σ_0 p_0 + σ_1 p_1 + ... + σ_{n−1} p_{n−1}
for some scalars σ_i. Multiplying by p_kᵀ A and using conjugacy (all cross terms vanish),
  p_kᵀ A (x* − x_0) = σ_k p_kᵀ A p_k,   (A)
so
  σ_k = p_kᵀ A (x* − x_0) / (p_kᵀ A p_k).   (B)

Proof (continued)
From x_k − x_0 = α_0 p_0 + ... + α_{k−1} p_{k−1} and conjugacy,
  p_kᵀ A (x_k − x_0) = 0,
hence
  p_kᵀ A (x* − x_0) = p_kᵀ A (x* − x_k) = p_kᵀ (b − A x_k) = −p_kᵀ r_k.
Substituting into (B),
  σ_k = −(p_kᵀ r_k) / (p_kᵀ A p_k) = α_k.
Therefore σ_k = α_k for every k, so x_n = x_0 + Σ_k α_k p_k = x*.  QED
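Theorem 5.1 can be checked numerically. The sketch below is not from the lecture: the 3×3 matrix A and vector b are made-up examples, and the helper names (`matvec`, `dot`, `conjugate_directions`, `conjugate_direction_method`) are hypothetical. It builds an A-conjugate set by Gram-Schmidt-style conjugation of the standard basis and runs the conjugate direction iteration; the residual is driven to (numerically) zero after exactly n = 3 steps.

```python
# Sketch: Theorem 5.1 in action, using plain Python lists (no libraries).

def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_directions(A):
    """Gram-Schmidt-style conjugation of the standard basis:
    the resulting p_i satisfy p_i^T A p_j = 0 for i != j."""
    n = len(A)
    ps = []
    for i in range(n):
        e = [1.0 if j == i else 0.0 for j in range(n)]
        for p in ps:
            Ap = matvec(A, p)
            coef = dot(e, Ap) / dot(p, Ap)   # remove the A-component along p
            e = [ej - coef * pj for ej, pj in zip(e, p)]
        ps.append(e)
    return ps

def conjugate_direction_method(A, b, x0):
    """x_{k+1} = x_k + alpha_k p_k with alpha_k = -r_k^T p_k / (p_k^T A p_k)."""
    x = list(x0)
    for p in conjugate_directions(A):
        r = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # r(x) = Ax - b
        Ap = matvec(A, p)
        alpha = -dot(r, p) / dot(p, Ap)
        x = [xj + alpha * pj for xj, pj in zip(x, p)]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # symmetric PD (example)
b = [1.0, 2.0, 3.0]
x = conjugate_direction_method(A, b, [0.0, 0.0, 0.0])
residual = max(abs(ax - bi) for ax, bi in zip(matvec(A, x), b))
print(residual)  # essentially zero after n = 3 steps
```

The conjugation step mirrors the Gram-Schmidt idea mentioned later in the lecture, but with the A-inner product p_iᵀ A p_j in place of the ordinary dot product.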
Interpretation of Theorem 5.1
If A is a diagonal matrix, we can minimize the quadratic by 1-D minimizations along the coordinate axes, one axis per iteration, in n iterations.

Interpretation of Theorem 5.1
If A is not a diagonal matrix, minimizing along the coordinate axes does not, in general, reach the minimizer in n iterations.

Transformed Problem
Let x̂ = S⁻¹ x, where S = [p_0, p_1, ..., p_{n−1}].
Then φ(x) = (1/2) xᵀ A x − bᵀ x becomes
  φ̂(x̂) = φ(S x̂) = (1/2) x̂ᵀ (Sᵀ A S) x̂ − (Sᵀ b)ᵀ x̂.
By conjugacy, Sᵀ A S is a diagonal matrix, so the transformed problem has the form
  φ̂(x̂) = (1/2) x̂ᵀ D x̂ − cᵀ x̂,
and we can minimize it along the coordinate directions of the transformed space. Each coordinate direction in the transformed space corresponds to a conjugate direction in the original space, due to x̂ = S⁻¹ x. Therefore, we conclude again that the conjugate direction algorithm converges in n steps.
When the Hessian is diagonal, each coordinate minimization correctly determines one of the components of the solution x*. Hence after k 1-D minimizations, the quadratic has been minimized on the subspace spanned by e_1, e_2, ..., e_k.
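The diagonalization claim above can be illustrated directly. The sketch below uses a made-up 2×2 symmetric PD matrix (not from the lecture) and a hand-picked conjugate pair p_0, p_1; the product Sᵀ A S comes out diagonal, so the transformed quadratic separates into independent 1-D problems.

```python
# Sketch: for A-conjugate columns of S, S^T A S is diagonal.

A = [[2.0, 1.0], [1.0, 2.0]]   # symmetric PD (made-up example)
p0 = [1.0, 0.0]
p1 = [1.0, -2.0]               # chosen so that p0^T A p1 = 2*1 + 1*(-2) = 0

def quad_form(u, M, v):
    """Compute u^T M v."""
    return sum(u[i] * M[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

# Entries of S^T A S where S = [p0 p1]:
StAS = [[quad_form(pi, A, pj) for pj in (p0, p1)] for pi in (p0, p1)]
print(StAS)  # off-diagonal entries vanish: [[2.0, 0.0], [0.0, 6.0]]
```

With Sᵀ A S = diag(2, 6), minimizing along ê_1 and then ê_2 in the transformed space solves the problem exactly, which is the n-step convergence argument in coordinates.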
Theorem 5.2
Let x_0 be any starting point and suppose that the sequence {x_k} is generated by the conjugate direction algorithm. Then
  r_kᵀ p_i = 0 for i = 0, ..., k−1,
and x_k is the minimizer of φ(x) = (1/2) xᵀ A x − bᵀ x over the set
  {x | x = x_0 + span{p_0, ..., p_{k−1}}}.   (3)

Proof
First we show that a point x̃ minimizes φ over the set (3) if and only if r(x̃)ᵀ p_i = 0 for i = 0, ..., k−1.
Let
  h(σ) = φ(x_0 + σ_0 p_0 + ... + σ_{k−1} p_{k−1}), where σ = (σ_0, ..., σ_{k−1}).
Since h(σ) is a strictly convex quadratic, it has a unique minimizer σ* satisfying
  ∂h(σ*)/∂σ_i = 0, i = 0, ..., k−1.
By the chain rule, this means
  ∇φ(x_0 + σ*_0 p_0 + ... + σ*_{k−1} p_{k−1})ᵀ p_i = 0, i = 0, ..., k−1,
i.e., r(x̃)ᵀ p_i = 0, since ∇φ(x) = Ax − b = r(x), where r(x) is the residual.

Proof (continued)
From x_{k+1} = x_k + α_k p_k and r(x) = Ax − b we obtain the recursion
  r_{k+1} = r_k + α_k A p_k.   (A)
We now show by induction that r_kᵀ p_i = 0 for i = 0, ..., k−1.
Base case (k = 1): from (A), r_1 = r_0 + α_0 A p_0, so
  p_0ᵀ r_1 = p_0ᵀ r_0 + α_0 p_0ᵀ A p_0 = 0,
because α_0 = −(r_0ᵀ p_0) / (p_0ᵀ A p_0).
Induction step: suppose r_{k−1}ᵀ p_i = 0 for i = 0, ..., k−2. From (A),
  r_k = r_{k−1} + α_{k−1} A p_{k−1}.
For i = k−1, the definition of α_{k−1} gives
  p_{k−1}ᵀ r_k = p_{k−1}ᵀ r_{k−1} + α_{k−1} p_{k−1}ᵀ A p_{k−1} = 0,
and for i = 0, ..., k−2, the induction hypothesis and conjugacy give
  p_iᵀ r_k = p_iᵀ r_{k−1} + α_{k−1} p_iᵀ A p_{k−1} = 0.
Therefore r_kᵀ p_i = 0 for i = 0, ..., k−1.  QED

How do we select conjugate directions?
• The eigenvectors of A are mutually orthogonal and conjugate with respect to A.
• A Gram-Schmidt-like process can be used to produce conjugate directions instead of orthogonal vectors.

Basic Properties of the CG Method
Each direction is chosen to be a linear combination of the steepest descent direction −∇φ_k = −r_k and the previous direction:
  p_k = −r_k + β_k p_{k−1}.
Imposing conjugacy, p_kᵀ A p_{k−1} = 0, gives
  −r_kᵀ A p_{k−1} + β_k p_{k−1}ᵀ A p_{k−1} = 0,
so
  β_k = (r_kᵀ A p_{k−1}) / (p_{k−1}ᵀ A p_{k−1}).

Algorithm 5.1 (Preliminary CG)
Given x_0;
Set r_0 ← A x_0 − b, p_0 ← −r_0 (the steepest descent direction), k ← 0;
while r_k ≠ 0
  α_k ← −(r_kᵀ p_k) / (p_kᵀ A p_k);
  x_{k+1} ← x_k + α_k p_k;
  r_{k+1} ← A x_{k+1} − b;
  β_{k+1} ← (r_{k+1}ᵀ A p_k) / (p_kᵀ A p_k);
  p_{k+1} ← −r_{k+1} + β_{k+1} p_k;
  k ← k + 1;
end while
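Algorithm 5.1 translates almost line for line into code. Below is a minimal plain-Python sketch; the function name `cg_preliminary` and the 3×3 test system are made up for illustration, not from the lecture. It reproduces the update formulas above and, consistent with Theorem 5.1, solves the system within n iterations.

```python
# Sketch of Algorithm 5.1 (preliminary conjugate gradient).

def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cg_preliminary(A, b, x0, tol=1e-12):
    """Algorithm 5.1: r_0 = A x_0 - b, p_0 = -r_0, then repeat
    alpha_k  = -r_k^T p_k / (p_k^T A p_k)
    x_{k+1}  = x_k + alpha_k p_k
    r_{k+1}  = A x_{k+1} - b
    beta_{k+1} = r_{k+1}^T A p_k / (p_k^T A p_k)
    p_{k+1}  = -r_{k+1} + beta_{k+1} p_k."""
    x = list(x0)
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    p = [-ri for ri in r]                       # p_0: steepest descent direction
    for _ in range(len(b)):                     # at most n steps (Theorem 5.1)
        if max(abs(ri) for ri in r) <= tol:     # numerical stand-in for r_k != 0
            break
        Ap = matvec(A, p)
        alpha = -dot(r, p) / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        beta = dot(r, Ap) / dot(p, Ap)
        p = [-ri + beta * pi for ri, pi in zip(r, p)]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # symmetric PD (example)
b = [1.0, 2.0, 3.0]
x = cg_preliminary(A, b, [0.0, 0.0, 0.0])
residual = max(abs(ax - bi) for ax, bi in zip(matvec(A, x), b))
print(residual)  # essentially zero within n = 3 iterations
```

Note the exact `r_k ≠ 0` test of the slide is replaced by a small tolerance, since floating-point residuals rarely reach zero exactly.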