Unformatted text preview:

Iterative methods and preconditioners Victor Eijkhout eijkhout cs utk edu ICL University of Tennessee CS 594 March 2003 Overview Theory of iterative methods Computational kernels Preconditioners Pro con iterative or direct methods Against direct Variants of Gaussian elimination Time and space demand can be huge No use of analytical properties Pro iterative Use analytical properties of systems Less time and space Possibly Pro direct Predictable in advance Solution up to round off Against iterative Unpredictable Solution not guaranteed Iterative methods Stationary methods Iterative refinement To solve Ax b Suppose M A and solve M x1 b Error e1 x1 x residual r1 Ax1 b Since x A 1 b x1 A 1 r1 define x2 x1 M 1 r1 and x3 x2 M 1 r2 et cetera Analysis of stationary methods Basic step r2 Ax2 b A x1 M 1 r1 b I AM 1 r1 Inductively rn I AM 1 n 1 r1 so convergence if all I AM 1 1 Examples of stationary iterative methods Jacobi method M DA diag A Gauss Seidel method M DA LA SOR method M DA LA These methods converge for M matrices A positive definite aij 0 for i 6 j Steepest descent Iterative process for Ax b xi 1 xi M 1 ri Slight generalization xi 1 xi i M 1 ri line search with optimal i steepest descent Derivation of Krylov space methods Stationary methods rn I AM 1 rn 1 rn I AM 1 n 1 r1 Try more general with M I rn Pn 1 A r1 or define the Krylov sequence k1 r1 kn 1 Akn and set rn span k1 kn span r1 Ar1 A2 r1 An r1 or equivalently rn span r1 rn 1 Arn Acceleration In the definition rn Pn 1 A r1 choose the polynomial coefficients Two choices for polynomial coefficients Get Pn as close to zero as possible Pn 0 1 consistency condition knowledge of spectrum required Chebyshev polynomials Make residuals mutually orthogonal under some inner product Orthogonality Making error orthogonal to generated subspace projecting error on subspace minimising error on subspace Prerequisite inner product matrix needs to be definite the further from elliptic you get the worse an iterative method will perform or change your inner product Properties of orthogonalising methods Theoretically convergence in N steps N matrix dimension In practice error decreases though not monotonically Theoretically iterands are optimal under energy norm In practice not optimal under L2 norm p Monotonic bound it A The matter of Symmetry Question does rn span r1 rn 1 imply that all rn have to be retained Answer If A is symmetric no CG 3 vectors If A is not symmetric and you want orthogonality yes OrthoDir OrthoRes Practical solutions Restart Truncate If A is not symmetric and you can relax orthogonality no BiCG 5 vectors Optimality Idea take span r1 rn 1 and construct a minimal length affine combination Theoretical optimum obtained exactly MinRes from CG GMRES from OrthoRes Optimum obtained up to small factor QMR from BiCG Long sequences iff the original method has them Speed of convergence Number of iterations bounded by p A or M 1 A p Actual number is complicated function of spectrum Preconditioner reduce condition number by constant or order Use of transpose Some methods for non symmetric matrices need the transpose matrix vector multiplication OrthoRes OrthoDir and GMRES no BiCG and QMR yes Problem for implicitly defined matrix vector product Sketch of CG FOR j zj j pj 1 2 m M 1 rj zjt rj j rj 1 j 1 pj 1 j ptj Apj rj 1 rj xj 1 j j Apj xj jj pj Observations about CG preconditioner matrix not applied as combo inner products introduce synchronization points limited storage needs no coupling of xj and rj sequences Sketch of GMRES FOR j 1 2 m w M 1 Avj FOR i 1 j hij w vi w w hij vi hj 1 j kwk2 vj 1 w kwk2 Solve Hm ym kr0 ke1 x x0 v1 vm ym if necessary repeat Computational aspects of GMRES Matrix vector product Preconditioner solve Inner products linear in iteration Vector updates Computational kernels Sparse matrix vector product do i 1 n s 0 do j ptr i ptr i 1 1 s s a j x idx j end do x i s end do 3 mem refs 2 ops low efficiency little cache reuse Sparse matrix vector product distributed Vector elements referenced in stencil fashion edge variable owned owned variable border variable Sparse matvec distributed cont d Processor subdomain references ghost border region subdomain ghost region Sparse matvec distributed ccont d Matrix vector product has local and remote references A B P0 P1 Sparse matrix vector product algorithm Overlapping communication computation Initiate send recv of boundary data Perform local matvec Complete recv of boundary data Perform matvec with boundary data and update Inner products Inner products are almost unavoidable Chebyshev method needs spectrum information Inner products are bad news in parallel synchronisation points try to bundle them Combine inner products Three term methods combine two inner products per iteration Chronopoulos Gear Saad Meurant D Azevedo Eijkhout Romine can be stable Arnoldi methods use non modified Gramm Schmid GS twice block orthogonalisation on partial Krylov series stability a problem or hide them vd Vorst Advanced iterative methods Reduction GMRES iterations restrict number of GMRES steps by combination with other method 1 further reduction of ri with e g GMRES itself GMRESR VDV and Vuik 94 2 precond of search direction FGMRES Saad 93 GMRESR more memory robust reductions add FGMRES cheaper in memory not robust GMRESR r0 b Ax0 for i 0 1 2 zm approx sol of Az ri nested GMRES c Azm for k 0 1 i 1 zm zm uk c c ck ck c ci c kck2 ui zm kck2 xi 1 xi ci ri ui ri 1 ri ci ri ci FGMRES works with ui Example of GMRESR Navier Stokes GMRES GMRES 50 GMRESR GMRES 10 as solve matvec 184 1220 198 daxpy 17000 35000 1386 ddot 17000 35000 1224 memory 184 50 46 cpu time 7 2 17 0 1 2 Choice of m in GMRESR GMRES 177 it GMRES 50 1323 it GMRESR m m it m it cpu time memory 4 45 180 1 15 94 8 23 184 0 89 54 12 16 192 0 90 44 16 12 192 1 05 40 20 10 200 1 23 40 Bi CG and variants with short recurrencies we can construct xi such that ri Ki AT s0 not optimal in Ki A r0 2 MV s per iteration one with transpose CG like computational overhead CG like memory requirements more iterations than GMRES RES kAxBiCG bk2 kAxGM bk2 i i Variants of BiCG QMR QMR Freund Nachtigal slightly better than Bi CG but not essentially more smooth convergence more iterations than GMRES product with transpose Squared methods For BiCG polynomials it is possible to compute Pn2 A r1 CGS Theoretically double convergence speed in practice irregular No longer need for transpose 2 MV per iteration Motivation to improve CGS Goal smoother faster


View Full Document

UTK CS 594 - Iterative Methods and Preconditioners

Documents in this Course
Load more
Loading Unlocking...
Login

Join to view Iterative Methods and Preconditioners and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Iterative Methods and Preconditioners and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?