
# State-of-the-art numerical solution of large Hermitian eigenvalue problems

Andreas Stathopoulos
Computer Science Department and Computational Sciences Cluster
College of William and Mary

Acknowledgment: National Science Foundation

## The problem

Find `numEvals` eigenvalues $\tilde{\lambda}_i$ and corresponding eigenvectors $\tilde{x}_i$:

$$A\tilde{x}_i = \tilde{\lambda}_i \tilde{x}_i, \qquad i = 1 : \text{numEvals}$$

$A$ is large, sparse, symmetric: $N = O(10^6 - 10^8)$.

Applications: materials, structural, data mining, SVD, QCD, ...

QCD:

- Accelerate linear systems with multiple right-hand sides
- Low-rank approximation of matrices

Only possible through iterative methods.

## Power method: the fundamental iterative method

Given initial guess $v_0$, the iteration

    for i = 1, 2, ...
        t = A v_{i-1}
        v_i = t / ||t||

converges to the largest-modulus eigenpair $(\tilde{\lambda}_N, \tilde{x}_N)$, i.e.,

$$\frac{A^i v_0}{\|A^i v_0\|} \longrightarrow \tilde{x}_N, \qquad \text{with rate } \frac{\tilde{\lambda}_{N-1}}{\tilde{\lambda}_N}$$

+ Requires only matrix-vector multiplications
− Only for the largest eigenpair
− Slow!

## Krylov methods: the prevailing technique

The Krylov space is the span of all power iterates:

$$K_{m,v} = \operatorname{span}\{v, Av, A^2 v, \ldots, A^{m-1} v\} = \{p(A)v : p \text{ a polynomial of degree} < m\}$$

Compute $V$, an orthonormal basis for $K_{m,v}$ (for numerical stability), and compute approximations through Rayleigh-Ritz: $x_i = V y_i$, where $V^T A V y_i = \lambda_i y_i$.

Arnoldi: the above process for non-symmetric matrices.
Lanczos: a special case of Arnoldi for symmetric matrices.

+ Approximates extreme eigenpairs
+ Converges trivially in $N$ steps
+ Optimal approximations over all polynomials
− Convergence rate depends on the relative separation of eigenvalues
− Slow for clustered eigenvalues and large sizes
− $O(Nm^2)$ orthogonalization cost, $O(mN)$ storage

## Krylov ideal for linear systems

$$Ax = b$$

Conjugate Gradient (CG) uses a 3-term recurrence to build $K_{m,v}$ and update the approximate solution.

- $O(Nm)$ cost and $O(3N)$ storage
- Minimizes $\|\text{error}\|_A$ at every step
- Preconditioning with $M^{-1} \approx A^{-1}$ is also easy: $M^{-1}Ax = M^{-1}b$ (PCG)

Note: the action of $M^{-1}$ could be an iterative method itself!

## Lanczos problems

- The Lanczos 3-term recurrence still requires $O(Nm)$ storage (the basis must be kept to form eigenvectors)
- Unlike CG, orthogonality is important in Lanczos

⇓

- Restarting to limit the basis size destroys optimality
- Preconditioning is not obvious ($M^{-1}Ax = \lambda M^{-1}x$ is not an eigenproblem)

Goal: use PCG to derive nearly optimal eigenmethods with smaller bases.

## Generalized Davidson: Eigenvalue Preconditioning

Let $r = Ax - \lambda x$ be the residual of an approximate eigenpair $(\lambda, x)$.

Arnoldi/Lanczos expand the basis $V$ by $r$. Generalized Davidson expands by the preconditioned $r$ (a runnable sketch follows the RQI slides below):

    repeat
        V = [V, M^{-1} r]              append preconditioned residual
        V* A V y = λ y,  x = V y       Rayleigh-Ritz
        x = x / ||x||                  normalize
        r = A x - λ x                  new residual
    until ||r|| < ε

No 3-term recurrence and a more expensive step, but much faster convergence.

## Inverse iteration (inverse power method)

Given initial guess $v_0$, the iteration

    for i = 1, 2, ...
        t = (A - σI)^{-1} v_{i-1}
        v_i = t / ||t||

converges to the eigenpair closest to $\sigma$.

+ The closer $\sigma$ is to $\tilde{\lambda}_k$, the faster the outer convergence rate $\dfrac{\tilde{\lambda}_k - \sigma}{\tilde{\lambda}_{k-1} - \sigma}$
− A direct factorization of $A$ may be prohibitive
− An iterative method for the linear system may take long to converge

## Rayleigh Quotient Iteration

Given initial guess $v_0$:

    for i = 1, 2, ...
        t = (A - σI)^{-1} v_{i-1}
        v_i = t / ||t||
        σ = v_i^T A v_i

All the characteristics of inverse iteration, but also:

+ Converges to the eigenpair cubically!
− If $v_0$ is not close to the required eigenvector, it may misconverge

The eigenproblem is a constrained minimization of the Rayleigh quotient $\lambda = \dfrac{x^T A x}{x^T x}$; RQI is equivalent to Newton on the unit-sphere manifold.
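To make the Generalized Davidson loop above concrete, here is a minimal NumPy sketch for the smallest eigenpair. The function name `generalized_davidson`, the diagonally dominant test matrix, and the crude Jacobi-style preconditioner are illustrative assumptions, not from the slides; the basis is also left to grow without the restarting the slides discuss later.

```python
import numpy as np

def generalized_davidson(A, M_inv, v0, tol=1e-8, max_iter=200):
    """Generalized Davidson sketch for the smallest eigenpair of a symmetric A.

    A      -- (N, N) symmetric ndarray
    M_inv  -- callable applying the preconditioner M^{-1} to a vector
    v0     -- initial guess vector
    """
    V = (v0 / np.linalg.norm(v0))[:, None]        # orthonormal basis, one column
    for _ in range(max_iter):
        # Rayleigh-Ritz on the current basis: (V^T A V) y = theta y
        theta, Y = np.linalg.eigh(V.T @ A @ V)
        lam, x = theta[0], V @ Y[:, 0]            # smallest Ritz pair
        x /= np.linalg.norm(x)                    # normalize
        r = A @ x - lam * x                       # new residual
        if np.linalg.norm(r) < tol:
            break
        t = M_inv(r)                              # precondition the residual ...
        t -= V @ (V.T @ t)                        # ... orthogonalize against V
        t -= V @ (V.T @ t)                        # (repeated once for stability)
        V = np.hstack([V, (t / np.linalg.norm(t))[:, None]])  # append to basis
    return lam, x

# Toy usage: diagonally dominant test matrix, crude diagonal preconditioner.
rng = np.random.default_rng(0)
N = 500
A = np.diag(np.arange(1.0, N + 1)) + 1e-3 * rng.standard_normal((N, N))
A = (A + A.T) / 2
d = np.diag(A)
lam, x = generalized_davidson(A, lambda r: r / np.maximum(d - d.min(), 1.0),
                              rng.standard_normal(N))
```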
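Likewise, a direct NumPy transcription of the RQI loop above; the dense `np.linalg.solve` stands in for a factorization of $A - \sigma I$, which the slides note may be prohibitive at large $N$.

```python
import numpy as np

def rqi(A, v0, tol=1e-10, max_iter=50):
    """Rayleigh Quotient Iteration for a symmetric A (cubic local convergence)."""
    I = np.eye(A.shape[0])
    v = v0 / np.linalg.norm(v0)
    sigma = v @ A @ v                       # Rayleigh quotient of the guess
    for _ in range(max_iter):
        # (A - sigma I) becomes nearly singular as sigma approaches an
        # eigenvalue; the huge solution t is what amplifies the wanted direction.
        t = np.linalg.solve(A - sigma * I, v)
        v = t / np.linalg.norm(t)
        sigma = v @ A @ v                   # update the shift
        if np.linalg.norm(A @ v - sigma * v) < tol:
            break
    return sigma, v
```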
## Inexact RQI, Newton, and Jacobi-Davidson

$(A - \sigma I)t = v_{i-1}$ must be solved quite accurately for RQI to converge. However, inexact (truncated) Newton does not require high accuracy.

Newton: $x_{i+1} = x_i - \operatorname{Hess}(x_i)^{-1}\nabla(x_i)$ computes a correction
RQI: $x_{i+1} = (A - \sigma I)^{-1} x_i$ updates the approximation

⇓

Inexact RQI is not exactly inexact Newton!

Note that $\nabla(x) = -r = -(Ax - \lambda x)$, the residual of $(\lambda, x)$. Thus the correction $\delta$ to $x$ satisfies:

Jacobi-Davidson: $(I - xx^T)(A - \eta I)(I - xx^T)\delta = r$ computes a correction (see the sketch at the end of this section)

JD ⇔ inexact (truncated) Newton.

## Equivalence in the general case

Let $M \approx A - \sigma I$ be a preconditioner. Both GD and JD approximately solve the correction equation:

- Generalized Davidson: $\delta = M^{-1} r$
- Jacobi-Davidson: $\delta = M^{-1}|_{x^\perp}\, r$ (the preconditioner restricted to the orthogonal complement of $x$)

GD and JD are not single-vector iterations: they build a space!

GD, JD ⟺ subspace-accelerated inexact Newton.

## A different view: Quasi-Newton approaches

Mild non-linearity:

- Nonlinear CG is competitive [Bradbury & Fletcher '66, and others]
- Better: the locally optimal LOBPCG [D'yakonov '83; Knyazev '91, '01] (see the usage sketch at the end of this section):
  $x_{i+1} = \text{RayleighRitz}\{x_{i-1},\ x_i,\ M^{-1} r_i\}$
- Subspace acceleration and recurrence restarting in GD [Murray et al. '92]
- GD(k,m)+1: restart with $[x_{i-1}, x^{(i)}_1, \ldots, x^{(i)}_k]$ [AS '98, '99]

Direct analogy to limited-memory quasi-Newton methods:

GD+1 accelerates LOBPCG ⟷ Broyden accelerates Nonlinear CG

## So what is optimal? Work unit: Matrix-Vector

[Figure: total matrix-vector products vs. inner iterations per outer step, with curves for QMRopt, Lanczos, Newton, and RQI; a second panel adds Truncated Newton / Jacobi-Davidson and Inexact RQI between them.]

Optimal: unrestarted Lanczos or QMRopt, i.e., QMR solving $(A - \tilde{\lambda} I)x = 0$.
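To make the Jacobi-Davidson correction equation from the earlier slide concrete, here is a sketch of one inner solve on the projected operator. MINRES, the iteration count, and the helper name `jd_correction` are assumptions for illustration (the projected system is symmetric but typically indefinite), and the right-hand-side sign follows the slides' convention $\nabla(x) = -r$.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

def jd_correction(A, x, eta, r, inner_iters=20):
    """A few MINRES steps on (I - xx^T)(A - eta I)(I - xx^T) delta = r,
    keeping the correction delta orthogonal to the (unit-norm) Ritz vector x."""
    P = lambda w: w - x * (x @ w)                 # orthogonal projector I - xx^T
    def matvec(w):
        pw = P(w)
        return P(A @ pw - eta * pw)               # projected, shifted operator
    op = LinearOperator(A.shape, matvec=matvec)
    delta, _ = minres(op, P(r), maxiter=inner_iters)
    return P(delta)                               # keep delta in the x-perp space
```

In a full JD solver this inner solve replaces the simple $\delta = M^{-1}r$ expansion of Generalized Davidson; the slides' point is that both are inexact Newton steps on the sphere.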
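The locally optimal recurrence $x_{i+1} = \text{RayleighRitz}\{x_{i-1}, x_i, M^{-1}r_i\}$ from the quasi-Newton slide is available off the shelf as `scipy.sparse.linalg.lobpcg`. A small usage sketch follows; the toy diagonal operator and the exact-inverse preconditioner are purely illustrative.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import lobpcg

N = 10_000
A = diags([np.arange(1.0, N + 1)], [0]).tocsr()        # toy SPD test operator
M = diags([1.0 / np.arange(1.0, N + 1)], [0]).tocsr()  # preconditioner (exactly A^{-1} here)

rng = np.random.default_rng(0)
X = rng.standard_normal((N, 4))                        # block of 4 starting vectors
vals, vecs = lobpcg(A, X, M=M, largest=False, tol=1e-8, maxiter=200)
print(vals)                                            # ~ the 4 smallest eigenvalues
```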
