Model Problem: Solving Poisson’s equation for temperature

k = n^(1/3)
• For each i from 1 to n, except on the boundaries:
      – x(i–k^2) – x(i–k) – x(i–1) + 6*x(i) – x(i+1) – x(i+k) – x(i+k^2) = 0
• n equations in n unknowns: A*x = b
• Each row of A has at most 7 nonzeros
• In two dimensions, k = n^(1/2) and each row has at most 5 nonzeros

The Landscape of Ax=b Solvers

(Originally a 2×2 diagram: direct vs. iterative methods against nonsymmetric vs. symmetric positive definite matrices.)

                         Nonsymmetric          Symmetric positive definite
Direct (A = LU):         Pivoting LU           Cholesky
Iterative (y’ = Ay):     GMRES, BiCGSTAB, …    Conjugate gradient

Toward the nonsymmetric side the methods are more general; toward the symmetric positive definite side they are more robust. Direct methods are more robust; iterative methods need less storage (if sparse).

Complexity of linear solvers

Time to solve the model problem (Poisson’s equation) on a regular mesh with n unknowns (2D: n^(1/2) × n^(1/2) grid; 3D: n^(1/3) × n^(1/3) × n^(1/3) grid):

                          2D                     3D
Sparse Cholesky:          O(n^1.5)               O(n^2)
CG, exact arithmetic:     O(n^2)                 O(n^2)
CG, no precond.:          O(n^1.5)               O(n^1.33)
CG, modified IC:          O(n^1.25)              O(n^1.17)
CG, support trees:        O(n^1.20) → O(n^1+)    O(n^1.75) → O(n^1+)
Multigrid:                O(n)                   O(n)

CS 240A: Solving Ax = b in parallel

• Dense A: Gaussian elimination with partial pivoting (LU)
      • See Jim Demmel’s slides
      • Same flavor as matrix * matrix, but more complicated
• Sparse A: Iterative methods – Conjugate gradient, etc.
      • Sparse matrix times dense vector
• Sparse A: Gaussian elimination – Cholesky, LU,
etc.
      • Graph algorithms
• Sparse A: Preconditioned iterative methods and multigrid
      • Mixture of lots of things

Conjugate gradient iteration for Ax = b

x_0 = 0          approx solution
r_0 = b          residual = b – A x_0
d_0 = r_0        search direction
for k = 1, 2, 3, . . .
    α_k = (r_{k-1}^T r_{k-1}) / (d_{k-1}^T A d_{k-1})    step length
    x_k = x_{k-1} + α_k d_{k-1}                          new approx solution
    r_k = r_{k-1} – α_k A d_{k-1}                        new residual
    β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})
    d_k = r_k + β_k d_{k-1}                              new search direction

Conjugate gradient iteration

• One matrix-vector multiplication per iteration
• Two vector dot products per iteration
• Four n-vectors of working storage

x_0 = 0, r_0 = b, d_0 = r_0
for k = 1, 2, 3, . . .
    α_k = (r_{k-1}^T r_{k-1}) / (d_{k-1}^T A d_{k-1})    step length
    x_k = x_{k-1} + α_k d_{k-1}                          approx solution
    r_k = r_{k-1} – α_k A d_{k-1}                        residual
    β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})              improvement
    d_k = r_k + β_k d_{k-1}                              search direction

Conjugate gradient: Krylov subspaces

• Eigenvalues: A v = λ v, { λ_1, λ_2, . . ., λ_n }
• Cayley–Hamilton theorem: (A – λ_1 I)·(A – λ_2 I) · · · (A – λ_n I) = 0
  Therefore Σ_{0 ≤ i ≤ n} c_i A^i = 0 for some c_i (with c_0 ≠ 0, since A is nonsingular),
  so A^{-1} = Σ_{1 ≤ i ≤ n} (–c_i / c_0) A^{i–1}
• Krylov subspace: therefore if A x = b, then x = A^{-1} b and
  x ∈ span(b, A b, A^2 b, . . ., A^{n-1} b) = K_n(A, b)

Conjugate gradient: Orthogonal sequences

• Krylov subspace: K_i(A, b) = span(b, A b, A^2 b, . . ., A^{i-1} b)
• Conjugate gradient algorithm:
  for i = 1, 2, 3, . . .
      find x_i ∈ K_i(A, b) such that r_i = (b – A x_i) ⊥ K_i(A, b)
• Notice r_i ∈ K_{i+1}(A, b), so r_i ⊥ r_j for all j < i
• Similarly, the “directions” are A-orthogonal: (x_i – x_{i-1})^T · A · (x_j – x_{j-1}) = 0
• The magic: short recurrences.
  A is symmetric ⇒ can get the next residual and direction from the previous ones, without saving them all.

Conjugate gradient: Convergence

• In exact arithmetic, CG converges in n steps (completely unrealistic!!)
• Accuracy after k steps of CG is related to:
      • consider polynomials of degree k that are equal to 1 at 0;
      • how small can such a polynomial be at all the eigenvalues of A?
• Thus, eigenvalues close together are good.
• Condition number: κ(A) = ||A||_2 ||A^{-1}||_2 = λ_max(A) / λ_min(A)
• Residual is reduced by a constant factor by O(κ^{1/2}(A)) iterations of CG.

Other Krylov subspace methods

• Nonsymmetric linear systems:
      • GMRES: for i = 1, 2, 3, . . . find x_i ∈ K_i(A, b) such that
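The CG iteration written out on the slides above can be sketched in plain Python. This is a minimal matrix-free version: `cg`, `dot`, and `poisson1d` are illustrative names not taken from the slides, and the 1D Poisson tridiagonal (2 on the diagonal, –1 off) stands in for the model problem's matrix.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(matvec, b, iters=50, tol=1e-10):
    """Conjugate gradient as on the slides: one matvec and two dot
    products per iteration, four n-vectors of working storage."""
    n = len(b)
    x = [0.0] * n        # x_0 = 0   approx solution
    r = list(b)          # r_0 = b   residual (since x_0 = 0)
    d = list(r)          # d_0 = r_0 search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr ** 0.5 < tol:
            break
        Ad = matvec(d)
        alpha = rr / dot(d, Ad)                               # step length
        x = [xi + alpha * di for xi, di in zip(x, d)]         # new approx solution
        r = [ri - alpha * Adi for ri, Adi in zip(r, Ad)]      # new residual
        rr_new = dot(r, r)
        beta = rr_new / rr                                    # improvement
        d = [ri + beta * di for ri, di in zip(r, d)]          # new search direction
        rr = rr_new
    return x

def poisson1d(v):
    """Matrix-free 1D Poisson: tridiag(-1, 2, -1), zero Dirichlet boundary."""
    n = len(v)
    return [2 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

b = [1.0] * 8
x = cg(poisson1d, b)   # exact solution here is x_i = i*(9 - i)/2, i = 1..8
```

Since n = 8, the "converges in n steps in exact arithmetic" property means the loop finishes long before the 50-iteration cap; for this matrix κ(A) grows like n^2, consistent with the O(κ^{1/2}) iteration count on the convergence slide.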
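The 7-point operator from the model-problem slide can also be sketched matrix-free. This version uses explicit (i, j, l) grid coordinates on a k×k×k grid (equivalent to the slide's single linear index with n = k^3), with zero Dirichlet values assumed outside the grid; `apply_poisson3d` is an illustrative name.

```python
def apply_poisson3d(x, k):
    """Apply A for the 3D model problem:
    (A x)(i,j,l) = 6*x(i,j,l) minus its six axis neighbors,
    so each row of A has at most 7 nonzeros."""
    def get(i, j, l):
        # Zero Dirichlet boundary outside the k x k x k grid.
        if 0 <= i < k and 0 <= j < k and 0 <= l < k:
            return x[(i * k + j) * k + l]
        return 0.0
    out = []
    for i in range(k):
        for j in range(k):
            for l in range(k):
                out.append(6 * get(i, j, l)
                           - get(i - 1, j, l) - get(i + 1, j, l)
                           - get(i, j - 1, l) - get(i, j + 1, l)
                           - get(i, j, l - 1) - get(i, j, l + 1))
    return out
```

Applying the operator to a unit vector recovers one column of A, which makes the "at most 7 nonzeros per row" claim easy to check directly; the same trick confirms A is symmetric, which is what CG requires.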