Iterative Methods in Linear Algebra (part 1)
Stan Tomov
Innovative Computing Laboratory, Computer Science Department
The University of Tennessee
Wednesday, April 8, 2009 (CS 594, 04-08-2009)

Outline
Part I:   Discussion (motivation for iterative methods)
Part II:  Stationary Iterative Methods
Part III: Nonstationary Iterative Methods

Part I: Discussion

Visualizing the matrices in MATLAB:

Original matrix:
>> load matrix.output
>> S = spconvert(matrix);
>> spy(S)

Reordered matrix:
>> load A.data
>> S = spconvert(A);
>> spy(S)

Original sub-matrix:
>> load matrix.output
>> S = spconvert(matrix);
>> spy(S(1:8660,1:8660))

Reordered sub-matrix:
>> load A.data
>> S = spconvert(A);
>> spy(S(8602:8700,8602:8700))

Note: the columns (within each row of the sparse storage) have to be sorted.

Results on torc0 (Pentium III, 933 MHz; 16 KB L1 and 256 KB L2 cache):
  Original matrix:   42.4568 Mflop/s
  Reordered matrix:  45.3954 Mflop/s
  BCRS:              72.17374 Mflop/s

Results on woodstock (Intel Xeon 5160 Woodcrest, 3.0 GHz; 32 KB L1, 4 MB L2):
  Original matrix:   386.8 Mflop/s
  Reordered matrix:  387.6 Mflop/s
  BCRS:              894.2 Mflop/s

Homework #9
Part I:  PETSc (many iterative solvers); we will see how projection is used in iterative methods.
Part II: Hybrid CPU-GPU computations.

Iterative Methods

Motivation
So far we have discussed and have seen that:
- many engineering and physics simulations lead to sparse matrices, e.g.
  PDE-based simulations and their discretizations based on FEM/FVM, finite differences, Petrov-Galerkin type conditions, etc.;
- how to optimize performance of sparse matrix computations;
- some software for solving sparse matrix problems (e.g. PETSc).

The question is: can we solve sparse matrix problems faster than with direct sparse methods? In certain cases, yes: by using iterative methods.

Sparse direct methods
- These are based on Gaussian elimination (LU), performed in sparse format.
- Are there limitations of the approach? Yes: they produce fill-ins, which lead to higher memory requirements and more flops being performed. Fill-ins can become prohibitively high.

Consider LU factorization of a sparse matrix, with each nonzero represented by a '*': the 1st step of the LU factorization already introduces fill-ins (marked by an 'F').

Fill-in can be improved by reordering. (Remember: we talked about reordering in a slightly different context, for speed and parallelization.) Consider the following reordering: these were extreme cases, but still, the problem exists.

Sparse direct vs dense methods
- Dense methods take O(n^2) storage and O(n^3) flops, but run close to peak performance.
- Sparse direct methods can take O(#nonz) storage and O(#nonz) flops, but both can grow due to fill-ins, and performance is bad.
- With #nonz << n^2 and a 'proper' ordering, it pays off to use sparse direct methods.

Software (table from Julien Langou):
http://www.netlib.org/utk/people/JackDongarra/la-sw.html

Package   Support   Real Complex   f77  c  c++   Seq Dist   SPD Gen
DSCPACK   yes        X                  X         X   M      X
HSL       yes        X    X        X              X          X   X
MFACT     yes        X             X              X          X
MUMPS     yes        X    X        X   X          X   M      X   X
PSPASES   yes        X                  X         X   M      X
SPARSE    ?          X    X        X   X          X              X
SPOOLES   ?          X    X            X          X   M          X
SuperLU   yes        X    X        X   X          X   M          X
TAUCS     yes        X    X            X          X          X   X
UMFPACK   yes        X    X            X          X              X
Y12M      ?          X    X        X              X              X

Sparse methods: What about Iterative Methods?
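The fill-in and reordering effect described above can be made concrete with a small experiment. The sketch below is pure Python and not from the lecture (the lecture's own snippets use MATLAB); it runs LU without pivoting on an "arrowhead" matrix, first with the dense row and column leading (every elimination step fills in the rest), then on the same matrix reordered so they come last (no fill-in at all). The matrix values (diagonal 4, arrow entries 1) are made up for illustration.

```python
def lu_nnz(A, tol=1e-12):
    """In-place LU without pivoting (Doolittle); return nnz of L+U combined."""
    M = [row[:] for row in A]
    n = len(M)
    for k in range(n):
        for i in range(k + 1, n):
            M[i][k] /= M[k][k]                  # multiplier, stored as L entry
            for j in range(k + 1, n):
                M[i][j] -= M[i][k] * M[k][j]    # Schur update: may create fill-in
    return sum(1 for row in M for v in row if abs(v) > tol)

n = 6
# Arrowhead with dense FIRST row/column: step 1 couples all remaining unknowns.
bad  = [[4.0 if i == j else (1.0 if 0 in (i, j) else 0.0) for j in range(n)]
        for i in range(n)]
# Same matrix with the dense row/column reordered LAST: no fill-in is created.
good = [[4.0 if i == j else (1.0 if n - 1 in (i, j) else 0.0) for j in range(n)]
        for i in range(n)]

print(lu_nnz(bad), lu_nnz(good))   # -> 36 16  (fully dense vs. original 3n-2 nonzeros)
```

Both orderings start from the same 16 nonzeros; the bad one ends with all 36 entries nonzero. Finding such permutations automatically (e.g. by minimum degree or nested dissection) is exactly what the sparse direct packages in the table above do.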
Think, for example, of the iteration

    x_{i+1} = x_i + P (b - A x_i)

Pluses:
- storage is only O(#nonz) (for the matrix, plus a few working vectors);
- it can take only a few iterations to converge (e.g. << n); for P = A^{-1} it takes 1 iteration (check!);
- in general, it is easy to parallelize.

Minuses:
- performance can be bad (as we saw in Lecture 11 and in today's discussion);
- convergence can be slow, or the iteration can even stagnate.

But convergence can be improved with preconditioning: think of P as a preconditioner, an operator/matrix P ~ A^{-1}. Optimal preconditioners (e.g. multigrid can be optimal) lead to convergence in O(1) iterations.

Part II: Stationary Iterative Methods

Stationary iterative methods can be expressed in the form

    x_{i+1} = B x_i + c

where B and c do not depend on the iteration number i. They are older, simpler, and easy to implement, but usually not as effective as nonstationary methods. Examples: Richardson, Jacobi, Gauss-Seidel, SOR, etc. (next).

Richardson Method

    x_{i+1} = x_i + (b - A x_i) = (I - A) x_i + b        (1)

i.e. B from the definition above is B = I - A. Denote e_i = x - x_i and rewrite (1):

    x - x_{i+1} = x - x_i - (A x - A x_i)
    e_{i+1} = e_i - A e_i = (I - A) e_i
    ||e_{i+1}|| <= ||(I - A) e_i|| <= ||I - A|| ||e_i|| <= ||I - A||^2 ||e_{i-1}||
               <= ... <= ||I - A||^{i+1} ||e_0||

i.e. for convergence (e_i -> 0) we need ||I - A|| < 1 for some norm || . ||, e.g. when the spectral radius satisfies rho(I - A) < 1.

Jacobi Method

    x_{i+1} = x_i + D^{-1} (b - A x_i) = (I - D^{-1} A) x_i + D^{-1} b        (2)

where D is the diagonal of A (assumed nonzero; B = I - D^{-1} A from the definition). Denote e_i = x - x_i and rewrite (2):

    x - x_{i+1} = x - x_i - D^{-1} (A x - A x_i)
    e_{i+1} = e_i - D^{-1} A e_i = (I - D^{-1} A) e_i
    ||e_{i+1}|| <= ||(I - D^{-1} A) e_i|| <= ||I - D^{-1} A|| ||e_i||
               <= ||I - D^{-1} A||^2 ||e_{i-1}|| <= ... <= ||I - D^{-1} A||^{i+1} ||e_0||

i.e. for convergence (e_i -> 0) we need ||I - D^{-1} A|| < 1 for some norm || . ||, e.g.
when rho(I - D^{-1} A) < 1.

Gauss-Seidel Method

    x_{i+1} = (D - L)^{-1} (U x_i + b)        (3)

where -L is the strictly lower and -U the strictly upper triangular part of A, and D is its diagonal. Equivalently, componentwise:

    x_1^{(i+1)} = (b_1 - a_{12} x_2^{(i)}   - a_{13} x_3^{(i)}   - a_{14} x_4^{(i)} - ... - a_{1n} x_n^{(i)}) / a_{11}
    x_2^{(i+1)} = (b_2 - a_{21} x_1^{(i+1)} - a_{23} x_3^{(i)}   - a_{24} x_4^{(i)} - ... - a_{2n} x_n^{(i)}) / a_{22}
    x_3^{(i+1)} = (b_3 - a_{31} x_1^{(i+1)} - a_{32} x_2^{(i+1)} - a_{34} x_4^{(i)} - ... - a_{3n} x_n^{(i)}) / a_{33}
    ...
    x_n^{(i+1)} = (b_n - a_{n1} x_1^{(i+1)} - a_{n2} x_2^{(i+1)} - ... - a_{n,n-1} x_{n-1}^{(i+1)}) / a_{nn}

A comparison (from Julien Langou's slides)

Gauss-Seidel method applied to

    A = [ 10  1  2  0  1 ]        b = [ 23 ]
        [  0 12  1  3  1 ]            [ 44 ]
        [  1  2  9  1  0 ]            [ 36 ]
        [  0  3  1 10  0 ]            [ 49 ]
        [  1  2  0  0 15 ]            [ 80 ]

(The slides plot the relative residual ||b - A x^{(i)}||_2 / ||b||_2 as a function of i, the number of iterations.)

    x^{(0)}   x^{(1)}   x^{(2)}   x^{(3)}   x^{(4)}   ...
    0.0000    2.3000    0.8783    0.9790    0.9978    1.0000
    0.0000    3.6667    2.1548    2.0107    2.0005    2.0000
    0.0000    2.9296    3.0339    3.0055    3.0006    3.0000
    0.0000    3.5070    3.9502    3.9962    3.9998    4.0000
    0.0000    4.6911    4.9875    5.0000    5.0001    5.0000

Jacobi method, componentwise (every term on the right-hand side uses the previous iterate x^{(i)}, unlike Gauss-Seidel):

    x_1^{(i+1)} = (b_1 - a_{12} x_2^{(i)} - a_{13} x_3^{(i)} - ... - a_{1n} x_n^{(i)}) / a_{11}
    x_2^{(i+1)} = (b_2 - a_{21} x_1^{(i)} - a_{23} x_3^{(i)} - ... - a_{2n} x_n^{(i)}) / a_{22}
    ...
    x_n^{(i+1)} = (b_n - a_{n1} x_1^{(i)} - a_{n2} x_2^{(i)} - ... - a_{n,n-1} x_{n-1}^{(i)}) / a_{nn}
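The two componentwise formulas can be run directly on the 5x5 example above. Here is a minimal pure-Python sketch (the lecture's own snippets use MATLAB; nothing beyond the formulas above is assumed):

```python
# The 5x5 example from Julien Langou's slides; exact solution is (1,2,3,4,5).
A = [[10.0,  1.0, 2.0,  0.0,  1.0],
     [ 0.0, 12.0, 1.0,  3.0,  1.0],
     [ 1.0,  2.0, 9.0,  1.0,  0.0],
     [ 0.0,  3.0, 1.0, 10.0,  0.0],
     [ 1.0,  2.0, 0.0,  0.0, 15.0]]
b = [23.0, 44.0, 36.0, 49.0, 80.0]

def jacobi_step(A, b, x):
    """One Jacobi sweep: every component uses only the previous iterate x."""
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def gauss_seidel_step(A, b, x):
    """One Gauss-Seidel sweep: components 1..i-1 use freshly updated values."""
    n = len(b)
    x = x[:]
    for i in range(n):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x

x_j = [0.0] * 5
x_gs = [0.0] * 5
for _ in range(20):
    x_j = jacobi_step(A, b, x_j)
    x_gs = gauss_seidel_step(A, b, x_gs)

print([round(v, 4) for v in x_gs])   # -> [1.0, 2.0, 3.0, 4.0, 5.0]
```

The first Gauss-Seidel sweep from x^{(0)} = 0 reproduces the x^{(1)} column of the table above (2.3000, 3.6667, 2.9296, 3.5070, 4.6911). This A is strictly diagonally dominant, so both iterations converge; Gauss-Seidel does so noticeably faster.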
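Finally, a small check of the earlier claim that the iteration x_{i+1} = x_i + P (b - A x_i) with P = A^{-1} converges in a single step. The sketch below is pure Python; the 2x2 system and the damping factor 0.2 are made up for illustration (0.2 is chosen so that rho(I - 0.2 A) < 1 for this particular A).

```python
A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def richardson_step(A, b, x, P):
    """One step of x_{i+1} = x_i + P (b - A x_i)."""
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]   # residual b - A x
    return [xi + di for xi, di in zip(x, matvec(P, r))]

# P = A^{-1} (2x2 inverse via the cofactor formula): one step solves A x = b.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
P_exact = [[ A[1][1] / det, -A[0][1] / det],
           [-A[1][0] / det,  A[0][0] / det]]
x = richardson_step(A, b, [0.0, 0.0], P_exact)   # x = [1/11, 7/11], so A x = b

# P = 0.2 * I: damped Richardson; needs many steps but reaches the same answer.
P_damped = [[0.2, 0.0],
            [0.0, 0.2]]
y = [0.0, 0.0]
for _ in range(100):
    y = richardson_step(A, b, y, P_damped)
# y now agrees with x to many digits
```

This is the whole point of preconditioning: the closer P is to A^{-1}, the fewer iterations are needed, at the price of a more expensive application of P.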