CS 240A: Solving Ax = b in parallelCS 240A: Solving Ax = b in parallelSlide 3MotivationReview of Gaussian Elimination (GE) for solving Ax=bRefine GE Algorithm (1)Refine GE Algorithm (2)Refine GE Algorithm (3)Refine GE Algorithm (4)Refine GE Algorithm (5)What GE really computesProblems with basic GE algorithmPivoting in Gaussian EliminationGaussian Elimination with Partial Pivoting (GEPP)Converting BLAS2 to BLAS3 in GEPPBlocked GEPP (www.netlib.org/lapack/single/sgetrf.f)Overview of LAPACKParallelizing Gaussian EliminationDifferent Data Layouts for Parallel GE (on 4 procs)Review: BLAS 3 (Blocked) GEPPSlide 21Slide 22Slide 23Slide 24Slide 25CS267 Dense Linear Algebra I.1Demmel Fa 2001CS 240A: Solving Ax = b in parallel°Dense A: Gaussian elimination with partial pivoting•Same flavor as matrix * matrix, but more complicated°Sparse A: Iterative methods – Conjugate gradient etc.•Sparse matrix times dense vector°Sparse A: Gaussian elimination – Cholesky, LU, etc.•Graph algorithms°Sparse A: Preconditioned iterative methods and multigrid•Mixture of lots of thingsCS267 Dense Linear Algebra I.2Demmel Fa 2001CS 240A: Solving Ax = b in parallel°Dense A: Gaussian elimination with partial pivoting•Same flavor as matrix * matrix, but more complicated°Sparse A: Iterative methods – Conjugate gradient etc.•Sparse matrix times dense vector°Sparse A: Gaussian elimination – Cholesky, LU, etc.•Graph algorithms°Sparse A: Preconditioned iterative methods and multigrid•Mixture of lots of thingsCS267 Dense Linear Algebra I.3Demmel Fa 2001Dense Linear Algebra (Excerpts) James Demmelhttp://www.cs.berkeley.edu/~demmel/cs267_221001.pptCS267 Dense Linear Algebra I.4Demmel Fa 2001Motivation °3 Basic Linear Algebra Problems•Linear Equations: Solve Ax=b for x•Least Squares: Find x that minimizes S ri2 where r=Ax-b•Eigenvalues: Find l and x where Ax = l x•Lots of variations depending on structure of A (eg symmetry)°Why dense A, as opposed to sparse A?•Aren’t “most” large matrices sparse?•Dense algorithms easier to understand •Some applications yields large dense matrices-Ax=b: Computational Electromagnetics-Ax = lx: Quantum Chemistry•Benchmarking-“How fast is your computer?” = “How fast can you solve dense Ax=b?”•Large sparse matrix algorithms often yield smaller (but still large) dense problemsCS267 Dense Linear Algebra I.5Demmel Fa 2001Review of Gaussian Elimination (GE) for solving Ax=b°Add multiples of each row to later rows to make A upper triangular°Solve resulting triangular system Ux = c by substitution… for each column i… zero it out below the diagonal by adding multiples of row i to later rowsfor i = 1 to n-1 … for each row j below row i for j = i+1 to n … add a multiple of row i to row j for k = i to n A(j,k) = A(j,k) - (A(j,i)/A(i,i)) * A(i,k)CS267 Dense Linear Algebra I.6Demmel Fa 2001Refine GE Algorithm (1)°Initial Version°Remove computation of constant A(j,i)/A(i,i) from inner loop… for each column i… zero it out below the diagonal by adding multiples of row i to later rowsfor i = 1 to n-1 … for each row j below row i for j = i+1 to n … add a multiple of row i to row j for k = i to n A(j,k) = A(j,k) - (A(j,i)/A(i,i)) * A(i,k)for i = 1 to n-1 for j = i+1 to n m = A(j,i)/A(i,i) for k = i to n A(j,k) = A(j,k) - m * A(i,k)CS267 Dense Linear Algebra I.7Demmel Fa 2001Refine GE Algorithm (2)°Last version°Don’t compute what we already know: zeros below diagonal in column ifor i = 1 to n-1 for j = i+1 to n m = A(j,i)/A(i,i) for k = i+1 to n A(j,k) = A(j,k) - m * A(i,k)for i = 1 to n-1 for j = i+1 to n m = A(j,i)/A(i,i) for k = i to n A(j,k) = A(j,k) - m * A(i,k)CS267 Dense Linear Algebra I.8Demmel Fa 2001Refine GE Algorithm (3)°Last version°Store multipliers m below diagonal in zeroed entries for later usefor i = 1 to n-1 for j = i+1 to n m = A(j,i)/A(i,i) for k = i+1 to n A(j,k) = A(j,k) - m * A(i,k)for i = 1 to n-1 for j = i+1 to n A(j,i) = A(j,i)/A(i,i) for k = i+1 to n A(j,k) = A(j,k) - A(j,i) * A(i,k)CS267 Dense Linear Algebra I.9Demmel Fa 2001Refine GE Algorithm (4)°Last versionfor i = 1 to n-1 for j = i+1 to n A(j,i) = A(j,i)/A(i,i) for k = i+1 to n A(j,k) = A(j,k) - A(j,i) * A(i,k)o Split Loopfor i = 1 to n-1 for j = i+1 to n A(j,i) = A(j,i)/A(i,i) for j = i+1 to n for k = i+1 to n A(j,k) = A(j,k) - A(j,i) * A(i,k)CS267 Dense Linear Algebra I.10Demmel Fa 2001Refine GE Algorithm (5)°Last version°Express using matrix operations (BLAS)for i = 1 to n-1 A(i+1:n,i) = A(i+1:n,i) * ( 1 / A(i,i) ) A(i+1:n,i+1:n) = A(i+1:n , i+1:n ) - A(i+1:n , i) * A(i , i+1:n)for i = 1 to n-1 for j = i+1 to n A(j,i) = A(j,i)/A(i,i) for j = i+1 to n for k = i+1 to n A(j,k) = A(j,k) - A(j,i) * A(i,k)CS267 Dense Linear Algebra I.11Demmel Fa 2001What GE really computes°Call the strictly lower triangular matrix of multipliers M, and let L = I+M°Call the upper triangle of the final matrix U°Lemma (LU Factorization): If the above algorithm terminates (does not divide by zero) then A = L*U°Solving A*x=b using GE•Factorize A = L*U using GE (cost = 2/3 n3 flops)•Solve L*y = b for y, using substitution (cost = n2 flops)•Solve U*x = y for x, using substitution (cost = n2 flops)°Thus A*x = (L*U)*x = L*(U*x) = L*y = b as desiredfor i = 1 to n-1 A(i+1:n,i) = A(i+1:n,i) / A(i,i) A(i+1:n,i+1:n) = A(i+1:n , i+1:n ) - A(i+1:n , i) * A(i , i+1:n)CS267 Dense Linear Algebra I.12Demmel Fa 2001Problems with basic GE algorithm°What if some A(i,i) is zero? Or very small?•Result may not exist, or be “unstable”, so need to pivot°Current computation all BLAS 1 or BLAS 2, but we know that BLAS 3 (matrix multiply) is fastest (earlier lectures…)for i = 1 to n-1 A(i+1:n,i) = A(i+1:n,i) / A(i,i) … BLAS 1 (scale a vector) A(i+1:n,i+1:n) = A(i+1:n , i+1:n ) … BLAS 2 (rank-1 update) - A(i+1:n , i) * A(i , i+1:n)PeakBLAS 3BLAS 2BLAS 1CS267 Dense Linear Algebra I.13Demmel Fa 2001Pivoting in Gaussian Elimination° A = [ 0 1 ] fails completely, even though A is “easy” [ 1 0 ]° Illustrate problems in 3-decimal digit arithmetic:
View Full Document