DOC PREVIEW
Berkeley COMPSCI C267 - Dense Linear Algebra: Parallel Gaussian Elimination

This preview shows page 1-2-3-4-29-30-31-32-59-60-61-62 out of 62 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS 267 Dense Linear Algebra: Parallel Gaussian EliminationOutlinePowerPoint PresentationSuccess Stories for Sca/LAPACKMotivation (1)Motivation (2)Slide 7Gaussian Elimination (GE) for solving Ax=bRefine GE Algorithm (1)Refine GE Algorithm (2)Refine GE Algorithm (3)Refine GE Algorithm (4)Refine GE Algorithm (5)What GE really computesProblems with basic GE algorithmPivoting in Gaussian EliminationGaussian Elimination with Partial Pivoting (GEPP)Slide 18Converting BLAS2 to BLAS3 in GEPPBlocked GEPP (www.netlib.org/lapack/single/sgetrf.f)Efficiency of Blocked GEPPOverview of LAPACK and ScaLAPACKPerformance of LAPACK (n=1000)Performance of LAPACK (n=100)Parallelizing Gaussian EliminationDifferent Data Layouts for Parallel GEReview: BLAS 3 (Blocked) GEPPSlide 28Slide 29Slide 30Slide 31Slide 32Slide 33ScaLAPACK Performance Models (1)ScaLAPACK Performance Models (2)LAPACK and ScaLAPACK ScalabilityNext release of LAPACK and ScaLAPACKRecursive AlgorithmsGaussian Elimination via a Recursive AlgorithmRecursive FactorizationsSlide 41Recursive Algorithms – LimitsSlide 43Some contributors (incomplete list)Upcoming related talksExtra SlidesSlide 47The “Holy Grail” (Parlett, Dhillon, Marques) Perfect Output complexity (O(n * #vectors)), Embarrassingly parallel, AccurateSlide 50Scalable Nonsymmetric EigensolverAssignment of parallel work in GEComputational Electromagnetics (MOM)Slide 54BLAS2 version of GE with Partial Pivoting (GEPP)Slide 56Slide 57Slide 58Slide 59Computational Chemistry – Ax = l xSlide 61Parallelism in ScaLAPACKWinner of TOPS 500 (LINPACK Benchmark)02/14/2006 CS267 Lecture 9 1CS 267 Dense Linear Algebra:Parallel Gaussian EliminationJames Demmelwww.cs.berkeley.edu/~demmel/cs267_Spr0602/14/2006 CS267 Lecture 9 2Outline•Motivation, overview for Dense Linear Algebra•Review Gaussian Elimination (GE) for solving Ax=b•Optimizing GE for caches on sequential machines-using matrix-matrix multiplication (BLAS)•LAPACK library overview and performance•Data layouts on parallel machines•Parallel Gaussian Elimination•ScaLAPACK library overview•Eigenvalue problems•Current Research02/14/2006 CS267 Lecture 9 3Sca/LAPACK Overview02/14/2006 CS267 Lecture 9 4Success Stories for Sca/LAPACKCosmic Microwave Background Analysis, BOOMERanG collaboration, MADCAP code (Apr. 27, 2000).ScaLAPACK•Widely used-Adopted by Mathworks, Cray, Fujitsu, HP, IBM, IMSL, NAG, NEC, SGI, …->56M web hits @ Netlib (incl. CLAPACK, LAPACK95)•New Science discovered through the solution of dense matrix systems-Nature article on the flat universe used ScaLAPACK-Other articles in Physics Review B that also use it-1998 Gordon Bell Prize-www.nersc.gov/news/reports/newNERSCresults050703.pdf02/14/2006 CS267 Lecture 9 5Motivation (1)3 Basic Linear Algebra Problems1. Linear Equations: Solve Ax=b for x2. Least Squares: Find x that minimizes ||r||2   ri2 where r=Ax-b•Statistics: Fitting data with simple functions3a. Eigenvalues: Findand x where Ax =  x•Vibration analysis, e.g., earthquakes, circuits3b. Singular Value Decomposition: ATAx=2x•Data fitting, Information retrievalLots of variations depending on structure of A•A symmetric, positive definite, banded, …02/14/2006 CS267 Lecture 9 6Motivation (2) •Why dense A, as opposed to sparse A?-Many large matrices are sparse, but …-Dense algorithms easier to understand -Some applications yields large dense matrices-LINPACK Benchmark (www.top500.org)•“How fast is your computer?” = “How fast can you solve dense Ax=b?”-Large sparse matrix algorithms often yield smaller (but still large) dense problems02/14/2006 CS267 Lecture 9 7Current Records for Solving Dense Systems (2006) GigaflopsMachine n=100 n=1000 Any n Peak IBM BlueGene/L 281K 367K (131K procs) (281 Teraflops) (n=1.8M)NEC SX 8 (8 proc, 2 GHz) 75.1 128 (1 proc, 2 GHz) 2.2 15.0 16…www.netlib.org, click on Performance Database ServerPalm Pilot III .00000169 (1.69 Kiloflops)02/14/2006 CS267 Lecture 9 8Gaussian Elimination (GE) for solving Ax=b•Add multiples of each row to later rows to make A upper triangular•Solve resulting triangular system Ux = c by substitution… for each column i… zero it out below the diagonal by adding multiples of row i to later rowsfor i = 1 to n-1 … for each row j below row i for j = i+1 to n … add a multiple of row i to row j tmp = A(j,i); for k = i to n A(j,k) = A(j,k) - (tmp/A(i,i)) * A(i,k)0...00...00.00000...00...00.00...00...00...0After i=1 After i=2 After i=3 After i=n-1…02/14/2006 CS267 Lecture 9 9Refine GE Algorithm (1)•Initial Version•Remove computation of constant tmp/A(i,i) from inner loop. … for each column i… zero it out below the diagonal by adding multiples of row i to later rowsfor i = 1 to n-1 … for each row j below row i for j = i+1 to n … add a multiple of row i to row j tmp = A(j,i); for k = i to n A(j,k) = A(j,k) - (tmp/A(i,i)) * A(i,k)for i = 1 to n-1 for j = i+1 to n m = A(j,i)/A(i,i) for k = i to n A(j,k) = A(j,k) - m * A(i,k)m02/14/2006 CS267 Lecture 9 10Refine GE Algorithm (2)•Last version•Don’t compute what we already know: zeros below diagonal in column ifor i = 1 to n-1 for j = i+1 to n m = A(j,i)/A(i,i) for k = i+1 to n A(j,k) = A(j,k) - m * A(i,k)for i = 1 to n-1 for j = i+1 to n m = A(j,i)/A(i,i) for k = i to n A(j,k) = A(j,k) - m * A(i,k)Do not compute zerosm02/14/2006 CS267 Lecture 9 11Refine GE Algorithm (3)•Last version•Store multipliers m below diagonal in zeroed entries for later usefor i = 1 to n-1 for j = i+1 to n m = A(j,i)/A(i,i) for k = i+1 to n A(j,k) = A(j,k) - m * A(i,k)for i = 1 to n-1 for j = i+1 to n A(j,i) = A(j,i)/A(i,i) for k = i+1 to n A(j,k) = A(j,k) - A(j,i) * A(i,k)Store m herem02/14/2006 CS267 Lecture 9 12Refine GE Algorithm (4)•Last versionfor i = 1 to n-1 for j = i+1 to n A(j,i) = A(j,i)/A(i,i) for k = i+1 to n A(j,k) = A(j,k) - A(j,i) * A(i,k)•Split Loopfor i = 1 to n-1 for j = i+1 to n A(j,i) = A(j,i)/A(i,i) for j =


View Full Document

Berkeley COMPSCI C267 - Dense Linear Algebra: Parallel Gaussian Elimination

Documents in this Course
Lecture 4

Lecture 4

52 pages

Split-C

Split-C

5 pages

Lecture 5

Lecture 5

40 pages

Load more
Download Dense Linear Algebra: Parallel Gaussian Elimination
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Dense Linear Algebra: Parallel Gaussian Elimination and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Dense Linear Algebra: Parallel Gaussian Elimination 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?