DOC PREVIEW
Berkeley COMPSCI C267 - Lecture Notes

This preview shows page 1-2-3-4-29-30-31-32-59-60-61-62 out of 62 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 62 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS 267 Dense Linear Algebra: Parallel Gaussian EliminationOutlineSlide 3Success Stories for Sca/LAPACKMotivation (1)Motivation (2)Slide 7Gaussian Elimination (GE) for solving Ax=bRefine GE Algorithm (1)Refine GE Algorithm (2)Refine GE Algorithm (3)Refine GE Algorithm (4)Refine GE Algorithm (5)What GE really computesProblems with basic GE algorithmPivoting in Gaussian EliminationGaussian Elimination with Partial Pivoting (GEPP)Slide 18Converting BLAS2 to BLAS3 in GEPPBlocked GEPP (www.netlib.org/lapack/single/sgetrf.f)Efficiency of Blocked GEPPOverview of LAPACK and ScaLAPACKParallelizing Gaussian EliminationDifferent Data Layouts for Parallel GEReview: BLAS 3 (Blocked) GEPPSlide 26Slide 27Slide 28Slide 29Slide 30Slide 31ScaLAPACK Performance Models (1)ScaLAPACK Performance Models (2)LAPACK and ScaLAPACK ScalabilityNext release of LAPACK and ScaLAPACKExtra SlidesRecursive AlgorithmsGaussian Elimination via a Recursive AlgorithmRecursive FactorizationsSlide 40Recursive Algorithms – LimitsSlide 42Some contributors (incomplete list)Upcoming related talksSlide 45The “Holy Grail” (Parlett, Dhillon, Marques) Perfect Output complexity (O(n * #vectors)), Embarrassingly parallel, AccurateSlide 48Scalable Nonsymmetric EigensolverAssignment of parallel work in GEComputational Electromagnetics (MOM)Slide 52BLAS2 version of GE with Partial Pivoting (GEPP)Slide 54Slide 55Slide 56Slide 57Computational Chemistry – Ax = l xSlide 59Parallelism in ScaLAPACKWinner of TOPS 500 (LINPACK Benchmark)Performance of LAPACK (n=1000)Performance of LAPACK (n=100)02/23/2007 CS267 DLA2 1CS 267 Dense Linear Algebra:Parallel Gaussian EliminationJames Demmelwww.cs.berkeley.edu/~demmel/cs267_Spr0702/23/2007 CS267 DLA2 2Outline•Motivation, overview for Dense Linear Algebra•Review Gaussian Elimination (GE) for solving Ax=b•Optimizing GE for caches on sequential machines-using matrix-matrix multiplication (BLAS)•LAPACK library overview and performance•Data layouts on parallel machines•Parallel Gaussian Elimination•ScaLAPACK library overview•Eigenvalue problems•Current Research02/23/2007 CS267 DLA2 3Sca/LAPACK Overview02/23/2007 CS267 DLA2 4Success Stories for Sca/LAPACKCosmic Microwave Background Analysis, BOOMERanG collaboration, MADCAP code (Apr. 27, 2000).ScaLAPACK•Widely used-Adopted by Mathworks, Cray, Fujitsu, HP, IBM, IMSL, NAG, NEC, SGI, …->56M web hits @ Netlib (incl. CLAPACK, LAPACK95)•New Science discovered through the solution of dense matrix systems-Nature article on the flat universe used ScaLAPACK-Other articles in Physics Review B that also use it-1998 Gordon Bell Prize-www.nersc.gov/news/reports/newNERSCresults050703.pdf02/23/2007 CS267 DLA2 5Motivation (1)3 Basic Linear Algebra Problems1. Linear Equations: Solve Ax=b for x2. Least Squares: Find x that minimizes ||r||2   ri2 where r=Ax-b•Statistics: Fitting data with simple functions3a. Eigenvalues: Findand x where Ax =  x•Vibration analysis, e.g., earthquakes, circuits3b. Singular Value Decomposition: ATAx=2x•Data fitting, Information retrievalLots of variations depending on structure of A•A symmetric, positive definite, banded, …02/23/2007 CS267 DLA2 6Motivation (2) •Why dense A, as opposed to sparse A?-Many large matrices are sparse, but …-Dense algorithms easier to understand -Some applications yields large dense matrices-LINPACK Benchmark (www.top500.org)•“How fast is your computer?” = “How fast can you solve dense Ax=b?”-Large sparse matrix algorithms often yield smaller (but still large) dense problems02/23/2007 CS267 DLA2 7Current Records for Solving Dense Systems (2/18/07) GigaflopsMachine n=100 n=1000 Any n Peak IBM BlueGene/L 281K 367K (131K procs) (281 Teraflops) (n=1.8M)NEC SX 8 (8 proc, 2 GHz) 75.1 128 (1 proc, 2 GHz) 2.2 15.0 16Intel Pentium Woodcrest 3.0 6.5 12 (1 core, 3 GHz)…www.netlib.org, click on Performance Database ServerPalm Pilot III .00000169 (1.69 Kiloflops)02/23/2007 CS267 DLA2 8Gaussian Elimination (GE) for solving Ax=b•Add multiples of each row to later rows to make A upper triangular•Solve resulting triangular system Ux = c by substitution… for each column i… zero it out below the diagonal by adding multiples of row i to later rowsfor i = 1 to n-1 … for each row j below row i for j = i+1 to n … add a multiple of row i to row j tmp = A(j,i); for k = i to n A(j,k) = A(j,k) - (tmp/A(i,i)) * A(i,k)0...00...00.00000...00...00.00...00...00...0After i=1 After i=2 After i=3 After i=n-1…02/23/2007 CS267 DLA2 9Refine GE Algorithm (1)•Initial Version•Remove computation of constant tmp/A(i,i) from inner loop. … for each column i… zero it out below the diagonal by adding multiples of row i to later rowsfor i = 1 to n-1 … for each row j below row i for j = i+1 to n … add a multiple of row i to row j tmp = A(j,i); for k = i to n A(j,k) = A(j,k) - (tmp/A(i,i)) * A(i,k)for i = 1 to n-1 for j = i+1 to n m = A(j,i)/A(i,i) for k = i to n A(j,k) = A(j,k) - m * A(i,k)m02/23/2007 CS267 DLA2 10Refine GE Algorithm (2)•Last version•Don’t compute what we already know: zeros below diagonal in column ifor i = 1 to n-1 for j = i+1 to n m = A(j,i)/A(i,i) for k = i+1 to n A(j,k) = A(j,k) - m * A(i,k)for i = 1 to n-1 for j = i+1 to n m = A(j,i)/A(i,i) for k = i to n A(j,k) = A(j,k) - m * A(i,k)Do not compute zerosm02/23/2007 CS267 DLA2 11Refine GE Algorithm (3)•Last version•Store multipliers m below diagonal in zeroed entries for later usefor i = 1 to n-1 for j = i+1 to n m = A(j,i)/A(i,i) for k = i+1 to n A(j,k) = A(j,k) - m * A(i,k)for i = 1 to n-1 for j = i+1 to n A(j,i) = A(j,i)/A(i,i) for k = i+1 to n A(j,k) = A(j,k) - A(j,i) * A(i,k)Store m herem02/23/2007 CS267 DLA2 12Refine GE Algorithm (4)•Last versionfor i = 1 to n-1 for j = i+1 to n A(j,i) = A(j,i)/A(i,i) for k = i+1 to n A(j,k) = A(j,k) - A(j,i) * A(i,k)•Split Loopfor i = 1 to n-1 for j = i+1 to n


View Full Document

Berkeley COMPSCI C267 - Lecture Notes

Documents in this Course
Lecture 4

Lecture 4

52 pages

Split-C

Split-C

5 pages

Lecture 5

Lecture 5

40 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?