DOC PREVIEW
Berkeley COMPSCI C267 - Sparse Direct Methods on High Performance Computers

This preview shows page 1-2-3-4-26-27-28-53-54-55-56 out of 56 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Sparse Direct Methods on High Performance ComputersSparse linear solversAvailable sparse codesReview of Gaussian Elimination (GE)Sparse GEEnvelope (Profile) solverRCM orderingIs Profile Solver Good Enough?A General Data Structure: Compressed Column Storage (CCS)General Sparse SolverNumerical Stability: Need for PivotingDense versus Sparse GEAlgorithmic Issues in Sparse GEOrderingOrdering : Minimum Degree (1/3)Minimum Degree Ordering (2/3)Minimum Degree Ordering (3/3)Nested Dissection Ordering (1/3)ND Ordering (2/3)ND Ordering (3/3)Ordering for LU (unsymmetric)High Performance Issues: Reduce Cost of Memory Access & CommunicationSpeedup Over Un-blocked CodeSource of parallelism: Elimination TreeSource of parallelism: Separator TreeSource of parallelism: global partition and distributionMajor stages of sparse LUSuperLU_MT [Li/Demmel/Gilbert]SuperLU_DIST [Li/Demmel/Grigori]Multicore platformsSingle-core, single threaded BLASBenchmark matricesClovertownVictoriaFalls – multicore + multithreadLarger matricesStrong scaling: IBM Power5 (1.9 GHz)Weak scalingAnalysis of scalability and isoefficiencySummaryOpen problemsExtra SlidesStatic Pivoting via Weighted Bipartite MatchingNumerical Accuracy: GESP versus GEPPParallel Symbolic Factorization [Grigori/Demmel/Li ‘06]Application 1: Quantum MechanicsQuantum Mechanics (cont.)SuperLU_DIST as PreconditionerOne Block Timings on IBM SPApplication 2: Accelerator Cavity DesignAccelerator (cont.)Slide 51DDS47, Linear ElementsLargest Eigen Problem Solved So FarSuperfast Factorization: Exploit Low-rank PropertyResults for the Model ProblemResearch IssuesSparse Direct Methods on High Performance ComputersSparse Direct Methods on High Performance ComputersX. Sherry [email protected]://crd.lbl.gov/~xiaoyeCS267/ENgC233: Applications of Parallel ComputingApril 6, 2009CS267Sparse linear solversSparse linear solvers2Solving a system of linear equations Ax = bIterative methodsA is not changed (read-only)Key kernel: sparse matrix-vector multiplyEasier to optimize and parallelizeLow algorithmic complexity, but may not converge for hard problemsDirect methodsA is modified (factorized)Harder to optimize and parallelizeNumerically robust, but higher algorithmic complexityOften use direct method to precondition iterative methodCS267Available sparse codesAvailable sparse codesSurvey of different types of factorization codeshttp://crd.lbl.gov/~xiaoye/SuperLU/SparseDirectSurvey.pdfLLT (s.p.d.) LDLT (symmetric indefinite) LU (nonsymmetric)QR (least squares)Sequential, shared-memory (multicore), distributed-memory, out-of-coreDistributed-memory codes: usually MPI-basedSuperLU_DIST [Li/Demmel/Grigori]accessible from PETSc, TrilinosMUMPS, PasTiX, WSMP, . . .3CS2674Review of Gaussian Elimination (GE)Review of Gaussian Elimination (GE)First step of GE:Repeats GE on CResults in LU factorization (A = LU)L lower triangular with unit diagonal, U upper triangularThen, x is obtained by solving two triangular systems with L and UCwIvBvwATT0/01TwvBCCS2675Sparse GESparse GESparse matrices are ubiquitousExample: A of dimension 105, only 10~100 nonzeros per rowNonzero costs flops and memoryScalar algorithm: 3 nested loopsCan re-arrange loops to get different variants: left-looking, right-looking, . . .1234675LLUUfor i = 1 to n column_scale ( A(:,i) ) for k = i+1 to n s.t. A(i,k) != 0 for j = i+1 to n s.t. A(j,i) != 0 A(j,k) = A(j,k) - A(j,i) * A(i,k)Typical fill-ratio: 10x for 2D problems, 30-50x for 3D problemsCS267Envelope (Profile) solverEnvelope (Profile) solverDefine bandwidth for each row or columnA little more sophisticated than band solverUse Skyline storage (SKS)Lower triangle stored row by rowUpper triangle stored column by columnIn each row (column), first nonzerodefines a profileAll entries within the profile (some may be zeros) are storedAll fill-ins are confined in the profileA good ordering would be based on bandwidth reductionE.g., (reverse) Cuthill-McKee6CS267RCM orderingRCM ordering7Breadth-first search, numbering by levels, then reverseCS267Example: 3 orderings (natural, RCM, MD)Envelop size = sum of bandwidthsAfter LU, envelop would be entirely filledIs Profile Solver Good Enough?Is Profile Solver Good Enough?Env = 31775 Env = 22320Env = 61066NNZ(L, MD) = 122598CS2679A General Data Structure: Compressed Column A General Data Structure: Compressed Column Storage (CCS)Storage (CCS)Also known as Harwell-Boeing formatStore nonzeros columnwise contiguously3 arrays:Storage: NNZ reals, NNZ+N+1 integersEfficient for columnwise algorithms“Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods”, R. Barrett et al.7654321lkjihgfedcbanzval 1 c 2 d e 3 k a 4 h b f 5 i l 6 g j 7 rowind 1 3 2 3 4 3 7 1 4 6 2 4 5 6 7 6 5 6 7colptr 1 3 6 8 11 16 17 20CS267General Sparse SolverGeneral Sparse SolverUse (blocked) CRS or CCS, and any ordering methodLeave room for fill-ins ! (symbolic factorization)Exploit “supernodal” (dense) structures in the factorsCan use Level 3 BLASReduce inefficient indirect addressing (scatter/gather)Reduce graph traversal time using a coarser graph10CS26711Numerical Stability: Need for PivotingNumerical Stability: Need for PivotingOne step of GE:  If α is small, some entries in B may be lost from additionPivoting: swap the current diagonal entry with a larger entry from the other part of the matrixGoal: prevent from getting too largeCwIvBvwATT0/01TwvBCCCS26712Dense versus Sparse GEDense versus Sparse GEDense GE: Pr A Pc = LUPr and Pc are permutations chosen to maintain stabilityPartial pivoting suffices in most cases : Pr A = LUSparse GE: Pr A Pc = LUPr and Pc are chosen to maintain stability and preserve sparsityDynamic pivoting causes dynamic structural change•Alternatives: threshold pivoting, static pivoting, . . .bs xxx x xxCS26713Algorithmic Issues in Sparse GEAlgorithmic Issues in Sparse GEMinimize number of fill-ins, maximize parallelismSparsity structure of L & U depends on that


View Full Document

Berkeley COMPSCI C267 - Sparse Direct Methods on High Performance Computers

Documents in this Course
Lecture 4

Lecture 4

52 pages

Split-C

Split-C

5 pages

Lecture 5

Lecture 5

40 pages

Load more
Download Sparse Direct Methods on High Performance Computers
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Sparse Direct Methods on High Performance Computers and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Sparse Direct Methods on High Performance Computers 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?