DOC PREVIEW
Berkeley COMPSCI C267 - Sparse Direct Solvers on High Performance Computers

This preview shows page 1-2-3-22-23-24-45-46-47 out of 47 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 47 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Sparse Direct Solvers on High Performance ComputersReview of Gaussian Elimination (GE)Sparse GECompressed Column Storage (CCS)Numerical Stability: Need for PivotingDense versus Sparse GEAlgorithmic Issues in Sparse GENumerical PivotingStatic Pivoting via Weighted Bipartite MatchingNumerical Accuracy: GESP versus GEPPStructural Gaussian Elimination - Symmetric CaseMinimum Degree Ordering (1/2)Minimum Degree Ordering (2/2)Nested Dissection Ordering (1/2)Nested Dissection Ordering (2/2)Ordering for LU (unsymmetric)Ordering for LUStructural Gaussian Elimination - Unsymmetric CaseResults of Markowitz OrderingHigh Performance Issues: Reduce Cost of Memory Access & CommunicationBlocking in Sparse GESpeedup Over Un-blocked CodeParallel Task Scheduling for SMPs (in SuperLU_MT)Parallelism from Separator TreeParallel Symbolic Factorization [Grigori/Demmel/Li ‘06]Matrix Distribution on Large Distributed-memory Machine2D Block Cyclic Layout for Sparse L and U (in SuperLU_DIST)Scalability and Isoefficiency AnalysisScalabilityIrregular MatricesAdoptions of SuperLUSummaryOpen ProblemsModel ProblemSuperfast Factorization: Exploit Low-rank PropertyResults for the Model ProblemResearch IssuesThe EndApplication 1: Quantum MechanicsQuantum Mechanics (cont.)SuperLU_DIST as PreconditionerOne Block Timings on IBM SPApplication 2: Accelerator Cavity DesignAccelerator (cont.)Slide 45DDS47, Linear ElementsLargest Eigen Problem Solved So FarSparse Direct Solvers on High Performance ComputersSparse Direct Solvers on High Performance ComputersX. Sherry [email protected]://crd.lbl.gov/~xiaoyeCS267: Applications of Parallel ComputersApril 25, 2006CS2672Review of Gaussian Elimination (GE)Review of Gaussian Elimination (GE)Solving a system of linear equations Ax = bFirst step of GE:Repeats GE on CResults in LU factorization (A = LU)L lower triangular with unit diagonal, U upper triangularThen, x is obtained by solving two triangular systems with L and UCwIvBvwATT0/01TwvBCCS2673Sparse GESparse GESparse systems are ubiquitousExample: A of dimension 105, only 10~100 nonzeros per rowGoal: Store only nonzeros and perform operations only on nonzerosFill-in: original zero entry aij becomes nonzero in L and UNatural order: nonzeros = 233 Min. Degree order: nonzeros = 207CS2674Compressed Column Storage (CCS)Compressed Column Storage (CCS)Also known as Harwell-Boeing formatStore nonzeros columnwise contiguously3 arrays:Storage: NNZ reals, NNZ+N+1 integersEfficient for columnwise algorithmsRef: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, R. Barrett et al.7654321lkjihgfedcbanzval 1 c 2 d e 3 k a 4 h b f 5 i l 6 g j 7 rowind 1 3 2 3 4 3 7 1 4 6 2 4 5 6 7 6 5 6 7colptr 1 3 6 8 11 16 17 20CS2675Numerical Stability: Need for PivotingNumerical Stability: Need for PivotingOne step of GE:  If α is small, some entries in B may be lost from additionPivoting: swap the current diagonal entry with a larger entry from the other part of the matrixGoal: prevent from getting too largeCwIvBvwATT0/01TwvBCCCS2676Dense versus Sparse GEDense versus Sparse GEDense GE: Pr A Pc = LUPr and Pc are permutations chosen to maintain stabilityPartial pivoting suffices in most cases : PrA = LUSparse GE: Pr A Pc = LUPr and Pc are chosen to maintain stability and preserve sparsityCS2677Algorithmic Issues in Sparse GEAlgorithmic Issues in Sparse GEMinimize number of fill-ins, maximize parallelismSparsity structure of L & U depends on that of A, which can be changed by row/column permutations (vertex re-labeling of the underlying graph)Ordering (combinatorial algorithms; NP-complete to find optimum [Yannakis ’83]; use heuristics)Predict the fill-in positions in L & USymbolic factorization (combinatorial algorithms)Perform factorization and triangular solutionsNumerical algorithms (F.P. operations only on nonzeros)How and when to pivot ?Usually dominate the total runtimeCS2678Numerical PivotingNumerical PivotingGoal of pivoting is to control element growth in L & U for stabilityFor sparse factorizations, often relax the pivoting rule to trade with better sparsity and parallelism (e.g., threshold pivoting, static pivoting)Partial pivoting used in sequential SuperLU (GEPP) Can force diagonal pivoting (use diagonal threshold)Hard to implement scalably for sparse factorizationStatic pivoting used in SuperLU_DIST (GESP)Before factor, scale and permute A to maximize diagonal: Pr Dr A Dc = A’Pr is found by a weighted bipartite matching algorithm on G(A)During factor A’ = LU, replace tiny pivots by  , without changing data structures for L & UIf needed, use a few steps of iterative refinement to improve the first solution Quite stable in practiceAbs xxx x xCS2679Static Pivoting via Weighted Bipartite MatchingStatic Pivoting via Weighted Bipartite MatchingMaximize the diag. entries: sum, or product (sum of logs)Hungarian algo. or the like (MC64): O(n*(m+n)*log n)Auction algo. (more parallel): O(n*m*log(n*C))J. Riedy’s dissertation (expected Dec. 2006?)1 1AG(A)row column23 4523 45123541 x x 3 x 4 5CS26710Numerical Accuracy: GESP versus GEPPNumerical Accuracy: GESP versus GEPPCS26711Structural Gaussian Elimination - Symmetric CaseStructural Gaussian Elimination - Symmetric Case 1ijkEliminate 1ikj•Undirected graph•After a vertex is eliminated, all its neighbors become a clique•The edges of the clique are the potential fills (upper bound !)ij kij kEliminate 11ijk1ijk---------CS26712Minimum Degree Ordering (1/2)Minimum Degree Ordering (1/2)Greedy approach: do the best locallyBest for modest size problemsHard to parallelizeAt each step Eliminate the vertex with the smallest degree Update degrees of the neighborsStraightforward implementation is slow and requires too much memoryNewly


View Full Document

Berkeley COMPSCI C267 - Sparse Direct Solvers on High Performance Computers

Documents in this Course
Lecture 4

Lecture 4

52 pages

Split-C

Split-C

5 pages

Lecture 5

Lecture 5

40 pages

Load more
Download Sparse Direct Solvers on High Performance Computers
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Sparse Direct Solvers on High Performance Computers and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Sparse Direct Solvers on High Performance Computers 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?