Sparse Direct Solvers on High Performance Computers

Outline:
• Review of Gaussian Elimination (GE)
• Sparse GE
• Numerical Stability: Need for Pivoting
• Dense versus Sparse GE
• Algorithmic Issues in Sparse GE
• Numerical Pivoting
• Static Pivoting via Weighted Bipartite Matching
• Numerical Accuracy: GESP versus GEPP
• Structural Gaussian Elimination - Symmetric Case
• Minimum Degree Ordering
• Nested Dissection Ordering
• Ordering Based on Graph Partitioning
• Ordering for LU (unsymmetric)
• Structural Gaussian Elimination - Unsymmetric Case
• Results of Markowitz Ordering
• Techniques to Reduce Memory Access & Communication Cost
• Blocking in Sparse GE
• Speedup Over Un-blocked Code
• Parallel Task Scheduling for SMPs (in SuperLU_MT)
• Parallelism from Separator Tree
• Matrix Distribution on Large Distributed-memory Machine
• 2D Block Cyclic Layout for Sparse L and U (in SuperLU_DIST)
• Scalability and Isoefficiency Analysis
• Irregular Matrices
• Adoptions of SuperLU
• Summary
• Application 1: Quantum Mechanics
• SuperLU_DIST as Preconditioner
• One Block Timings on IBM SP
• Application 2: Accelerator Cavity Design
• DDS47, Linear Elements
• Largest Eigen Problem Solved So Far
Sherry Li, [email protected], http://crd.lbl.gov/~xiaoye
CS267 (Applications of Parallel Computers), Lecture 12, March 2, 2005

Review of Gaussian Elimination (GE)
• Solving a system of linear equations Ax = b.
• First step of GE:
  A = \begin{pmatrix} \alpha & w^T \\ v & B \end{pmatrix}
    = \begin{pmatrix} 1 & 0 \\ v/\alpha & I \end{pmatrix}
      \begin{pmatrix} \alpha & w^T \\ 0 & C \end{pmatrix},
  \qquad C = B - \frac{1}{\alpha} v w^T
• GE is then repeated on the Schur complement C.
• The result is the LU factorization A = LU, with L lower triangular with unit diagonal and U upper triangular.
• x is then obtained by solving two triangular systems with L and U.

Sparse GE
• Sparse systems are ubiquitous in science and engineering.
  – Example: A of dimension 10^5 with only 10–100 nonzeros per row.
• Goal: store only the nonzeros and perform operations only on nonzeros.
• Fill-in: an originally zero entry a_ij becomes nonzero in L or U.
• For the example matrix: natural order gives 233 nonzeros; minimum degree order gives 207.

Numerical Stability: Need for Pivoting
• One step of GE (as above) forms C = B - (1/α) v w^T.
• If α is small, some entries of B may be lost in this update.
• Pivoting: swap the current diagonal entry with a larger entry from the rest of the matrix.
• Goal: control element growth in L and U.

Dense versus Sparse GE
• Dense GE: Pr A Pc = LU, where the permutations Pr and Pc are chosen to maintain stability. Partial pivoting suffices in most cases: Pr A = LU.
• Sparse GE: Pr A Pc = LU, where Pr and Pc are chosen to maintain stability and preserve sparsity.

Algorithmic Issues in Sparse GE
• Minimize the number of fill-ins, maximize parallelism.
  – The sparsity structures of L and U depend on that of A, which can be changed by row/column permutations (vertex relabeling of the underlying graph).
  – Ordering (combinatorial algorithms; NP-complete to find the optimum [Yannakakis '83]; heuristics are used).
• Predict the fill-in positions in L and U.
  – Symbolic factorization (combinatorial algorithms).
• Design efficient data structures for storing and quickly retrieving the nonzeros.
  – Compressed storage schemes.
• Perform the factorization and triangular solutions.
  – Numerical algorithms.
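The effect of a fill-reducing ordering can be reproduced with SciPy's wrapper around sequential SuperLU. This is a sketch, not from the slides: the 233-versus-207 counts above come from a different example matrix, and the 2-D Laplacian below is a stand-in chosen so the code is self-contained.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def laplacian_2d(k):
    """5-point Laplacian on a k-by-k grid (dimension n = k*k)."""
    d = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(k, k))
    i = sp.identity(k)
    return (sp.kron(i, d) + sp.kron(d, i)).tocsc()

A = laplacian_2d(20)                      # n = 400
b = np.ones(A.shape[0])

lu_nat = splu(A, permc_spec="NATURAL")    # natural ordering
lu_amd = splu(A, permc_spec="COLAMD")     # fill-reducing column ordering

fill_nat = lu_nat.L.nnz + lu_nat.U.nnz
fill_amd = lu_amd.L.nnz + lu_amd.U.nnz
print(f"nnz(A) = {A.nnz}, fill (natural) = {fill_nat}, fill (COLAMD) = {fill_amd}")

x = lu_amd.solve(b)                       # two triangular solves with L and U
residual = np.linalg.norm(A @ x - b)
```

`permc_spec` also accepts the minimum-degree variants "MMD_ATA" and "MMD_AT_PLUS_A"; COLAMD is the default.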
Floating-point operations are performed only on the nonzeros.
  – How and when to pivot?
  – These steps usually dominate the total runtime.

Numerical Pivoting
• The goal of pivoting is to control element growth in L and U for stability.
• Sparse factorizations often relax the pivoting rule, trading some stability for better sparsity and parallelism (e.g., threshold pivoting, static pivoting, ...).
• Partial pivoting is used in sequential SuperLU (GEPP).
  – Diagonal pivoting can be forced (controlled by a diagonal threshold).
  – Hard to implement scalably for sparse factorization.
• Static pivoting is used in SuperLU_DIST (GESP).
  – Before factoring, scale and permute A to maximize the diagonal: Pr Dr A Dc = A'.
  – Pr is found by a weighted bipartite matching algorithm on G(A).
  – During the factorization A' = LU, replace tiny pivots by \sqrt{\varepsilon}\,\|A\|, without changing the data structures for L and U.
  – If needed, a few steps of iterative refinement improve the first solution.
  – Quite stable in practice.

Static Pivoting via Weighted Bipartite Matching
• Maximize the diagonal entries: their sum, or their product (i.e., the sum of logs).
• Hungarian algorithm or the like (MC64): O(n (m + n) log n).
• Auction algorithms are an alternative.
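The diagonal-maximizing matching can be sketched in a few lines. This is a dense stand-in using SciPy's `linear_sum_assignment`, not MC64 or an auction algorithm, and the 3-by-3 matrix is a made-up example: maximizing the product of |a_ij| over perfect matchings is the same linear assignment problem as minimizing the sum of -log|a_ij|.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

A = np.array([[0.0, 3.0, 0.1],
              [2.0, 0.0, 4.0],
              [0.5, 1.0, 0.0]])          # zero diagonal: GE would break down

BIG = 1e9                                 # stand-in cost for entries a_ij == 0
with np.errstate(divide="ignore"):
    cost = np.where(A != 0, -np.log(np.abs(A)), BIG)

rows, cols = linear_sum_assignment(cost)  # min-cost perfect matching
perm = np.empty(len(A), dtype=int)
perm[cols] = rows                         # row permutation Pr

A_perm = A[perm]                          # Pr A: matched entries on the diagonal
print(np.diag(A_perm))                    # diagonal is now [0.5, 3, 4]
```

The matched diagonal has product 6, the best achievable here, and every pivot is now nonzero.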
Auction algorithms are more parallel, with complexity O(n m log(nC)).
[Figure: bipartite graph G(A) between the rows and columns of A; the matched entries are moved onto the diagonal.]

Numerical Accuracy: GESP versus GEPP
[Figure: accuracy comparison of GESP and GEPP.]

Structural Gaussian Elimination - Symmetric Case
• Model elimination on the undirected graph of A.
• After a vertex is eliminated, all its neighbors become a clique.
• The edges of the clique are the potential fills (an upper bound!).
[Figure: eliminating vertex 1 connects its neighbors i, j, k into a clique.]

Minimum Degree Ordering
• Greedy approach: do the best locally.
• At each step: eliminate the vertex with the smallest degree, then update the degrees of its neighbors.
• A straightforward implementation is slow and requires too much memory.
  – The newly added (fill) edges outnumber the eliminated vertices.

Minimum Degree Ordering (continued)
• Use a quotient graph as a compact representation [George/Liu '78].
  – The collection of cliques resulting from the eliminated vertices affects the degree of an uneliminated vertex.
  – Represent each connected component in the eliminated subgraph by a single "supervertex".
  – The storage required to implement the quotient-graph model is bounded by the size of A.
• There is a large body of literature on implementation variations: Tinney/Walker '67, George/Liu '79, Liu '85, Amestoy/Davis/Duff '94, Ashcraft '95, Duff/Reid '95, et al.

Nested Dissection Ordering
Global
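Looking back at the minimum degree slides, the greedy procedure (without the quotient-graph machinery, so only suitable for small examples) can be sketched as a toy implementation. The star graph below is an illustrative example, not from the slides: eliminating the hub first turns all its leaves into a clique, while minimum degree eliminates the leaves first and creates no fill at all.

```python
def eliminate(adj, order):
    """Eliminate vertices in the given order; return the number of fill edges."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    fill = 0
    for v in order:
        nbrs = adj.pop(v)
        for a in nbrs:
            adj[a].discard(v)
        # the neighbors of v become a clique; every new edge is a fill
        for a in nbrs:
            for b in nbrs:
                if a < b and b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
    return fill

def min_degree_order(adj):
    """Greedy ordering: repeatedly eliminate a vertex of smallest current degree."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))
        order.append(v)
        nbrs = adj.pop(v)
        for a in nbrs:
            adj[a].discard(v)
        for a in nbrs:
            for b in nbrs:
                if a < b and b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
    return order

# Star graph: hub 0 joined to leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
md = min_degree_order(star)              # leaves first, hub last
print(eliminate(star, md))               # 0 fill edges
print(eliminate(star, [0, 1, 2, 3, 4]))  # hub first: C(4,2) = 6 fill edges
```

The O(|nbrs|^2) clique update in each step is exactly why practical codes switch to the quotient-graph representation: the explicit fill edges can vastly outnumber the eliminated vertices.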