Sparse Direct Solvers on High Performance Computers

Outline:
• Review of Gaussian Elimination (GE)
• Sparse GE
• Numerical Stability: Need for Pivoting
• Dense versus Sparse GE
• Algorithmic Issues in Sparse GE
• Numerical Pivoting
• Static Pivoting via Weighted Bipartite Matching
• Numerical Accuracy: GESP versus GEPP
• Structural Gaussian Elimination - Symmetric Case
• Minimum Degree Ordering
• Nested Dissection Ordering
• Ordering Based on Graph Partitioning
• Ordering for LU (unsymmetric)
• Structural Gaussian Elimination - Unsymmetric Case
• Results of Markowitz Ordering
• Techniques to Reduce Memory Access & Communication Cost
• Blocking in Sparse GE
• Speedup Over Un-blocked Code
• Parallel Task Scheduling for SMPs (in SuperLU_MT)
• Parallelism from Separator Tree
• Matrix Distribution on Large Distributed-memory Machine
• 2D Block Cyclic Layout for Sparse L and U (in SuperLU_DIST)
• Scalability and Isoefficiency Analysis
• Irregular Matrices
• Adoptions of SuperLU
• Summary
• Application 1: Quantum Mechanics
• SuperLU_DIST as Preconditioner
• One Block Timings on IBM SP
• Application 2: Accelerator Cavity Design
• DDS47, Linear Elements
• Largest Eigen Problem Solved So Far
Sherry Li, [email protected], http://crd.lbl.gov/~xiaoye
CS267 (Applications of Parallel Computers), Lecture 12, March 2, 2005

Review of Gaussian Elimination (GE)
• Solving a system of linear equations Ax = b.
• First step of GE:
  A = \begin{pmatrix} \alpha & w^T \\ v & B \end{pmatrix}
    = \begin{pmatrix} 1 & 0 \\ v/\alpha & I \end{pmatrix}
      \begin{pmatrix} \alpha & w^T \\ 0 & C \end{pmatrix},
  \qquad C = B - \frac{1}{\alpha} v w^T
• GE is then repeated on the Schur complement C.
• The result is the LU factorization A = LU, with L lower triangular with unit diagonal and U upper triangular.
• x is then obtained by solving two triangular systems with L and U.

Sparse GE
• Sparse systems are ubiquitous in science and engineering.
  – Example: A of dimension 10^5 with only 10–100 nonzeros per row.
• Goal: store only the nonzeros and perform operations only on nonzeros.
• Fill-in: an originally zero entry a_ij becomes nonzero in L or U.
• For the example matrix: natural order gives 233 nonzeros; minimum degree order gives 207.

Numerical Stability: Need for Pivoting
• One step of GE (as above) forms C = B - (1/α) v w^T.
• If α is small, some entries of B may be lost in this update.
• Pivoting: swap the current diagonal entry with a larger entry from the rest of the matrix.
• Goal: control element growth in L and U.

Dense versus Sparse GE
• Dense GE: Pr A Pc = LU, where the permutations Pr and Pc are chosen to maintain stability. Partial pivoting suffices in most cases: Pr A = LU.
• Sparse GE: Pr A Pc = LU, where Pr and Pc are chosen to maintain stability and preserve sparsity.

Algorithmic Issues in Sparse GE
• Minimize the number of fill-ins, maximize parallelism.
  – The sparsity structures of L and U depend on that of A, which can be changed by row/column permutations (vertex relabeling of the underlying graph).
  – Ordering (combinatorial algorithms; NP-complete to find the optimum [Yannakakis '83]; heuristics are used).
• Predict the fill-in positions in L and U.
  – Symbolic factorization (combinatorial algorithms).
• Design efficient data structures for storing and quickly retrieving the nonzeros.
  – Compressed storage schemes.
• Perform the factorization and triangular solutions.
  – Numerical algorithms.
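The effect of a fill-reducing ordering can be reproduced with SciPy's wrapper around sequential SuperLU. This is a sketch, not from the slides: the 233-versus-207 counts above come from a different example matrix, and the 2-D Laplacian below is a stand-in chosen so the code is self-contained.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def laplacian_2d(k):
    """5-point Laplacian on a k-by-k grid (dimension n = k*k)."""
    d = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(k, k))
    i = sp.identity(k)
    return (sp.kron(i, d) + sp.kron(d, i)).tocsc()

A = laplacian_2d(20)                      # n = 400
b = np.ones(A.shape[0])

lu_nat = splu(A, permc_spec="NATURAL")    # natural ordering
lu_amd = splu(A, permc_spec="COLAMD")     # fill-reducing column ordering

fill_nat = lu_nat.L.nnz + lu_nat.U.nnz
fill_amd = lu_amd.L.nnz + lu_amd.U.nnz
print(f"nnz(A) = {A.nnz}, fill (natural) = {fill_nat}, fill (COLAMD) = {fill_amd}")

x = lu_amd.solve(b)                       # two triangular solves with L and U
residual = np.linalg.norm(A @ x - b)
```

`permc_spec` also accepts the minimum-degree variants "MMD_ATA" and "MMD_AT_PLUS_A"; COLAMD is the default.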
Floating-point operations are performed only on the nonzeros.
  – How and when to pivot?
  – These steps usually dominate the total runtime.

Numerical Pivoting
• The goal of pivoting is to control element growth in L and U for stability.
• Sparse factorizations often relax the pivoting rule, trading some stability for better sparsity and parallelism (e.g., threshold pivoting, static pivoting, ...).
• Partial pivoting is used in sequential SuperLU (GEPP).
  – Diagonal pivoting can be forced (controlled by a diagonal threshold).
  – Hard to implement scalably for sparse factorization.
• Static pivoting is used in SuperLU_DIST (GESP).
  – Before factoring, scale and permute A to maximize the diagonal: Pr Dr A Dc = A'.
  – Pr is found by a weighted bipartite matching algorithm on G(A).
  – During the factorization A' = LU, replace tiny pivots by \sqrt{\varepsilon}\,\|A\|, without changing the data structures for L and U.
  – If needed, a few steps of iterative refinement improve the first solution.
  – Quite stable in practice.

Static Pivoting via Weighted Bipartite Matching
• Maximize the diagonal entries: their sum, or their product (i.e., the sum of logs).
• Hungarian algorithm or the like (MC64): O(n (m + n) log n).
• Auction algorithms are an alternative.
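The diagonal-maximizing matching can be sketched in a few lines. This is a dense stand-in using SciPy's `linear_sum_assignment`, not MC64 or an auction algorithm, and the 3-by-3 matrix is a made-up example: maximizing the product of |a_ij| over perfect matchings is the same linear assignment problem as minimizing the sum of -log|a_ij|.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

A = np.array([[0.0, 3.0, 0.1],
              [2.0, 0.0, 4.0],
              [0.5, 1.0, 0.0]])          # zero diagonal: GE would break down

BIG = 1e9                                 # stand-in cost for entries a_ij == 0
with np.errstate(divide="ignore"):
    cost = np.where(A != 0, -np.log(np.abs(A)), BIG)

rows, cols = linear_sum_assignment(cost)  # min-cost perfect matching
perm = np.empty(len(A), dtype=int)
perm[cols] = rows                         # row permutation Pr

A_perm = A[perm]                          # Pr A: matched entries on the diagonal
print(np.diag(A_perm))                    # diagonal is now [0.5, 3, 4]
```

The matched diagonal has product 6, the best achievable here, and every pivot is now nonzero.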
Auction algorithms are more parallel, with complexity O(n m log(nC)).
[Figure: bipartite graph G(A) between the rows and columns of A; the matched entries are moved onto the diagonal.]

Numerical Accuracy: GESP versus GEPP
[Figure: accuracy comparison of GESP and GEPP.]

Structural Gaussian Elimination - Symmetric Case
• Model elimination on the undirected graph of A.
• After a vertex is eliminated, all its neighbors become a clique.
• The edges of the clique are the potential fills (an upper bound!).
[Figure: eliminating vertex 1 connects its neighbors i, j, k into a clique.]

Minimum Degree Ordering
• Greedy approach: do the best locally.
• At each step: eliminate the vertex with the smallest degree, then update the degrees of its neighbors.
• A straightforward implementation is slow and requires too much memory.
  – The newly added (fill) edges outnumber the eliminated vertices.

Minimum Degree Ordering (continued)
• Use a quotient graph as a compact representation [George/Liu '78].
  – The collection of cliques resulting from the eliminated vertices affects the degree of an uneliminated vertex.
  – Represent each connected component in the eliminated subgraph by a single "supervertex".
  – The storage required to implement the quotient-graph model is bounded by the size of A.
• There is a large body of literature on implementation variations: Tinney/Walker '67, George/Liu '79, Liu '85, Amestoy/Davis/Duff '94, Ashcraft '95, Duff/Reid '95, et al.

Nested Dissection Ordering
Global
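Looking back at the minimum degree slides, the greedy procedure (without the quotient-graph machinery, so only suitable for small examples) can be sketched as a toy implementation. The star graph below is an illustrative example, not from the slides: eliminating the hub first turns all its leaves into a clique, while minimum degree eliminates the leaves first and creates no fill at all.

```python
def eliminate(adj, order):
    """Eliminate vertices in the given order; return the number of fill edges."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    fill = 0
    for v in order:
        nbrs = adj.pop(v)
        for a in nbrs:
            adj[a].discard(v)
        # the neighbors of v become a clique; every new edge is a fill
        for a in nbrs:
            for b in nbrs:
                if a < b and b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
    return fill

def min_degree_order(adj):
    """Greedy ordering: repeatedly eliminate a vertex of smallest current degree."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))
        order.append(v)
        nbrs = adj.pop(v)
        for a in nbrs:
            adj[a].discard(v)
        for a in nbrs:
            for b in nbrs:
                if a < b and b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
    return order

# Star graph: hub 0 joined to leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
md = min_degree_order(star)              # leaves first, hub last
print(eliminate(star, md))               # 0 fill edges
print(eliminate(star, [0, 1, 2, 3, 4]))  # hub first: C(4,2) = 6 fill edges
```

The O(|nbrs|^2) clique update in each step is exactly why practical codes switch to the quotient-graph representation: the explicit fill edges can vastly outnumber the eliminated vertices.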