Berkeley COMPSCI C267 - Sparse Direct Methods on High Performance Computers


Slide titles:
- Review of Gaussian Elimination (GE)
- Sparse GE
- Compressed Column Storage (CCS)
- Numerical Stability: Need for Pivoting
- Dense versus Sparse GE
- Algorithmic Issues in Sparse GE
- Ordering: Minimum Degree (1/3)
- Minimum Degree Ordering (2/3)
- Minimum Degree Ordering (3/3)
- Ordering: Nested Dissection (1/3)
- ND Ordering (2/3)
- ND Ordering (3/3)
- Ordering for LU (unsymmetric)
- Ordering for LU
- Structural Gaussian Elimination - Unsymmetric Case
- Results of Markowitz Ordering
- High Performance Issues: Reduce Cost of Memory Access & Communication
- General Sparse Solver
- Speedup Over Un-blocked Code
- Parallel Task Scheduling for SMPs (in SuperLU_MT)
- Parallelism from Separator Tree
- Matrix Distribution on Large Distributed-memory Machine
- 2D Block Cyclic Layout for Sparse L and U (in SuperLU_DIST)
- Scalability and Isoefficiency Analysis
- Scalability
- Irregular Matrices
- Performance on IBM Power5 (1.9 GHz)
- Performance on IBM Power3 (375 MHz)
- Summary
- Open Problems
- Adoptions of SuperLU
- Extra Slides
- Numerical Pivoting
- Static Pivoting via Weighted Bipartite Matching
- Numerical Accuracy: GESP versus GEPP
- Blocking in Sparse GE
- Parallel Symbolic Factorization [Grigori/Demmel/Li '06]
- Application 1: Quantum Mechanics
- Quantum Mechanics (cont.)
- SuperLU_DIST as Preconditioner
- One Block Timings on IBM SP
- Application 2: Accelerator Cavity Design
- Accelerator (cont.)
- Slide 45
- DDS47, Linear Elements
- Largest Eigen Problem Solved So Far
- Model Problem
- Superfast Factorization: Exploit Low-rank Property
- Results for the Model Problem
- Research Issues

Sparse Direct Methods on High Performance Computers
X. Sherry
crd.lbl.gov/~xiaoye
CS267/E233: Applications of Parallel Computing
March 14, 2007

Review of Gaussian Elimination (GE)
- Solving a system of linear equations Ax = b
- First step of GE:
  $A = \begin{pmatrix} \alpha & w^T \\ v & B \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ v/\alpha & I \end{pmatrix} \begin{pmatrix} \alpha & w^T \\ 0 & C \end{pmatrix}$, where $C = B - \frac{v\, w^T}{\alpha}$
- Repeat GE on C
- Results in the LU factorization (A = LU)
  - L lower triangular with unit diagonal, U upper triangular
- Then x is obtained by solving two triangular systems, with L and with U

Sparse GE
- Sparse matrices are ubiquitous
  - Example: A of dimension $10^5$, only 10 to 100 nonzeros per row
- Goal: store only nonzeros and perform operations only on nonzeros
- Fill-in: an original zero entry $a_{ij}$ becomes nonzero in L and U
[Figure: Natural order: nonzeros = 233; Min. Degree order: nonzeros = 207]

Compressed Column Storage (CCS)
- Also known as Harwell-Boeing format
- Store nonzeros columnwise contiguously
- 3 arrays: nzval, rowind, colptr
- Storage: NNZ reals, NNZ+N+1 integers
- Efficient for columnwise algorithms
- Ref: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, R. Barrett et al.

Example (7 x 7 matrix, 1-based indices):

    1  .  .  a  .  .  .
    .  2  .  .  b  .  .
    c  d  3  .  .  .  .
    .  e  .  4  f  .  .
    .  .  .  .  5  .  g
    .  .  .  h  i  6  j
    .  .  k  .  l  .  7

    nzval   1 c 2 d e 3 k a 4 h b f 5 i l 6 g j 7
    rowind  1 3 2 3 4 3 7 1 4 6 2 4 5 6 7 6 5 6 7
    colptr  1 3 6 8 11 16 17 20

Numerical Stability: Need for Pivoting
- One step of GE:
  $A = \begin{pmatrix} \alpha & w^T \\ v & B \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ v/\alpha & I \end{pmatrix} \begin{pmatrix} \alpha & w^T \\ 0 & C \end{pmatrix}$, where $C = B - \frac{v\, w^T}{\alpha}$
  - If α is small, the entries of $v w^T/\alpha$ are huge, and some entries of B may be lost in the subtraction
- Pivoting: swap the current diagonal entry with a larger entry from the other part of the matrix
- Goal: prevent $\|C\|$ from getting too large

Dense versus Sparse GE
- Dense GE: $P_r A P_c = LU$
  - $P_r$ and $P_c$ are permutations chosen to maintain stability
  - Partial pivoting suffices in most cases: $P_r A = LU$
- Sparse GE: $P_r A P_c = LU$
  - $P_r$ and $P_c$ are chosen to maintain stability and preserve sparsity
  - Dynamic pivoting causes dynamic structural change
    - Alternatives: threshold pivoting, static pivoting, ...
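The GE step and the pivoting argument can be made concrete with a small dense sketch. `lu_factor` and `lu_solve` are hypothetical helpers written for illustration (not SuperLU's API); they use plain Python lists, apply the right-looking update $C = B - v w^T/\alpha$ at each step, and optionally choose pivots by partial pivoting, giving $P_r A = LU$.

```python
def lu_factor(A, pivot=True):
    """Right-looking Gaussian elimination on a dense square matrix (list of lists).

    Returns (perm, LU): LU packs the unit-lower-triangular L (below the diagonal)
    and U (on and above it); row i of LU corresponds to row perm[i] of A,
    so P_r A = L U.
    """
    n = len(A)
    LU = [list(map(float, row)) for row in A]  # work on a float copy
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the entry of largest magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(LU[i][k])) if pivot else k
        if LU[p][k] == 0.0:
            raise ZeroDivisionError(f"zero pivot in column {k}")
        if p != k:  # swap whole rows, including the already-computed part of L
            LU[k], LU[p] = LU[p], LU[k]
            perm[k], perm[p] = perm[p], perm[k]
        alpha = LU[k][k]
        for i in range(k + 1, n):
            LU[i][k] /= alpha  # multiplier v_i / alpha, stored as a column of L
            lik = LU[i][k]
            for j in range(k + 1, n):
                LU[i][j] -= lik * LU[k][j]  # Schur complement C = B - v w^T / alpha
    return perm, LU


def lu_solve(perm, LU, b):
    """Solve A x = b from the output of lu_factor via two triangular solves."""
    n = len(LU)
    x = [float(b[perm[i]]) for i in range(n)]  # apply P_r to b
    for i in range(n):  # forward substitution with unit-diagonal L
        for j in range(i):
            x[i] -= LU[i][j] * x[j]
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x


# A tiny pivot alpha = 1e-20: without pivoting, the entry 1.0 of B is lost
# when roughly 1e20 is subtracted from it, and x[0] comes out wrong.
A = [[1e-20, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]
print(lu_solve(*lu_factor(A, pivot=False), b))  # -> [0.0, 1.0]
print(lu_solve(*lu_factor(A, pivot=True), b))   # -> [1.0, 1.0]
```

The true solution is very close to [1, 1]; only the pivoted factorization recovers it, which is the "entries of B lost in the subtraction" effect from the stability slide.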
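The three CCS arrays can be produced and consumed with a few lines of code. The sketch below assumes 0-based indexing internally (shifted to 1-based only when printing, to match the slide's nzval/rowind/colptr example); `to_ccs` and `ccs_matvec` are hypothetical names. The matrix-vector product shows why the format suits columnwise algorithms: each column is one contiguous slice.

```python
def to_ccs(n, entries):
    """Build compressed column storage from a dict {(row, col): value} (0-based).

    Returns (nzval, rowind, colptr): the nonzeros of column j are
    nzval[colptr[j]:colptr[j+1]], with row indices rowind[colptr[j]:colptr[j+1]].
    """
    cols = [[] for _ in range(n)]
    for (i, j), v in entries.items():
        cols[j].append((i, v))
    nzval, rowind, colptr = [], [], [0]
    for j in range(n):
        for i, v in sorted(cols[j], key=lambda t: t[0]):  # rows in increasing order
            rowind.append(i)
            nzval.append(v)
        colptr.append(len(rowind))  # colptr ends up with N + 1 entries
    return nzval, rowind, colptr


def ccs_matvec(n, nzval, rowind, colptr, x):
    """y = A x, traversing A one column at a time: y += x[j] * A[:, j]."""
    y = [0.0] * n
    for j in range(n):
        for p in range(colptr[j], colptr[j + 1]):
            y[rowind[p]] += nzval[p] * x[j]
    return y


# The 7 x 7 example from the CCS slide; '.' marks a zero.
rows = ["1..a...", ".2..b..", "cd3....", ".e.4f..", "....5.g", "...hi6j", "..k.l.7"]
entries = {(i, j): ch for i, r in enumerate(rows) for j, ch in enumerate(r) if ch != "."}
nzval, rowind, colptr = to_ccs(7, entries)
print("nzval ", " ".join(nzval))
print("rowind", " ".join(str(i + 1) for i in rowind))  # 1-based, as on the slide
print("colptr", " ".join(str(p + 1) for p in colptr))
```

With NNZ = 19 and N = 7 this stores 19 values and 19 + 8 = 27 integers, matching the "NNZ reals, NNZ+N+1 integers" count.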
Algorithmic Issues in Sparse GE
- Minimize number of fill-ins, maximize parallelism
  - The sparsity structure of L and U depends on that of A, which can be changed by row/column permutations (vertex re-labeling of the underlying graph)
  - Ordering (combinatorial algorithms; finding the optimum is NP-complete [Yannakakis '81]; use heuristics)
- Predict the fill-in positions in L and U
  - Symbolic factorization (combinatorial algorithms)
- Perform factorization and triangular solutions
  - Numerical algorithms (floating-point operations only on nonzeros)
  - How and when to pivot?
  - These numerical steps usually dominate the total runtime

Ordering: Minimum Degree (1/3)
- Local greedy: minimize an upper bound on fill-in
[Figure: eliminating vertex 1, whose neighbors are i, j, k, l, shown in the graph and in the matrix; afterwards i, j, k, l form a clique]

Minimum Degree Ordering (2/3)
- Greedy approach: do the best locally
  - Best for modest size problems
  - Hard to parallelize
- At each step:
  - Eliminate the vertex with the smallest degree
  - Update the degrees of its neighbors
- A straightforward implementation is slow and requires too much memory
  - The number of newly added (fill) edges exceeds the number of eliminated vertices

Minimum Degree Ordering (3/3)
- Use the quotient graph as a compact representation [George/Liu '78]
- The collection of cliques resulting from the eliminated vertices affects the degree of an uneliminated vertex
- Represent each connected component in the eliminated subgraph by a single "supervertex"
- Storage required to implement the QG model is bounded by the size of A
- Large body of literature on implementation variants: Tinney/Walker '67, George/Liu '79, Liu '85, Amestoy/Davis/Duff '94, Ashcraft '95, Duff/Reid '95, et al., ...
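The greedy rule above can be sketched on an explicit elimination graph. This is the "straightforward implementation" the slides warn about (it stores every fill edge), not a quotient-graph code; ties are broken by smallest vertex label, which is an assumption. `eliminate_vertex`, `fill_for_order`, `minimum_degree` and `grid_graph` are hypothetical names. The same elimination routine also serves as a tiny symbolic factorization: it predicts where fill appears for a given order.

```python
def eliminate_vertex(g, v):
    """Eliminate v from graph g (dict: vertex -> set of neighbours).

    Its remaining neighbours become a clique; returns the number of fill edges added.
    """
    nbrs = g.pop(v)
    for u in nbrs:
        g[u].discard(v)
    fill = 0
    for u in nbrs:
        for w in nbrs:
            if u < w and w not in g[u]:
                g[u].add(w)
                g[w].add(u)
                fill += 1
    return fill


def fill_for_order(adj, order):
    """Count the fill edges produced by eliminating vertices in the given order."""
    g = {v: set(nb) for v, nb in adj.items()}
    return sum(eliminate_vertex(g, v) for v in order)


def minimum_degree(adj):
    """Greedy minimum degree: repeatedly eliminate a vertex of smallest current degree.

    Ties go to the smallest vertex label. Returns (order, fill).
    """
    g = {v: set(nb) for v, nb in adj.items()}
    order, fill = [], 0
    while g:
        v = min(g, key=lambda u: (len(g[u]), u))
        fill += eliminate_vertex(g, v)  # also updates the neighbours' degrees
        order.append(v)
    return order, fill


def grid_graph(n):
    """5-point stencil graph on an n x n grid; vertex id = i * n + j."""
    adj = {v: set() for v in range(n * n)}
    for v in range(n * n):
        i, j = divmod(v, n)
        if i + 1 < n:
            adj[v].add(v + n)
            adj[v + n].add(v)
        if j + 1 < n:
            adj[v].add(v + 1)
            adj[v + 1].add(v)
    return adj


# Star graph: vertex 0 adjacent to 1..4, like vertex 1 and i, j, k, l in the figure.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(fill_for_order(star, [0, 1, 2, 3, 4]))  # -> 6
print(minimum_degree(star))                   # -> ([1, 2, 3, 0, 4], 0)

grid = grid_graph(6)
order, md_fill = minimum_degree(grid)
print("natural order fill:", fill_for_order(grid, range(36)))
print("minimum degree fill:", md_fill)
```

Eliminating the hub first turns its four neighbors into a clique (6 fill edges), while minimum degree postpones it and creates none. Production codes replace the explicit clique edges with quotient-graph supervertices.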
Ordering: Nested Dissection (1/3)
- Model problem: discretized system Ax = b from certain PDEs, e.g., 5-point stencil on an n x n grid, $N = n^2$
  - Factorization flops: $O(n^3) = O(N^{3/2})$
- Theorem: ND ordering gives optimal complexity in exact arithmetic [George '73, Hoffman/Martin/Ross]

ND Ordering (2/3)
- Generalized nested dissection [Lipton/Rose/Tarjan '79]
  - Global graph partitioning: top-down, divide-and-conquer
  - Best for largest problems
  - Parallel code available: e.g., ParMETIS
- First level
- Recurse on A
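The $O(n^3)$ flop count for the model problem can be sketched with a standard recurrence. This is a back-of-the-envelope argument, assuming the top-level separator of an $n \times n$ grid has $O(n)$ vertices, that its block of the factor is dense (so factoring it costs $c\,n^3$ flops for some constant $c$), and that the remaining pieces are four $n/2 \times n/2$ subgrids:

```latex
F(n) = 4\,F\!\left(\tfrac{n}{2}\right) + c\,n^{3}
\quad\Longrightarrow\quad
F(n) = \sum_{\ell=0}^{\log_2 n} 4^{\ell}\, c \left(\frac{n}{2^{\ell}}\right)^{3}
     = c\,n^{3} \sum_{\ell=0}^{\log_2 n} 2^{-\ell}
     \le 2c\,n^{3} = O(n^{3}) = O(N^{3/2}).
```

Because the series is geometric, the dense factorization of the top-level separator dominates the total cost.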
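A minimal sketch of nested dissection on the model problem, assuming the simplest geometric separator: a single middle grid line, cutting across the longer side of each rectangle. `nested_dissection` is a hypothetical helper; graph partitioners such as ParMETIS compute separators for general graphs instead. Each half is ordered recursively and the separator is ordered last, so eliminating vertices in one half never creates fill in the other.

```python
def nested_dissection(n):
    """Return a nested-dissection elimination order for the n x n grid.

    Vertex id = i * n + j. Each rectangle is split by its middle row or column
    (whichever cuts the longer side); both halves come first, the separator last.
    """
    order = []

    def recurse(r0, r1, c0, c1):  # half-open ranges: rows [r0, r1), columns [c0, c1)
        rows, cols = r1 - r0, c1 - c0
        if rows <= 0 or cols <= 0:
            return
        if rows <= 2 and cols <= 2:  # small block: order its vertices directly
            order.extend(i * n + j for i in range(r0, r1) for j in range(c0, c1))
            return
        if cols >= rows:  # separator = middle column of this rectangle
            m = c0 + cols // 2
            recurse(r0, r1, c0, m)
            recurse(r0, r1, m + 1, c1)
            order.extend(i * n + m for i in range(r0, r1))
        else:  # separator = middle row of this rectangle
            m = r0 + rows // 2
            recurse(r0, m, c0, c1)
            recurse(m + 1, r1, c0, c1)
            order.extend(m * n + j for j in range(c0, c1))

    recurse(0, n, 0, n)
    return order


print(nested_dissection(3))  # -> [0, 6, 3, 2, 8, 5, 1, 4, 7]
```

For the 3 x 3 grid, the middle column {1, 4, 7} is the top-level separator and is eliminated last; the two side columns are split again by their middle rows. The recursion is the separator tree that later slides use as a source of parallelism.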

