Sparse Direct Methods on High Performance Computers

X. Sherry Li
[email protected]
http://crd.lbl.gov/~xiaoye

CS267/E233: Applications of Parallel Computing
March 14, 2007


Review of Gaussian Elimination (GE)

- Solving a system of linear equations Ax = b
- First step of GE:

      A = [ alpha  w^T ] = [ 1        0 ] [ alpha  w^T ]
          [ v      B   ]   [ v/alpha  I ] [ 0      C   ],   where C = B - v*w^T / alpha

- Repeat GE on C
- Results in the LU factorization A = LU
  - L lower triangular with unit diagonal, U upper triangular
- Then x is obtained by solving two triangular systems with L and U


Sparse GE

- Sparse matrices are ubiquitous
  - Example: A of dimension 10^5 with only 10-100 nonzeros per row
- Goal: store only the nonzeros and perform operations only on nonzeros
- Fill-in: an originally zero entry a_ij becomes nonzero in L or U
- Natural order: nonzeros = 233; Min. Degree order: nonzeros = 207


Compressed Column Storage (CCS)

- Also known as Harwell-Boeing format
- Store nonzeros columnwise contiguously
- 3 arrays: nzval (nonzero values), rowind (row index of each nonzero), colptr (start of each column in nzval)
- Storage: NNZ reals, NNZ + N + 1 integers
- Efficient for columnwise algorithms
- Ref: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, R. Barrett et al.

Example (7 x 7 matrix with diagonal entries 1-7 and off-diagonal nonzeros a-l):

      nzval  = 1 c 2 d e 3 k a 4 h b f 5 i l 6 g j 7
      rowind = 1 3 2 3 4 3 7 1 4 6 2 4 5 6 7 6 5 6 7
      colptr = 1 3 6 8 11 16 17 20


Numerical Stability: Need for Pivoting

- One step of GE:

      A = [ alpha  w^T ] = [ 1        0 ] [ alpha  w^T ]
          [ v      B   ]   [ v/alpha  I ] [ 0      C   ],   where C = B - v*w^T / alpha

- If alpha is small, some entries of B may be lost in the update forming C
- Pivoting: swap the current diagonal entry with a larger entry from the rest of the matrix
- Goal: prevent the entries of C from getting too large


Dense versus Sparse GE

- Dense GE: Pr A Pc = LU
  - Pr and Pc are permutations chosen to maintain stability
  - Partial pivoting suffices in most cases: Pr A = LU
- Sparse GE: Pr A Pc = LU
  - Pr and Pc are chosen to maintain stability and preserve sparsity
  - Dynamic pivoting causes dynamic structural change
  - Alternatives: threshold pivoting, static pivoting, ...
Algorithmic Issues in Sparse GE

- Minimize the number of fill-ins, maximize parallelism
  - Sparsity structure of L & U depends on that of A, which can be changed by row/column permutations (vertex re-labeling of the underlying graph)
  - Ordering (combinatorial algorithms; NP-complete to find the optimum [Yannakakis '83]; use heuristics)
- Predict the fill-in positions in L & U
  - Symbolic factorization (combinatorial algorithms)
- Perform the factorization and triangular solutions
  - Numerical algorithms (floating-point operations only on nonzeros)
  - How and when to pivot?
  - Usually dominate the total runtime


Ordering: Minimum Degree (1/3)

- Local greedy: minimize an upper bound on fill-in
- [Figure: eliminating vertex 1 joins its neighbors i, j, k, l into a clique, creating the corresponding fill-in entries in rows/columns i, j, k, l of the matrix]


Minimum Degree Ordering (2/3)

- Greedy approach: do the best locally
  - Best for modest-size problems
  - Hard to parallelize
- At each step:
  - Eliminate the vertex with the smallest degree
  - Update the degrees of its neighbors
- A straightforward implementation is slow and requires too much memory
  - Newly added edges outnumber the eliminated vertices


Minimum Degree Ordering (3/3)

- Use the quotient graph as a compact representation [George/Liu '78]
  - The collection of cliques resulting from the eliminated vertices affects the degree of an uneliminated vertex
  - Represent each connected component of the eliminated subgraph by a single "supervertex"
- Storage required to implement the quotient-graph model is bounded by the size of A
- Large body of literature on implementation variants: Tinney/Walker '67, George/Liu '79, Liu '85, Amestoy/Davis/Duff '94, Ashcraft '95, Duff/Reid '95, et al.
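The per-step loop above can be sketched in a few lines of pure Python. This is an illustrative toy (the name `min_degree_order` is made up), implementing the plain greedy rule on an adjacency-set graph without the quotient-graph optimization, so it stores the fill edges explicitly:

```python
def min_degree_order(adj):
    """Greedy minimum-degree ordering of an undirected graph.

    adj: dict mapping vertex -> set of neighbors.
    At each step the vertex of smallest current degree is eliminated
    and its remaining neighbors are joined into a clique, modeling the
    fill-in that the elimination creates. Ties are broken by label.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    order = []
    while g:
        v = min(g, key=lambda u: (len(g[u]), u))   # smallest degree
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
            g[u] |= nbrs - {u}                     # add fill edges
        order.append(v)
    return order

# Star graph: center 0 with leaves 1..4. Eliminating the center first
# would clique all the leaves; minimum degree eliminates leaves first.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(min_degree_order(star))
```

Storing every fill edge is exactly the memory blow-up the slide warns about; the quotient-graph representation avoids it by keeping eliminated cliques implicit.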
Ordering: Nested Dissection (1/3)

- Model problem: discretized system Ax = b from certain PDEs, e.g., a 5-point stencil on an n x n grid, N = n^2
- Factorization flops: O(n^3) = O(N^(3/2))
- Theorem: the ND ordering gives optimal complexity in exact arithmetic [George '73, Hoffman/Martin/Ross]


ND Ordering (2/3)

- Generalized nested dissection [Lipton/Rose/Tarjan '79]
  - Global graph partitioning: top-down, divide-and-conquer
  - Best for the largest problems
  - Parallel code available: e.g., ParMETIS
- [Figure: the first-level separator splits the grid in two; recurse on each part]
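The divide-and-conquer recursion can be sketched directly on the model problem's grid. This toy (the function name `nested_dissection` is made up for illustration) splits the current subgrid along its longer dimension, recursively orders the two halves, and numbers the separator line last, which is what puts separators at the root of the elimination tree:

```python
def nested_dissection(rows, cols):
    """Recursive ND ordering of an n x n grid graph (5-point stencil).

    rows, cols: lists of grid indices of the current subgrid.
    Returns the elimination order as a list of (row, col) cells,
    with each separator ordered after the two parts it separates.
    """
    if not rows or not cols:
        return []
    if len(rows) <= 2 and len(cols) <= 2:          # small base case
        return [(i, j) for i in rows for j in cols]
    if len(cols) >= len(rows):                     # vertical separator
        mid = cols[len(cols) // 2]
        part1 = nested_dissection(rows, [c for c in cols if c < mid])
        part2 = nested_dissection(rows, [c for c in cols if c > mid])
        sep = [(i, mid) for i in rows]
    else:                                          # horizontal separator
        mid = rows[len(rows) // 2]
        part1 = nested_dissection([r for r in rows if r < mid], cols)
        part2 = nested_dissection([r for r in rows if r > mid], cols)
        sep = [(mid, j) for j in cols]
    return part1 + part2 + sep                     # separator last

order = nested_dissection(list(range(5)), list(range(5)))
print(order[-5:])   # the middle column (the top-level separator) comes last
```

Because the two halves share no edges once the separator is removed, they can also be factored independently, which is the source of the parallelism exploited by codes like ParMETIS-driven solvers.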