UTK CS 594 - Sparse Matrices and Optimized Parallel Implementations


Slide 1 / 32
Sparse Matrices and Optimized Parallel Implementations
CS 594 Lecture Notes, March 25, 2009
Stan Tomov
Innovative Computing Laboratory, Computer Science Department
The University of Tennessee

Slide 2 / 32: Topics
• Projection in scientific computing
• PDEs, numerical solution, tools, etc.
• Sparse matrices, parallel implementations
• Iterative methods

Slide 3 / 32: Outline
• Part I – Discussion
• Part II – Sparse matrix computations
• Part III – Reordering algorithms and parallelization

Slide 4 / 32: Part I – Discussion

Slide 5 / 32: Orthogonalization
• We can orthonormalize a non-orthogonal basis. How?
• Other approaches: QR using Householder transformations (as in LAPACK), Cholesky, and/or SVD on the normal equations (as in Homeworks 7 and 8).
[Figure: two plots of Gflop/s vs. vector size (x 1,000, from 1 to 512) for MGS, QR_it_svd, and QR_it_Chol/svd on 128 vectors: a hybrid CPU-GPU (NVIDIA Quadro FX 5600) computation as in Homework #9, and a CPU computation on an AMD Opteron 265 (1.8 GHz, 1 MB cache).]

Slide 6 / 32: What if the basis is not orthonormal?
• If we do not want to orthonormalize:
      u ≈ P u = c1 x1 + c2 x2 + . . . + cm xm
  'Multiply' by x1, ..., xm to get
      (u, x1) = c1 (x1, x1) + c2 (x2, x1) + . . . + cm (xm, x1)
      . . .
      (u, xm) = c1 (x1, xm) + c2 (x2, xm) + . . . + cm (xm, xm)
• These are the so-called Petrov-Galerkin conditions
• We saw examples of their use in optimization and in PDE discretization, e.g. FEM

Slide 7 / 32: What if the basis is not orthonormal?
• If we do not want to orthonormalize, e.g. in FEM:
      u ≈ P u = c1 φ1 + c2 φ2 + . . . + c7 φ7
  'Multiply' by φ1, ..., φ7 to get a 7x7 system:
      a(c1 φ1 + c2 φ2 + . . . + c7 φ7, φi) = F(φi)   for i = 1, ..., 7
[Figure: two examples of basis functions φi, including those of spectral element methods (high-order FEM); image taken from http://www.urel.feec.vutbr.cz/~raida. The more the φi overlap, the denser the resulting matrix.]

Slide 8 / 32: Stencil Computations
• K. Datta, S. Kamil, S. Williams, L. Oliker, J. Shalf, K. Yelick, "Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors", SIAM Review, 2008.
  http://bebop.cs.berkeley.edu/pubs/datta2008-stencil-sirev.pdf

Slide 9 / 32: Part II – Sparse matrix computations

Slide 10 / 32: Sparse matrices
• Sparse matrix: a substantial part of the coefficients is zero
• They arise naturally from PDE discretizations – finite differences, FEM, etc.; we saw examples in the 5-point finite difference operator and in 1-D piecewise linear FEM
• Vertices are indexed. In the 5-point finite difference example, vertex 6 has neighbors 2, 5, 7, and 10, so row 6 will have 5 nonzero elements: A6,2, A6,5, A6,6, A6,7, and A6,10. In the 1-D piecewise linear FEM example, row 3 will have 3 nonzeros: A3,2, A3,3, A3,4.

Slide 11 / 32: Sparse matrices
• In general:
  * Degrees of freedom (DOF), associated for example with vertices (or edges, faces, etc.), are indexed
  * A basis function is associated with every DOF (unknown)
  * A Petrov-Galerkin condition (equation) is derived for every basis function, representing a row in the resulting system
  * Only 'a few' elements per row will be nonzero, since the basis functions have local support – e.g. row 10, using continuous piecewise linear FEM on a mesh where vertex 10 neighbors vertices 35, 100, 115, 201, and 332, will have 6 nonzeros: A10,10, A10,35, A10,100, A10,332, A10,115, and A10,201
  * The physical intuition: PDEs describe changes in physical processes; describing/discretizing these changes numerically, based only on local/neighboring information, results in sparse matrices. E.g. what happens at vertex '10' is described by the physical state at '10' and at the neighboring vertices 35, 201, 115, 100, and 332.

Slide 12 / 32: Sparse matrices
• Can we take advantage of this sparse structure?
  – To solve, for example, very large problems
  – To solve them efficiently
• Yes! There are algorithms:
  – Linear solvers and preconditioners (some to be covered in the last 2 lectures)
  – Efficient data storage and implementation (next ...)

Slide 13 / 32: Sparse matrix formats
• It pays to avoid storing the zeros!
• Common sparse storage formats:
  – AIJ
  – Compressed row/column storage (CRS/CCS)
  – Compressed diagonal storage (CDS)
  – Blocked versions (why?)
  * For more, see the 'Templates' book:
    http://www.netlib.org/linalg/html_templates/node90.html#SECTION00931000000000000000

Slide 14 / 32: AIJ
• Stored in 3 arrays
  – All of the same length
  – No order implied

      I   J   AIJ
      1   1   1
      1   2   2
      2   1   3
      2   3   4
      3   2   5
      3   4   6
      4   3   7
      4   5   8

Slide 15 / 32: CRS
• Stored in 3 arrays
  – J and AIJ have the same length
  – I (representing the rows) is compressed

      I   J   AIJ
      1   1   1
      3   2   2
      5   1   3
      7   3   4
      9   2   5
          4   6
          3   7
          5   8

• Array I: think of it as pointers to where the next row starts
• CCS: similar, but J is compressed
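As an illustration (not part of the slides), here is the example matrix of Slides 14 and 15 written out as C arrays, once in AIJ/coordinate form and once in CRS form. The slides use 1-based indices, while C arrays are 0-based, so every index below is shifted down by one; the array names are chosen here for readability.

    /* The 4 x 5 example matrix of Slides 14-15:
     *     | 1 2 . . . |
     *     | 3 . 4 . . |
     *     | . 5 . 6 . |
     *     | . . 7 . 8 |
     */
    #include <stdio.h>

    #define NNZ   8
    #define NROWS 4

    /* AIJ / coordinate format: three arrays of equal length, no order implied */
    static const int    coo_i[NNZ]   = { 0, 0, 1, 1, 2, 2, 3, 3 };
    static const int    coo_j[NNZ]   = { 0, 1, 0, 2, 1, 3, 2, 4 };
    static const double coo_val[NNZ] = { 1, 2, 3, 4, 5, 6, 7, 8 };

    /* CRS: col_idx and val have one entry per nonzero; the row array is
     * compressed into pointers, so row i owns entries row_ptr[i] .. row_ptr[i+1]-1 */
    static const int    row_ptr[NROWS + 1] = { 0, 2, 4, 6, 8 };
    static const int    col_idx[NNZ]       = { 0, 1, 0, 2, 1, 3, 2, 4 };
    static const double val[NNZ]           = { 1, 2, 3, 4, 5, 6, 7, 8 };

    int main(void)
    {
        /* list the entries from the AIJ triplets (printed 1-based, as on the slides) */
        for (int k = 0; k < NNZ; k++)
            printf("AIJ: A(%d,%d) = %g\n", coo_i[k] + 1, coo_j[k] + 1, coo_val[k]);

        /* list the same entries row by row from the CRS arrays */
        for (int i = 0; i < NROWS; i++)
            for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
                printf("CRS: A(%d,%d) = %g\n", i + 1, col_idx[j] + 1, val[j]);
        return 0;
    }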
Slide 16 / 32: CDS
• For matrices with nonzeros along sub-diagonals
[Figure: a banded matrix with its diagonals labeled by sub-diagonal index -1, 0, 1.]

Slide 17 / 32: Performance (Mat-vec product)
• Notoriously bad for running at just a fraction of the performance peak!
• Why? Consider the mat-vec product for a matrix in CRS:

      for i = 1, n
        for j = I[i], I[i+1]-1
          y[i] += AIJ[j] * x[J[j]]

Slide 18 / 32: Performance (Mat-vec product)
• The same loop shows why:
  * Irregular, indirect memory accesses to x result in cache thrashing
  * Performance is often <10% of peak
[Figure: the vertex-10 neighborhood from Slide 11, illustrating the scattered entries of x touched by one row.]

Slide 19 / 32: Performance (Mat-vec product)
* Performance of ...
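The loop of Slides 17-18, written out as compilable C, again as a sketch: 0-based indexing, the array names row_ptr/col_idx/val from the example above, and a scalar accumulator instead of the slides' direct update of y[i] (so y is simply overwritten).

    /* y = A*x for a sparse matrix in CRS with n rows */
    void spmv_crs(int n,
                  const int    *row_ptr,   /* n+1 row pointers             */
                  const int    *col_idx,   /* one column index per nonzero */
                  const double *val,       /* one value per nonzero        */
                  const double *x,         /* input vector                 */
                  double       *y)         /* output vector, length n      */
    {
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
                sum += val[j] * x[col_idx[j]];   /* indirect, irregular access to x */
            y[i] = sum;
        }
    }

The indirect access x[col_idx[j]] is the irregular memory access Slide 18 points to: consecutive nonzeros of a row touch scattered entries of x, so cache lines are reused poorly and the loop typically runs far below peak.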

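For contrast with the CRS loop, a mat-vec in the compressed diagonal storage of Slide 16 needs no column-index array and walks x with unit stride. CDS layout conventions vary; the sketch below assumes one simple convention (each stored diagonal d has an offset off[d], with 0 the main diagonal, -1 the first sub-diagonal, +1 the first super-diagonal, and diag[d][i] holding A[i][i+off[d]]) and is not taken from the lecture.

    /* y = A*x for an n x n banded matrix stored by diagonals (one CDS variant) */
    void spmv_cds(int n, int ndiag,
                  const int    *off,    /* ndiag diagonal offsets: 0, -1, +1, ...       */
                  double      **diag,   /* diag[d][i] = A[i][i + off[d]], length n each */
                  const double *x,
                  double       *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = 0.0;
        for (int d = 0; d < ndiag; d++) {
            int o  = off[d];
            int lo = (o < 0) ? -o    : 0;   /* rows for which column i+o exists */
            int hi = (o > 0) ? n - o : n;
            for (int i = lo; i < hi; i++)
                y[i] += diag[d][i] * x[i + o];   /* unit-stride access to x */
        }
    }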

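Finally, on the 'Blocked versions (why?)' question of Slide 13, an illustrative answer in code (not from the slides): when the nonzeros come in small dense blocks - for instance when several unknowns live on each mesh vertex - storing r x c blocks means one column index per block instead of one per entry, and the small dense block product reuses the loaded entries of x. A sketch for fixed 2 x 2 blocks:

    /* y = A*x with A in block CRS (BCSR), fixed 2 x 2 blocks.
     * brow_ptr/bcol_idx index blocks; bval stores each block row-major,
     * 4 values per block.  Names and layout are illustrative only.      */
    void spmv_bcsr_2x2(int nbrows,              /* number of block rows  */
                       const int    *brow_ptr,
                       const int    *bcol_idx,
                       const double *bval,
                       const double *x,
                       double       *y)
    {
        for (int ib = 0; ib < nbrows; ib++) {
            double y0 = 0.0, y1 = 0.0;
            for (int jb = brow_ptr[ib]; jb < brow_ptr[ib + 1]; jb++) {
                const double *b  = bval + 4 * jb;         /* the 2x2 block   */
                const double *xb = x + 2 * bcol_idx[jb];  /* its slice of x  */
                y0 += b[0] * xb[0] + b[1] * xb[1];
                y1 += b[2] * xb[0] + b[3] * xb[1];
            }
            y[2 * ib]     = y0;
            y[2 * ib + 1] = y1;
        }
    }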