Sparse Compilation

Motivation for Sparse Codes
• Consider flux, heat, or stresses – interactions are between neighbors.
• Linear equations are sparse. Therefore, matrices are sparse.

Three Sparse Matrix Representations
[Figure: the same matrix (entries a–h) stored three ways: coordinate storage (A.row, A.column, A.val), CRS (A.rowptr, A.column, A.val), and CCS (A.colptr, A.row, A.val).]
• CRS: indexed access to a row.
• CCS: indexed access to a column.
• Coordinate storage: indexed access to neither rows nor columns.

Jagged Diagonal (JDiag)
[Figure: the rows of a 4×4 matrix (entries a–h) are permuted by decreasing number of non-zeros, then stored jagged-diagonal-wise in the arrays perm, diagptr, colind, and values.]
• Long "vectors".
• Direction of access is not row or column.

BlockSolve (BS)
[Figure: matrix structure annotated with inodes, colors, and cliques.]
• Dense submatrices.
• Colors → cliques → inodes.
• Composition of two storage formats.

The effect of sparse storage

Name         N      Nzs    Diag.    Coor.   CRS     JDiag    B.S.
nos6         675    3255   38.658   5.441   20.634  32.945    2.570
2 × 25 × 1   625    3025   37.907   5.650   21.416  32.952    2.593
nos7         729    4617   35.749   4.836   20.000  27.830    3.259
2 × 10 × 3   300    4140   27.006   9.359   29.881  33.727   17.457
medium       181    2245   23.192   7.888   29.874  32.583   19.633
bcsstm27     1224   56k    15.130   4.807   23.677  21.604   28.907
e05r0000     236    5856    8.534   4.841   26.642  25.085   SEGV
3 × 17 × 7   34.4k  1.6M    8.478   4.752   23.499  11.805   27.615

(N is the matrix order and Nzs the number of non-zeros; the remaining columns give results for diagonal, coordinate, CRS, jagged diagonal, and BlockSolve storage. SEGV marks a run that crashed.)

NIST Sparse BLAS
• Algorithms:
  1. Matrix–matrix products (MM),
         C ← α A B + β C,    C ← α A^T B + β C,
     where A is sparse, B and C are dense, and α and β are scalars.
  2. Triangular solves,
         C ← α D A^{-1} B + β C,    C ← α D A^{-T} B + β C,
         C ← α A^{-1} D B + β C,    C ← α A^{-T} D B + β C,
     where D is a "(block) diagonal" matrix.
  3. Right permutation of a sparse matrix in jagged diagonal format,
         A → A P,    A → A P^T.
  4. Integrity check of sparse A.

NIST Sparse BLAS (cont.)
• Storage formats:
  • Point entry – each entry of the storage format is a single matrix element:
    coordinate, CCS, CRS, sparse diagonal, ITPACK/ELLPACK, jagged diagonal, skyline.
  • Block entry – each "entry" is a dense block of matrix elements:
    block coordinate, block CCS, block CRS, block sparse diagonal, block ITPACK/ELLPACK, variable block compressed row storage (VBR).

NIST Sparse BLAS (cont.)
• Limitations:
  • Huge number of routines to implement:
    user-level – only 4 routines;
    toolkit-level – 52 routines (= 4 routines × 13 formats);
    lite-level – 2,964 routines (= 228 routines × 13 formats).
  • Algorithms are not complete. E.g., matrix assembly, incomplete and complete factorizations.
  • Data structures are not complete. E.g., BlockSolve.
  • Only one operand can be sparse: no sparse C = A ∗ B.

A Sparse Compiler
• Still need to develop sparse applications.
• Want to automate the task.
[Figure: a dense specification plus sparsity information enter the sparse compiler, which produces a sparse implementation.]
Design goals:
• Programmer selects the sparse storage formats.
• Programmer can specify novel storage formats.
• Sparse implementations as efficient as possible.

Challenges for sparse compilation
• Describing sparse matrix formats to the compiler.
• Transforming loops for efficient access of sparse matrices.
• Dealing with redundant dimensions.
• Accessing only non-zeros.

Describing Storage Formats – Random Access
• A[i, j].
• Sparse matrices as objects with get and set methods.
• Dependencies are preserved.

      for i = 1, n
          for j = 1, n
              y[i] += A.get(i,j) * x[j]

• Inefficient:
  • searching is inefficient;
  • useless computation when A[i, j] = 0.

Describing Storage Formats – Sequential Access
• Stream of non-zeros, <i, j, v>.
• Sparse matrices as containers with iterators.

      for <i,j,v> in A.nzs()
          y[i] += v * x[j]

• What about dependencies? Must know order of iteration.
• Simultaneous enumeration…

Describing Storage Formats – Sequential Access (cont.)
• Consider C = A ∗ B, where A and B are sparse.
• With sequential access,

      for <i,k,Av> in A.nzs()
          for <k',j,Bv> in B.nzs()
              if k = k' then
                  C[i,j] += Av * Bv

• A better solution,

      for <i,k,Av> in A.nzs()
          for <j,Bv> in B[k,:].nzs()
              C[i,j] += Av * Bv

• CRS gives us this type of access (a concrete sketch follows).
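To make the row-access version concrete, here is a minimal C++ sketch of C = A ∗ B over two CRS operands. It is not from the slides: the CRS struct and its field names (rowptr, col, val) are illustrative, and C is accumulated densely for simplicity.

      #include <vector>

      // Illustrative CRS container: row i occupies positions
      // [rowptr[i], rowptr[i+1]) of the col/val arrays.
      struct CRS {
          int n;                    // matrix is n x n
          std::vector<int> rowptr;  // n+1 entries
          std::vector<int> col;     // column index of each non-zero
          std::vector<double> val;  // value of each non-zero
      };

      // C = A * B: each non-zero <i,k,Av> of A triggers a scan of
      // row k of B only -- the "better solution" above.
      std::vector<double> multiply(const CRS& A, const CRS& B) {
          std::vector<double> C(A.n * B.n, 0.0);  // dense C, row-major
          for (int i = 0; i < A.n; ++i)
              for (int p = A.rowptr[i]; p < A.rowptr[i + 1]; ++p) {
                  int k = A.col[p];
                  double Av = A.val[p];
                  for (int q = B.rowptr[k]; q < B.rowptr[k + 1]; ++q)
                      C[i * B.n + B.col[q]] += Av * B.val[q];  // C[i,j] += Av*Bv
              }
          return C;
      }

The all-pairs version would instead scan every non-zero of B for every non-zero of A; indexed access to B's rows is what shrinks the inner loop to the non-zeros that can actually match k.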
Indexed-sequential access
• Storage formats have hierarchical structure.
• Algebraic description of this structure:
  • Nesting: c → r → v
  • Aggregation: (r → c → v) ∪ (c → r → v)
  • Linear maps: map{ br·B + or ↔ r, bc·B + oc ↔ c : bc → br → <or, oc> → v }
    (block indices br, bc and within-block offsets or, oc, for block size B)
  • Perspective: (r → c → v) ⊕ (c → r → v)
[Figure: worked layouts for Block Sparse Column and Compressed Column Storage.]

Conveying the structure to the compiler
• Annotations:

      !$SPARSE CRS: r -> c -> v
      real A(0:99,0:99)

• Each production implemented as an abstract interface class (a fleshed-out sketch appears at the end of the section):

      class CRS : public Indexed<int, Indexed<int, Value<double> > >

Challenges for sparse compilation
✓ Describing sparse matrix formats to the compiler. (r → c → v)
• Transforming loops for efficient access of sparse matrices.
• Dealing with redundant dimensions.
• Accessing only non-zeros.

Loop transformation framework
• Extend our framework to imperfectly nested loops.
• For each statement: statement space = (iteration space, data space).

      for i = ...
          for j = ...
      S1:     ... A[F1(i,j), F2(i,j)] + B[G(i,j)] ...

• Iteration space – as before.
• Data space – product of sparse array dimensions: S1: <i, j, a1, a2, b>.
• Product space – Cartesian product of statement spaces.

Loop transformation framework (cont.)
• Finding embedding functions:
  • add constraints for the array references, a = F·i;
  • use Farkas' lemma, as before.
• Account for the structure of sparse matrices:
  • map – change of variables, P′ = T·P;
  • perspective – choice;
  • aggregation – make two copies of indices, a → a′, a″.
• Transformations – not tiling, but data-centric:
  • order the array indices, a1, a2, b1, b2, …;
  • partial transformation – bring array indices outermost;
  • complete transformation.

Challenges for sparse compilation
✓ Describing sparse matrix formats to the compiler. (r → c → v)
✓ Transforming loops for efficient access of sparse matrices. (Augmented product space; data-centric transformations.)
• Dealing with redundant dimensions.
• Accessing only non-zeros.

Redundant dimensions
• Dot product of …
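Returning to the "Conveying the structure to the compiler" slide, here is one hypothetical way its abstract interface classes could be fleshed out. The Indexed and Value templates below are guesses at the shape of such a hierarchy, not the compiler's actual class library; a real CRS implementation would flatten the inner level into shared col/val arrays.

      #include <vector>

      // Hypothetical leaf production: a stored value.
      template <typename V>
      struct Value {
          V v;
      };

      // Hypothetical Indexed production: one level of the hierarchy,
      // enumerable sequentially and searchable by index, yielding the
      // next production. Mirrors one arrow of the annotation.
      template <typename I, typename Next>
      struct Indexed {
          std::vector<I> index;     // e.g., row or column numbers
          std::vector<Next> entry;  // entry[p] is the structure under index[p]
      };

      // The slide's declaration: CRS is rows (int) of columns (int)
      // of double values, i.e., the annotation r -> c -> v.
      struct CRS : Indexed<int, Indexed<int, Value<double>>> {};

Each production in the algebraic description (nesting, aggregation, map, perspective) would get an interface class of this kind, so the compiler can generate enumeration code against the interfaces rather than against a fixed set of formats.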