DOC PREVIEW
The Matrix Template Library

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

The Matrix Template Library A Generic Programming Approach to High Performance Numerical Linear Algebra Jeremy G Siek Andrew Lumsdaine Laboratory for Scientific Computing Department of Computer Science and Engineering University of Notre Dame Abstract We present a unified approach for expressing high performance numerical linear algebra routines for large classes of dense and sparse matrices As with the Standard Template Library 10 we explicitly separate algorithms from data structures through the use of generic programming techniques We conclude that such an approach does not hinder high performance On the contrary writing portable high performance codes is actually enabled with such an approach because the performance critical code sections can be isolated from the algorithms and the data structures We also tackle the performance portability problem for particular architecture dependent algorithms such as matrix matrix multiply Recently code generation systems PHiPAC 3 and ATLAS 15 have been created to customize the algorithms according to architecture A more elegant approach is to use template metaprograms 18 to allow for variation In this paper we introduce the Basic Linear Algebra Instruction Set BLAIS a collection of high performance kernels for basic linear algebra 1 Introduction The traditional approach to writing basic linear algebra routines is a combinatorial affair There are typically four precision types that need to be handled single and double precision real single and double precision complex several dense storage types general banded packed a multitude of sparse storage types 13 in the Sparse BLAS Standard Proposal 1 as well as row and column orientations for each matrix type To provide a full implementation one might need to code literally hundreds of versions of the same routine It is no wonder the NIST implementation of the Sparse BLAS contains over 10 000 routines and an automatic code generation system 16 To make matters worse the performance of codes such as matrix matrix multiply is highly sensitive to the memory hierarchy characteristics so writing portable highperformance codes is even more difficult It is typically necessary to use a code generation system on top of C or Fortran in order to get the flexibility needed for register blocking according to computer architecture In this paper we apply the fundamental generic programming approaches used by the Standard Template Library STL to the domain of numerical linear algebra The resulting library which we call the Matrix Template Library MTL provides comprehensive functionality with a small number of of fundamental algorithms while at the This work was supported by NSF grants ASC94 22380 and CCR95 02710 same time achieving high performance We also explore the use of template metaprograms in the construction of the BLAIS kernels which provide an elegant solution to portable high performance for matrix matrix multiply and other blocked codes The Matrix Template Library 11 is in its second generation and has been completely rewritten using generic programming techniques 2 Generic Programming The principal idea behind the STL is that many algorithms can be abstracted away from the particular data structures on which they operate Algorithms typically need the abstract functionality of being able to traverse through a data structure and access its elements If data structures provide a standard interface for traversal and access generic algorithms can be mixed and matched with data structures called containers in STL This interface is realized through the iterator sometimes called a generalized pointer Abstractly linear algebra operations also consist of traversing through vectors and matrices Vector operations fit neatly into the generic programming approach The STL already defines several generic algorithms for vectors such as inner product Extending these generic algorithms to encompass the rest of the common Level 1 BLAS 9 is a trivial matter template class Row2DIter class IterX class IterY void matvec mult Row2DIter i Row2DIter iend IterX x IterY y typename Row2DIter value type const iterator j while not at i iend j i begin typename IterY value type tmp 0 while not at j i end tmp j x j index j y i index tmp i Fig 1 Simplified example of a generic matrix vector product Matrix operations are slightly more complex since the elements are arranged in a 2dimensional format The MTL processes matrices as if they are containers of containers note that the matrix implementations are typically not actual containers of containers The matrix algorithms are coded in terms of iterators and two dimensional iterators A Row2DIter can traverse the rows of a matrix and produces a row vector when dereferenced The iterator for the row vector can then be used to access the individual matrix elements The example in Fig 1 shows how one can write a generic matrixvector product Function Name Operation Vector Algorithms set x alpha xi scale x alpha x x s sum x s i xi s one norm x s i j xi j s two norm x s i x2i 12 s inf norm x s max j xi j i find max abs x i index of max j xi j s max x s max xi s min x s min xi Matrix Algorithms set A alpha A scale A alpha A A set diag A alpha Aii s one norm A s maxi j j aij j s inf norm A s maxj i j aij j transpose A A AT Matrix Matrix copy A B B A add A C C A C mult A B C C A B tri solve T B C C T 1 B P P P P P Function Name Operation Vector Vector copy x y y x swap x y y x ele mult x y z z y x ele div x y z z y x add x y y x y s dot x y s xT y s dot conj x y s xT y Matrix Vector mult A x y mult A x y z tri solve T x y rank one x A rank two x y A swap A B ele mult A B C mult A B C E y z y A A y B C E A x A x y T 1 x x yT A x yT xT A A B A A B C Table 1 MTL linear algebra operations 3 MTL Algorithms Table 1 lists the principal algorithms covered by MTL This list seems sparse but a large number of functions are indeed provided through the combination of the above algorithms with the strided scaled and trans adapter functions Fig 2 shows how this is done with a matrix vector multiply and with a scaled vector assignment The unique feature of the Matrix Template Library is that for the most part each of the algorithms is implemented with just one template function Just one algorithm is used whether the matrix is sparse dense banded single precision double complex etc From a software maintenance standpoint the reuse of code gives MTL a significant advantage over the BLAS 4 5 or even other object


The Matrix Template Library

Download The Matrix Template Library
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view The Matrix Template Library and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view The Matrix Template Library and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?