Unformatted text preview:

Matrix Multiplication 1 Inner product method Matrix multiplication is usually written do i 1 n do j 1 n do k 1 n C i j C i j A i k B k j end do end do end do The most direct translation of this program into vector form is do i 1 n do j 1 n C i j DOT PRODUCT A i 1 n B 1 n j end do end do 1 of 20 The time on a pipelined machine is n 2 s 1 log n s 2 n 1 The time on an array machine or multiprocessor if P n is t log n t n 2 2 of 20 2 Middle product method n parallelism This is obtained by interchanging the headers in the original matrix multiplication loop do j 1 n do k 1 n do i 1 n C i j C i j A i k B k j end do end do end do The direct translation of this loop into vector form is do j 1 n do k 1 n C 1 n j C 1 n j A 1 n k B k j end do end do 3 of 20 Alternatively the headers could have been exchanged in a different order to obtain the loop do j 1 n do k 1 n C i 1 n C i 1 n A i k B k 1 n end do end do The time on a pipelined machine is 2n 2 s n 1 The time in an array machine is t t n 2 4 of 20 3 Outer product method n2 parallelism Another interchange of the loop headers produce do k 1 n do i 1 n do j 1 n C i j C i j A i k B k j end do end do end do To obtain n2 parallelism the inner two loops should take the form of a matrix operation do k 1 n C 1 n 1 n C 1 n 1 n A 1 n k B k 1 n end do Where the operator represents the outer product of two vectors Given two vectors a and b their outer product is a matrix Z such that Zi j ai bj Notice that the previous loop is 5 of 20 NOT a valid Fortran or Fortran 90 loop because is not a valid Fortran character The outer product matrix in the loop above has the following form A 1k B k1 A 1k Bk2 A 1k B k3 A 2k B k1 A 2k Bk2 A 2k B k3 A 3k B k1 A 3k Bk2 A 3k B k3 2 1 The two directions of replication This matrix is the element by element product of the following two matrices B k k A A A k B k k B k which are formed by replicating Ak A 1 n k and Bk B k 1 n along the appropriate dimensions This 6 of 20 replication can be achieved using the Fortran 90 SPREAD function discussed above spread A 1 n k dim 2 ncopies n spread B k 1 n dim 1 ncopies n The resulting loop is therefore do k 1 n C C SPREAD A 1 n k 2 n SPREAD B k 1 N 1 n end do In an array machine with P n2 the time would be 2 t copy log n t t n where tcopy is the time to copy a vector The time to spread to n copies is logarithmic as discussed in class 7 of 20 4 n3 parallelism The product of two n n matrices C matmul A B can be computed by adding n matrices of rank n n 3 B1 A 11 13 23 B B 2 12 B 1 A 21 3 A 33 B 2 1 B A 1 12 2 13 2 A 2 2 A 33 B B B 1 2 B 1 A 21 23 A 1 B 22 32 A B 1 A1 2 13 1 1 A 2 1 2 A 32 B B B 2 1 3 1 A 1 B 21 A2 A2 B3 3 C A 1 B 31 22 A 3 A2 8 of 20 These n matrices of rank n n can be computed by multiplying element by element two three dimensional arrays of rank n n n The two three dimensional arrays are formed by replicating A and B along different dimensions as shown next A A B B A B B This replication can again be achieved with SPREAD 9 of 20 Thus give the following three directions of replication 3 2 1 we can start by computing a n3 temporary array T as follows T SPREAD A DIM 3 NCOPIES n SPREAD B DIM 1 NCOPIES n Then the result is just C SUM T DIM 2 In an array machine with P n3 processing unit the time to compute C would be 2 t copy log n t t log n 10 of 20 8 Multiplication by Diagonals An n n matrix A is banded if Aij 0 for i j 1 j i 2 A 11 A 12 A 1 A 22 A 23 A 1 1 0 A 1 2 1 0 0 0 0 0 0 0 0 2 A 2 1 0 0 2 0 A n 1 n 2 A n 1 n A n n 1 1 A n n For a small band for example 1 2 3 the algorithm discussed before for matrix vector multiplication is not efficient 11 of 20 An alternative is to do the product by diagonals A 0 A1 A 0 0 0 2 A 1 0 0 A 1 0 0 0 0 0 0 0 V After separating the diagonals into separate matrices we get A0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A1 0 0 V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A 2 00 0 0 V 00 0 0 00 0 0 00 0 0 0 V 0 0 0 0 0 0 0 0 A 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 V 0 0 0 0 0 V A 0 0 0 0 2 0 0 0 0 0 00 0 12 of 20 which can be written as follows 2 A0V A1V A V 2 2 A 1 Vn 1 A Vn 1 1 where Vj Vj Vn and Vn j V1 Vn j Also means add the shorter vector to the first component of the longer one and means add the shorter vector to the last component of the longer one In Fortran 90 except for the greek letters and the subscripts A0 1 n V 1 n A1 1 n 1 V 2 n 0 A 1 …


View Full Document
Loading Unlocking...
Login

Join to view Matrix Multiplication and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Matrix Multiplication and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?