Slide 1Work and Span (Recap)Cilk Loops: Divide and ConquerSquare-Matrix MultiplicationParallelizing Matrix MultiplyRecursive Matrix MultiplicationD&C Matrix MultiplicationMatrix AdditionAnalysis of Matrix AdditionWork of Matrix MultiplicationSpan of Matrix MultiplicationParallelism of Matrix MultiplyStack TemporariesNo-Temp Matrix MultiplicationWork of No-Temp MultiplySpan of No-Temp MultiplyParallelism of No-Temp MultiplyHow general that was?General Matrix MultiplicationParallelizing General MMultSplit mSplit nSplit kMatrix-Vector MultiplicationMatrix-Vector MultiplicationMatrix-Transpose x VectorMatrix-Transpose x VectorHyperobjectsHyperobject solution1CS 240A : Numerical Examples in Shared Memory with Cilk++• Matrix-matrix multiplication• Matrix-vector multiplication• HyperobjectsThanks to Charles E. Leiserson for some of these slides2TP = execution time on P processorsT1 = work T∞ = span**Also called critical-path lengthor computational depth.Speedup on p processors∙T1/Tp Parallelism∙T1/T∞Work and Span (Recap)3cilk_for (int i=0; i<n; ++i) { A[i]+=B[i];}cilk_for (int i=0; i<n; ++i) { A[i]+=B[i];}Vector additionWork: T1 = Θ(n) Span: T∞ =Parallelism: T1/T∞ = Θ(n/lg n) Work: T1 =Span: T∞ = Θ(lg n)Parallelism: T1/T∞ =Cilk Loops: Divide and ConquerGgrain sizeAssume that G = Θ(1).Implementation4Square-Matrix Multiplicationc11c12⋯c1nc21c22⋯c2n⋮ ⋮ ⋱ ⋮cn1cn2⋯cnna11a12⋯a1na21a22⋯a2n⋮ ⋮ ⋱ ⋮an1an2⋯annb11b12⋯b1nb21b22⋯b2n⋮ ⋮ ⋱ ⋮bn1bn2⋯bnn= ·C A Bcij=k = 1naik bkjAssume for simplicity that n = 2k.5Parallelizing Matrix Multiplycilk_for (int i=1; i<n; ++i) { cilk_for (int j=0; j<n; ++j) { for (int k=0; k<n; ++k { C[i][j] += A[i][k] * B[k][j]; }}Θ(n3) Span: T∞ =Θ(n2) Work: T1 =Θ(n)Parallelism: T1/T∞ =For 1000 × 1000 matrices, parallelism ≈ (103)2 = 106.6Recursive Matrix Multiplication8 multiplications of n/2 × n/2 matrices.1 addition of n × n matrices.Divide and conquer —C11C12C21C22=·A11A12A21A22B11B12B21B22= +A11B11A11B12A21B11A21B12A12B21A12B22A22B21A22B227template <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T[n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); cilk_spawn MMult(C22, A21, B12, n/2); cilk_spawn MMult(C21, A21, B11, n/2); cilk_spawn MMult(D11, A12, B21, n/2); cilk_spawn MMult(D12, A12, B22, n/2); cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n); // C += D;}template <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T[n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); cilk_spawn MMult(C22, A21, B12, n/2); cilk_spawn MMult(C21, A21, B11, n/2); cilk_spawn MMult(D11, A12, B21, n/2); cilk_spawn MMult(D12, A12, B22, n/2); cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n); // C += D;}D&C Matrix MultiplicationRow/column length of matricesRow/column length of matricesDetermine submatrices by index calculationDetermine submatrices by index calculationCoarsen for efficiencyCoarsen for efficiency8template <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T[n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); cilk_spawn MMult(C22, A21, B12, n/2); cilk_spawn MMult(C21, A21, B11, n/2); cilk_spawn MMult(D11, A12, B21, n/2); cilk_spawn MMult(D12, A12, B22, n/2); cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n); // C += D;}template <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T[n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); cilk_spawn MMult(C22, A21, B12, n/2); cilk_spawn MMult(C21, A21, B11, n/2); cilk_spawn MMult(D11, A12, B21, n/2); cilk_spawn MMult(D12, A12, B22, n/2); cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n); // C += D;}Matrix Additiontemplate <typename T>void MAdd(T *C, T *D, int n) { cilk_for (int i=0; i<n; ++i) { cilk_for (int j=0; j<n; ++j) { C[n*i+j] += D[n*i+j]; } }}template <typename T>void MAdd(T *C, T *D, int n) { cilk_for (int i=0; i<n; ++i) { cilk_for (int j=0; j<n; ++j) { C[n*i+j] += D[n*i+j]; } }}9Analysis of Matrix Additiontemplate <typename T>void MAdd(T *C, T *D, int n) { cilk_for (int i=0; i<n; ++i) { cilk_for (int j=0; j<n; ++j) { C[n*i+j] += D[n*i+j]; } }}template <typename T>void MAdd(T *C, T *D, int n) { cilk_for (int i=0; i<n; ++i) { cilk_for (int j=0; j<n; ++j) { C[n*i+j] += D[n*i+j]; } }}Θ(n2) Span: A∞(n) =Work: A1(n) =Θ(lg n)Nested cilk_for statements have the same Θ(lg n) spanNested cilk_for statements have the same Θ(lg n) span10Work of Matrix Multiplicationtemplate <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T [n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); ⋮ cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n); // C += D;}template <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T [n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); ⋮ cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n); // C += D;}8M1(n/2) + A1(n) + Θ(1)= 8M1(n/2) + Θ(n2)= Θ(n3)CASE 1:nlogba = nlog28 = n3f(n) = Θ(n2)CASE 1:nlogba = nlog28 = n3f(n) = Θ(n2)Work: M1(n) =11Span of Matrix Multiplicationtemplate <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T [n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); ⋮ cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n, size); // C += D;}template <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T [n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); ⋮ cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n, size); // C += D;}M∞(n/2) + A∞(n) + Θ(1)= M∞(n/2) + Θ(lg n)= Θ(lg2n)CASE 2:nlogba =
View Full Document