Unformatted text preview:

Slide 1Work and Span (Recap)Cilk Loops: Divide and ConquerSquare-Matrix MultiplicationParallelizing Matrix MultiplyRecursive Matrix MultiplicationD&C Matrix MultiplicationMatrix AdditionAnalysis of Matrix AdditionWork of Matrix MultiplicationSpan of Matrix MultiplicationParallelism of Matrix MultiplyStack TemporariesNo-Temp Matrix MultiplicationWork of No-Temp MultiplySpan of No-Temp MultiplyParallelism of No-Temp MultiplyHow general that was?General Matrix MultiplicationParallelizing General MMultSplit mSplit nSplit kMatrix-Vector MultiplicationMatrix-Vector MultiplicationMatrix-Transpose x VectorMatrix-Transpose x VectorHyperobjectsHyperobject solution1CS 240A : Numerical Examples in Shared Memory with Cilk++• Matrix-matrix multiplication• Matrix-vector multiplication• HyperobjectsThanks to Charles E. Leiserson for some of these slides2TP = execution time on P processorsT1 = work T∞ = span**Also called critical-path lengthor computational depth.Speedup on p processors∙T1/Tp Parallelism∙T1/T∞Work and Span (Recap)3cilk_for (int i=0; i<n; ++i) { A[i]+=B[i];}cilk_for (int i=0; i<n; ++i) { A[i]+=B[i];}Vector additionWork: T1 = Θ(n) Span: T∞ =Parallelism: T1/T∞ = Θ(n/lg n) Work: T1 =Span: T∞ = Θ(lg n)Parallelism: T1/T∞ =Cilk Loops: Divide and ConquerGgrain sizeAssume that G = Θ(1).Implementation4Square-Matrix Multiplicationc11c12⋯c1nc21c22⋯c2n⋮ ⋮ ⋱ ⋮cn1cn2⋯cnna11a12⋯a1na21a22⋯a2n⋮ ⋮ ⋱ ⋮an1an2⋯annb11b12⋯b1nb21b22⋯b2n⋮ ⋮ ⋱ ⋮bn1bn2⋯bnn= ·C A Bcij=k = 1naik bkjAssume for simplicity that n = 2k.5Parallelizing Matrix Multiplycilk_for (int i=1; i<n; ++i) { cilk_for (int j=0; j<n; ++j) { for (int k=0; k<n; ++k { C[i][j] += A[i][k] * B[k][j]; }}Θ(n3) Span: T∞ =Θ(n2) Work: T1 =Θ(n)Parallelism: T1/T∞ =For 1000 × 1000 matrices, parallelism ≈ (103)2 = 106.6Recursive Matrix Multiplication8 multiplications of n/2 × n/2 matrices.1 addition of n × n matrices.Divide and conquer —C11C12C21C22=·A11A12A21A22B11B12B21B22= +A11B11A11B12A21B11A21B12A12B21A12B22A22B21A22B227template <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T[n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); cilk_spawn MMult(C22, A21, B12, n/2); cilk_spawn MMult(C21, A21, B11, n/2); cilk_spawn MMult(D11, A12, B21, n/2); cilk_spawn MMult(D12, A12, B22, n/2); cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n); // C += D;}template <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T[n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); cilk_spawn MMult(C22, A21, B12, n/2); cilk_spawn MMult(C21, A21, B11, n/2); cilk_spawn MMult(D11, A12, B21, n/2); cilk_spawn MMult(D12, A12, B22, n/2); cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n); // C += D;}D&C Matrix MultiplicationRow/column length of matricesRow/column length of matricesDetermine submatrices by index calculationDetermine submatrices by index calculationCoarsen for efficiencyCoarsen for efficiency8template <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T[n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); cilk_spawn MMult(C22, A21, B12, n/2); cilk_spawn MMult(C21, A21, B11, n/2); cilk_spawn MMult(D11, A12, B21, n/2); cilk_spawn MMult(D12, A12, B22, n/2); cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n); // C += D;}template <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T[n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); cilk_spawn MMult(C22, A21, B12, n/2); cilk_spawn MMult(C21, A21, B11, n/2); cilk_spawn MMult(D11, A12, B21, n/2); cilk_spawn MMult(D12, A12, B22, n/2); cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n); // C += D;}Matrix Additiontemplate <typename T>void MAdd(T *C, T *D, int n) { cilk_for (int i=0; i<n; ++i) { cilk_for (int j=0; j<n; ++j) { C[n*i+j] += D[n*i+j]; } }}template <typename T>void MAdd(T *C, T *D, int n) { cilk_for (int i=0; i<n; ++i) { cilk_for (int j=0; j<n; ++j) { C[n*i+j] += D[n*i+j]; } }}9Analysis of Matrix Additiontemplate <typename T>void MAdd(T *C, T *D, int n) { cilk_for (int i=0; i<n; ++i) { cilk_for (int j=0; j<n; ++j) { C[n*i+j] += D[n*i+j]; } }}template <typename T>void MAdd(T *C, T *D, int n) { cilk_for (int i=0; i<n; ++i) { cilk_for (int j=0; j<n; ++j) { C[n*i+j] += D[n*i+j]; } }}Θ(n2) Span: A∞(n) =Work: A1(n) =Θ(lg n)Nested cilk_for statements have the same Θ(lg n) spanNested cilk_for statements have the same Θ(lg n) span10Work of Matrix Multiplicationtemplate <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T [n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); ⋮ cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n); // C += D;}template <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T [n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); ⋮ cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n); // C += D;}8M1(n/2) + A1(n) + Θ(1)= 8M1(n/2) + Θ(n2)= Θ(n3)CASE 1:nlogba = nlog28 = n3f(n) = Θ(n2)CASE 1:nlogba = nlog28 = n3f(n) = Θ(n2)Work: M1(n) =11Span of Matrix Multiplicationtemplate <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T [n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); ⋮ cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n, size); // C += D;}template <typename T>void MMult(T *C, T *A, T *B, int n) { T * D = new T [n*n]; //base case & partition matrices cilk_spawn MMult(C11, A11, B11, n/2); cilk_spawn MMult(C12, A11, B12, n/2); ⋮ cilk_spawn MMult(D22, A22, B22, n/2); MMult(D21, A22, B21, n/2); cilk_sync; MAdd(C, D, n, size); // C += D;}M∞(n/2) + A∞(n) + Θ(1)= M∞(n/2) + Θ(lg n)= Θ(lg2n)CASE 2:nlogba =


View Full Document

UCSB CS 240A - NUMERICAL EXAMPLES

Download NUMERICAL EXAMPLES
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view NUMERICAL EXAMPLES and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view NUMERICAL EXAMPLES 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?