DOC PREVIEW
Berkeley COMPSCI C267 - Sources of Parallelism and Locality in Simulation – Part 2

This preview shows page 1-2-3-21-22-23-42-43-44 out of 44 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

02/14/2007 CS267 Lecture 9 1CS 267Sources of Parallelism and Locality in Simulation – Part 2Kathy Yelickwww.cs.berkeley.edu/~yelick/cs267_sp0702/14/2007 CS267 Lecture 9 2Recap of Last Lecture• 4 kinds of simulations• Discrete Event Systems• Particle Systems• Ordinary Differential Equations (ODEs) (finish today)• Partial Differential Equations (PDEs) (today)• Common problems:• Load balancing• May be due to lack of parallelism or poor work distribution• Dynamically, if load changes significantly during run• Statically, divide grid (or graph) into blocks• Locality• Partition into large chunks with low aspect ratio (reduce surface to volume ratio)• Distributed particles according to location, but use irregular spatial decomposition (e.g., quad tree) for load balance• Constant tension between these two02/14/2007 CS267 Lecture 9 3ODEs and Sparse Matrices• All these problems reduce to sparse matrix problems• Explicit: sparse matrix-vector multiplication (SpMV).• Implicit: solve a sparse linear system• direct solvers (Gaussian elimination).• iterative solvers (use sparse matrix-vector multiplication).• Eigenvalue/vector algorithms may also be explicit or implicit.• Conclusion: SpMV is key to many ODE problems• Relatively simple algorithm to study in detail• Two key problems: locality and load balance02/14/2007 CS267 Lecture 9 4Matrix-vector multiply kernel: y(i) Å y(i) + A(i,j)⋅x(j)Matrix-vector multiply kernel: y(i) Å y(i) + A(i,j)⋅x(j)for each row ifor k=ptr[i] to ptr[i+1] doy[i] = y[i] + val[k]*x[ind[k]]SpMV in Compressed Sparse Row (CSR) FormatMatrix-vector multiply kernel: y(i) Å y(i) + A(i,j)⋅x(j)for each row ifor k=ptr[i] to ptr[i+1] doy[i] = y[i] + val[k]*x[ind[k]]AyxRepresentation of ACSR format is one of many possibilities02/14/2007 CS267 Lecture 9 5Parallel Sparse Matrix-vector multiplication• y = A*x, where A is a sparse n x n matrix• Questions• which processors store• y[i], x[i], and A[i,j]• which processors compute• y[i] = sum (from 1 to n) A[i,j] * x[j]= (row i of A) * x … a sparse dot product• Partitioning• Partition index set {1,…,n} = N1 ∪ N2 ∪ … ∪ Np.• For all i in Nk, Processor k stores y[i], x[i], and row i of A • For all i in Nk, Processor k computes y[i] = (row i of A) * x• “owner computes” rule: Processor k compute the y[i]s it owns.xyP1P2P3P4May require communication02/14/2007 CS267 Lecture 9 6Matrix Reordering via Graph Partitioning• “Ideal” matrix structure for parallelism: block diagonal• p (number of processors) blocks, can all be computed locally.• If no non-zeros outside these blocks, no communication needed• Can we reorder the rows/columns to get close to this?• Most nonzeros in diagonal blocks, few outsideP0P1P2P3P4=*P0 P1 P2 P3 P402/14/2007 CS267 Lecture 9 7Goals of Reordering• Performance goals• balance load (how is load measured?).• Approx equal number of nonzeros (not necessarily rows)• balance storage (how much does each processor store?).• Approx equal number of nonzeros• minimize communication (how much is communicated?).• Minimize nonzeros outside diagonal blocks• Related optimization criterion is to move nonzeros near diagonal• improve register and cache re-use• Group nonzeros in small vertical blocks so source (x) elements loaded into cache or registers may be reused (temporal locality)• Group nonzeros in small horizontal blocks so nearby source (x) elements in the cache may be used (spatial locality)• Other algorithms reorder for other reasons• Reduce # nonzeros in matrix after Gaussian elimination• Improve numerical stability02/14/2007 CS267 Lecture 9 8Graph Partitioning and Sparse Matrices 1 1 1 12 1 1 1 13 1 1 14 1 1 1 15 1 1 1 16 1 1 1 11 2 3 4 5 636152• Relationship between matrix and graph• Edges in the graph are nonzero in the matrix: here the matrix issymmetric (edges are unordered) and weights are equal (1)• If divided over 3 procs, there are 14 nonzeros outside the diagonal blocks, which represent the 7 (bidirectional) edges402/14/2007 CS267 Lecture 9 9Graph Partitioning and Sparse Matrices 1 1 1 12 1 1 1 13 1 1 14 1 1 1 15 1 1 1 16 1 1 1 11 2 3 4 5 6• Relationship between matrix and graph• A “good” partition of the graph has• equal (weighted) number of nodes in each part (load and storage balance).• minimum number of edges crossing between (minimize communication).• Reorder the rows/columns by putting all nodes in one partition together.36154202/14/2007 CS267 Lecture 9 10Partial Differential EquationsPDEs02/14/2007 CS267 Lecture 9 11Continuous Variables, Continuous ParametersExamples of such systems include• Elliptic problems (steady state, global space dependence)• Electrostatic or Gravitational Potential: Potential(position)• Hyperbolic problems (time dependent, local space dependence):• Sound waves: Pressure(position,time)• Parabolic problems (time dependent, global space dependence)• Heat flow: Temperature(position, time)• Diffusion: Concentration(position, time)Many problems combine features of above• Fluid flow: Velocity,Pressure,Density(position,time)• Elasticity: Stress,Strain(position,time)02/14/2007 CS267 Lecture 9 12Example: Deriving the Heat Equation01xx+hConsider a simple problem• A bar of uniform material, insulated except at ends•Let u(x,t) be the temperature at position x at time t• Heat travels from x-h to x+h at rate proportional to:•As h Æ0, we get the heat equation:d u(x,t) (u(x-h,t)-u(x,t))/h - (u(x,t)- u(x+h,t))/hdt h= C *d u(x,t) d2u(x,t)dt dx2= C *x-h02/14/2007 CS267 Lecture 9 13Details of the Explicit Method for Heat• Discretize time and space using explicit approach (forward Euler) to approximate time derivative:(u(x,t+δ) – u(x,t))/δ= C (u(x-h,t) – 2*u(x,t) + u(x+h,t))/h2u(x,t+δ) = u(x,t)+ C*δ/h2*(u(x-h,t) – 2*u(x,t) + u(x+h,t))•Let z = C*δ /h2u(x,t+δ) = z* u(x-h,t) + (1-2z)*u(x,t) + z*u(x+h,t)• Change variable x to j*h, t to i*δ, and u(x,t) to u[j,i]u[j,i+1]= z*u[j-1,i]+ (1-2*z)*u[j,i]+ z*u[j+1,i]d u(x,t) d2u(x,t)dt dx2= C *02/14/2007 CS267 Lecture 9 14Explicit Solution of the Heat Equation• Use


View Full Document

Berkeley COMPSCI C267 - Sources of Parallelism and Locality in Simulation – Part 2

Documents in this Course
Lecture 4

Lecture 4

52 pages

Split-C

Split-C

5 pages

Lecture 5

Lecture 5

40 pages

Load more
Download Sources of Parallelism and Locality in Simulation – Part 2
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Sources of Parallelism and Locality in Simulation – Part 2 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Sources of Parallelism and Locality in Simulation – Part 2 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?