CS 267 Applications of Parallel Computers
Lecture 11: Sources of Parallelism and Locality (Part 2)
James Demmel
http://www.cs.berkeley.edu/~demmel/cs267_Spr99

Contents:
° Recap of last lecture
° Outline
° Ordinary Differential Equations (ODEs)
° Solving ODEs
° Solving ODEs - Details
° Parallelism in Sparse Matrix-vector multiplication
° Graph Partitioning and Sparse Matrices
° More on Matrix Reordering via Graph Partitioning
° What about implicit methods and eigenproblems?
° Partial Differential Equations (PDEs)
° Continuous Variables, Continuous Parameters
° Example: Deriving the Heat Equation
° Explicit Solution of the Heat Equation
° Parallelism in Explicit Method for PDEs
° Instability in solving the heat equation explicitly
° Implicit Solution
° 2D Implicit Method
° Algorithms for 2D Poisson Equation with N unknowns
° Short explanations of algorithms on previous slide
° Relation of Poisson's equation to Gravity, Electrostatics
° Comments on practical meshes
° Composite mesh from a mechanical structure
° Converting the mesh to a matrix
° Effects of Ordering Rows and Columns on Gaussian Elimination
° Irregular mesh: NASA Airfoil in 2D (direct solution)
° Irregular mesh: Tapered Tube (multigrid)
° Adaptive Mesh Refinement (AMR)
° Challenges of irregular meshes (and a few solutions)

Recap of last lecture
° Simulation models
° A model problem: sharks and fish
° Discrete event systems
° Particle systems
° Lumped systems (Ordinary Differential Equations, ODEs)

Outline
° Continuation of ODEs
° Partial Differential Equations (PDEs)

Ordinary Differential Equations (ODEs)

Solving ODEs
° Explicit methods to compute solution(t)
  • Ex: Euler's method
  • Simple algorithm: sparse matrix-vector multiply
  • May need to take very small timesteps, especially if the system is stiff (i.e., can change rapidly)
° Implicit methods to compute solution(t)
  • Ex: Backward Euler's method
  • Larger timesteps possible, especially for stiff problems
  • More difficult algorithm: solve a sparse linear system
° Computing modes of vibration
  • Finding eigenvalues and eigenvectors
  • Ex: do the resonant modes of a building match those of earthquakes?
° All of these reduce to sparse matrix problems
  • Explicit: sparse matrix-vector multiplication
  • Implicit: solve a sparse linear system
    - direct solvers (Gaussian elimination)
    - iterative solvers (use sparse matrix-vector multiplication)
  • Eigenvalue/eigenvector algorithms may also be explicit or implicit

Solving ODEs - Details
° Assume the ODE is x'(t) = f(x) = A*x, where A is a sparse matrix
  • Try to compute x(i*dt) = x[i] at i = 0, 1, 2, …
  • Approximate x'(i*dt) by (x[i+1] - x[i])/dt
° Euler's method:
  • Approximate x'(t) = A*x by (x[i+1] - x[i])/dt = A*x[i] and solve for x[i+1]
  • x[i+1] = (I + dt*A)*x[i], i.e., a sparse matrix-vector multiplication
° Backward Euler's method:
  • Approximate x'(t) = A*x by (x[i+1] - x[i])/dt = A*x[i+1] and solve for x[i+1]
  • (I - dt*A)*x[i+1] = x[i], i.e., we need to solve a sparse linear system of equations
° Modes of vibration
  • Seek a solution of x''(t) = A*x of the form x(t) = sin(f*t)*x0, where x0 is a constant vector
  • Plug in to get -f^2*x0 = A*x0, i.e., -f^2 is an eigenvalue and x0 is an eigenvector of A
  • Solution schemes reduce either to sparse matrix-vector multiplication or to solving sparse linear systems
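Both update rules above map directly onto sparse kernels. Here is a minimal sketch of the two timestep loops, not from the lecture: it assumes NumPy/SciPy, and a tridiagonal 1D Laplacian stands in for the sparse A. The backward Euler matrix is factored once and the factors are reused every step.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, dt, steps = 100, 1e-3, 1000
# Stand-in sparse A: tridiagonal 1D Laplacian (second differences)
A = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format="csc")
I = sp.identity(n, format="csc")

x_exp = np.ones(n)   # forward (explicit) Euler state
x_imp = np.ones(n)   # backward (implicit) Euler state

lu = spla.splu((I - dt * A).tocsc())   # factor I - dt*A once, reuse each step

for _ in range(steps):
    # Explicit: x[i+1] = (I + dt*A)*x[i] -- one sparse mat-vec per step;
    # dt must be small enough for stability (worse for stiff A)
    x_exp = x_exp + dt * (A @ x_exp)
    # Implicit: solve (I - dt*A)*x[i+1] = x[i] -- a sparse linear system;
    # stable for much larger dt
    x_imp = lu.solve(x_imp)
```

For large problems, the direct factorization would itself be replaced by an iterative solver whose kernel is again sparse matrix-vector multiplication, as the slide notes.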
Parallelism in Sparse Matrix-vector multiplication
° y = A*x, where A is sparse and n x n
° Questions
  • which processors store
    - y[i], x[i], and A[i,j]
  • which processors compute
    - y[i] = sum over j = 1 to n of A[i,j]*x[j] = (row i of A) . x … a sparse dot product
° Partitioning
  • Partition the index set {1, …, n} = N1 ∪ N2 ∪ … ∪ Np
  • For all i in Nk, processor k stores y[i], x[i], and row i of A
  • For all i in Nk, processor k computes y[i] = (row i of A) . x
    - "owner computes" rule: processor k computes the y[i]'s it owns (see the first code sketch at the end of this excerpt)
° Goals of partitioning
  • balance load (how is load measured?)
  • balance storage (how much does each processor store?)
  • minimize communication (how much is communicated?)

Graph Partitioning and Sparse Matrices
[Figure: a 6 x 6 symmetric sparse matrix next to its graph on nodes 1-6; each off-diagonal nonzero A[i,j] corresponds to an edge between nodes i and j]
° Relationship between matrix and graph
° A "good" partition of the graph has
  • an equal (weighted) number of nodes in each part (load and storage balance)
  • a minimum number of edges crossing between parts (minimize communication)
° Can reorder the rows/columns of the matrix by putting all the nodes of one partition together

More on Matrix Reordering via Graph Partitioning
° "Ideal" matrix structure for parallelism: (nearly) block diagonal
  • p (number of processors) blocks
  • few nonzeros outside these blocks, since these require communication (see the second code sketch at the end of this excerpt)
[Figure: y = A*x with A nearly block diagonal, one diagonal block per processor P0-P4]

What about implicit methods and eigenproblems?
° Direct methods (Gaussian elimination)
  • Called LU decomposition, because we factor A = L*U
  • Future lectures will consider both dense and sparse cases
  • More complicated than sparse matrix-vector multiplication
° Iterative solvers
  • Will discuss several of these in future lectures
    - Jacobi, Successive Overrelaxation (SOR), Conjugate Gradients (CG), Multigrid, ...
  • Most have sparse matrix-vector multiplication as their kernel
° Eigenproblems
  • Future lectures will discuss dense and sparse cases
  • Also depend on sparse matrix-vector multiplication and direct methods
° Graph partitioning
  • Algorithms will be discussed in future lectures

Partial Differential Equations (PDEs)

Continuous Variables, Continuous Parameters
Examples of such systems include
° Heat flow: Temperature(position, time)
° Diffusion: Concentration(position, time)
° Electrostatic or gravitational potential: Potential(position)
° Fluid flow: Velocity, Pressure, Density(position, time)
° Quantum mechanics: Wave-function(position, time)
° Elasticity: Stress, Strain(position, time)

Example: Deriving the Heat Equation
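As promised under "Parallelism in Sparse Matrix-vector multiplication", here is a minimal sketch of the "owner computes" rule. It is not from the lecture: it assumes NumPy/SciPy, uses a random sparse matrix as a stand-in, and a serial loop over k stands in for the p processors, each computing only the y[i]'s in its own index set Nk.

```python
import numpy as np
import scipy.sparse as sp

n, p = 12, 3
A = sp.random(n, n, density=0.2, format="csr", random_state=0)  # stand-in sparse matrix
x = np.ones(n)
y = np.zeros(n)

# Partition the index set {0, ..., n-1} into p contiguous blocks N1, ..., Np
blocks = np.array_split(np.arange(n), p)

for k, Nk in enumerate(blocks):     # "processor" k ...
    for i in Nk:                    # ... owns y[i], x[i], and row i of A
        lo, hi = A.indptr[i], A.indptr[i + 1]
        # sparse dot product (row i of A) . x: touch only the nonzeros
        y[i] = A.data[lo:hi] @ x[A.indices[lo:hi]]

assert np.allclose(y, A @ x)        # matches the library mat-vec
```

In a real distributed version, processor k would also need the entries x[j] for every nonzero A[i,j] with i in Nk and j outside Nk; those are exactly the crossing edges the partitioning tries to minimize.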
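In the same spirit, here is the sketch promised under "More on Matrix Reordering via Graph Partitioning": a symmetric permutation that puts each partition's rows and columns together, making the matrix nearly block diagonal. Again not from the lecture: NumPy/SciPy are assumed, and a trivial even/odd split stands in for a real graph partitioner (e.g., METIS); the nonzeros left outside the diagonal blocks correspond to the cut edges, i.e., the communication.

```python
import numpy as np
import scipy.sparse as sp

n = 8
S = sp.random(n, n, density=0.25, format="csr", random_state=1)
A = ((S + S.T) != 0).astype(float).tocsr()  # symmetric sparsity pattern = undirected graph

# Stand-in "partition" of the graph nodes into two equal parts;
# a real code would call a graph partitioner here
part0 = np.arange(0, n, 2)
part1 = np.arange(1, n, 2)
perm = np.concatenate([part0, part1])

# Symmetric permutation: reorder rows and columns the same way,
# so each processor's nodes are contiguous
A_reordered = A[perm][:, perm]

# Off-diagonal-block nonzeros = edges cut by the partition = communication
cut = A_reordered[:len(part0), len(part0):].nnz
print("nonzeros outside the diagonal blocks:", cut)
```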