CS 267: Applications of Parallel Computers Lecture 18 -- Structured GridsMotifsOutlineSolving PDEsExplicit Solution of PDEsMatrix View of Explicit Method for Heat EquationPoisson’s equation in 1D: 2u/x2 = f(x)2D Poisson’s equationAlgorithms for 2D (3D) Poisson Equation (N = n2 (n3) vars)Overview of Algorithms on n x n grid (n x n x n grid) (1)Overview of Algorithms on n x n grid (n x n x n grid) (2)Jacobi’s MethodConvergence of Nearest Neighbor MethodsSlide 14Parallelizing Jacobi’s MethodLocality Optimizations in JacobiRedundant Ghost Nodes in JacobiImprovements on JacobiGauss-SeidelSuccessive Overrelaxation (SOR)Conjugate Gradient (CG) for solving A*x = bConjugate Gradient Algorithm for Solving Ax=bSlide 23Slide 24Multigrid MotivationMultigrid MethodsGauss Seidel is Slow IGauss Seidel is Slow IIMultigrid OverviewSame Big Idea used elsewhereMultigrid uses Divide-and-Conquer in 2 WaysMultigrid Sketch in 1DMultigrid Sketch in 2DMultigrid HierarchyBasic Multigrid IdeasMultigrid OperatorsSlide 37Slide 38Slide 39Multigrid V-Cycle AlgorithmThis is called a V-CycleComplexity of a V-CycleFull Multigrid (FMG)Full Multigrid Cost AnalysisComplexity of Solving Poisson’s EquationThe Solution Operator S(i) - DetailsWeighted Jacobi chosen to damp high frequency errorMultigrid as Divide and Conquer AlgorithmThe Restriction Operator R(i) - DetailsInterpolation Operator In(i-1): detailsConvergence Picture of Multigrid in 1DConvergence Picture of Multigrid in 2DParallel 2D MultigridPerformance Model of parallel 2D Multigrid (V-cycle)Comparison of Methods (in O(.) sense)PracticalitiesMultigrid on an Adaptive MeshMultiblock ApplicationsAdaptive Mesh RefinementSupport for AMRMultigrid on an Unstructured MeshSlide 62Source of Unstructured Finite Element Mesh: VertebraMultigrid for nonlinear elastic analysis of boneExtra SlidesPreconditioningMultigrid PhilosophicallyMultigrid Algorithm: procedure MG(level, A, u, f)Multigrid CyclesSlide 70Templates for choosing an iterative algorithm for Ax=bSummary of Jacobi, SOR and CGCS 267: Applications of Parallel ComputersLecture 18 -- Structured GridsJim Demmel and Horst D. [email protected]@eecs.berkeley.eduhttp://www.cs.berkeley.edu/~demmel/cs267_Sp r10/04/13/09 CS267 Lecture 202MotifsThe Motifs (formerly “Dwarfs”) from “The Berkeley View” (Asanovic et al.)Motifs form key computational patternsTopic of this lecture04/13/09 CS267 Lecture 20Outline°Review PDEs °Overview of Methods for Poisson Equation°Jacobi’s method°Red-Black SOR method°Conjugate Gradient°Sparse LU (Lecture 17, 3/16/10, Sherry Li)°FFT (Lecture 22, 4/1/10 )°Multigrid04/13/09 CS267 Lecture 20Solving PDEs° Hyperbolic problems (waves):•Sound wave(position, time)•Use explicit time-stepping •Solution at each point depends on neighbors at previous time°Elliptic (steady state) problems:•Electrostatic Potential (position)•Everything depends on everything else•This means locality is harder to find than in hyperbolic problems°Parabolic (time-dependent) problems:•Temperature(position,time)•Involves an elliptic solve at each time-step°Focus on elliptic problems•Canonical example is the Poisson equation2u/x2 + 2u/y2 + 2u/z2 = f(x,y,z)04/13/09 CS267 Lecture 20Explicit Solution of PDEs°Explicit methods often used for hyperbolic PDEs°Stability limits size of time-step°Computation corresponds to •Matrix vector multiply •Combine nearest neighbors on grid°Use finite differences with u[j,i] as the solution at•time t= i* (i = 0,1,2,…) and •position x = j*h (j=0,1,…,N=1/h)•initial conditions on u[j,0]•boundary conditions on u[0,i] and u[N,i]°At each timestep i = 0,1,2,...i=5i=4i=3i=2i=1i=0u[0,0] u[1,0] u[2,0] u[3,0] u[4,0] u[5,0]jiFor j=0 to N u[j,i+1]= z*u[j-1,i]+ (1-2*z)*u[j,i] + z*u[j+1,i]where z =C*/h204/13/09 CS267 Lecture 20Matrix View of Explicit Method for Heat Equation•u[j,i+1]= z*u[j-1,i]+ (1-2*z)*u[j,i] + z*u[j+1,i]•u[ :, i+1] = T * u[ :, i] where T is tridiagonal:•L called Laplacian (in 1D)•For a 2D mesh (5 point stencil) the Laplacian is pentadiagonal (5 nonzeros per row, along diagonals)•For a 3D mesh there are 7 nonzeros per row1-2zzzGraph and “3 point stencil”T == I – z*L, L = 2 -1 -1 2 -1 -1 2 -1 -1 2 -1 -1 21-2z z z 1-2z z z 1-2z z z 1-2z z z 1-2z04/13/09 CS267 Lecture 20Poisson’s equation in 1D: 2u/x2 = f(x)2 -1 -1 2 -1 -1 2 -1 -1 2 -1 -1 2T =2-1 -1Graph and “stencil”04/13/09 CS267 Lecture 202D Poisson’s equation°Similar to the 1D case, but the matrix T is now°3D is analogous4 -1 -1-1 4 -1 -1 -1 4 -1 -1 4 -1 -1 -1 -1 4 -1 -1 -1 -1 4 -1 -1 4 -1 -1 -1 4 -1 -1 -1 4T =4-1-1-1-1Graph and “stencil”04/13/09 CS267 Lecture 20Algorithms for 2D (3D) Poisson Equation (N = n2 (n3) vars)Algorithm Serial PRAM Memory #Procs°Dense LU N3N N2N2°Band LU N2 (N7/3) N N3/2 (N5/3) N (N4/3)°Jacobi N2 (N5/3) N (N2/3) N N°Explicit Inv. N2 log N N2N2°Conj.Gradients N3/2 (N4/3) N1/2(1/3) *log N N N°Red/Black SOR N3/2 (N4/3) N1/2 (N1/3) N N°Sparse LU N3/2 (N2) N1/2 N*log N (N4/3) N°FFT N*log N log N N N°Multigrid N log2 N N N°Lower bound N log N NPRAM is an idealized parallel model with zero cost communicationReference: James Demmel, Applied Numerical Linear Algebra, SIAM, 1997.04/13/09 CS267 Lecture 20Overview of Algorithms on n x n grid (n x n x n grid) (1)°Sorted in two orders (roughly):•from slowest to fastest on sequential machines.•from most general (works on any matrix) to most specialized (works on matrices “like” T).°Dense LU: Gaussian elimination; works on any N-by-N matrix.°Band LU: Exploits the fact that T is nonzero only on N1/2 (N2/3) diagonals nearest main diagonal.°Jacobi: Essentially does matrix-vector multiply by T in inner loop of iterative algorithm.°Explicit Inverse: Assume we want to solve many systems with T, so we can precompute and store inv(T) “for free”, and just multiply by it (but still expensive!).°Conjugate Gradient: Uses matrix-vector
View Full Document