CS 267 Applications of Parallel Computers Hierarchical Methods for the N-Body problemBig IdeaOutlineParticle SimulationApplicationsReducing the number of particles in the force sumWhat is new: Using points at CM recursivelySlide 8Quad TreesOct TreesUsing Quad Trees and Oct TreesExample of an Adaptive Quad TreeAdaptive Quad Tree Algorithm (Oct Tree analogous)Cost of Adaptive Quad Tree ConstrutionSlide 15Barnes-Hut AlgorithmStep 2 of BH: compute CM and total mass of each nodeStep 3 of BH: compute force on each particleComputing force on a particle due to a nodeDetails of Step 3 of BHAnalysis of Step 3 of BHSlide 22Fast Multiple Method (FMM)Gravitational/Electrostatic Potential2D Multipole Expansion (Taylor expansion in 1/z)Error in Truncated 2D Multipole ExpansionOuter(n) and Outer ExpansionInner(n) and Inner ExpansionTop Level Description of FMMStep 2 of FMM: Outer_shift: converting Outer(n1) to Outer(n2)Outer_shift: continuedStep 2 of FMM: compute Outer(n) for each node n in QuadTreeSlide 33Step 3 of FMM: Compute Inner(n) for each n in QuadTreeStep 3 of FMM: Inner(n1) = Inner_shift(Inner(n2))Step 3 of FMM: Inner(n4) = Convert(Outer(n3))Step 3 of FMM: Computing Inner(n) from other expansionsStep 3 of FMM: Interaction SetStep 3 of FMM: Compute Inner(n) for each n in QuadTreeSlide 40Slide 41Parallelizing Hierachical N-Body codesProgramming Model - BSPLoad Balancing Scheme 1: Orthogonal Recursive Bisection (ORB)Load Balancing Scheme 2: CostzonesLinearly Ordering Quadtree nodes for CostzonesLinearly Ordering Quadtree nodes for Costzones (continued)Determining Costzones in ParallelComputing Locally Essential Trees (LETs)Locally Essential Trees in Distributed MemoryPerformance Results - 1Performance Results - 2Old SlidesSlide 54Slide 55Slide 56Slide 57Slide 58Outer_shift: converting Outer(n1) to Outer(n2)Slide 60FMM: compute Outer(n) for each node n in QuadTreeSlide 62Summary of FMM03/21/2006 CS267 Lecture 19CS 267 Applications of Parallel ComputersHierarchical Methods for the N-Body problemJames Demmelwww.cs.berkeley.edu/~demmel/cs267_Spr0603/21/2006 CS267 Lecture 19Big Idea°Suppose the answer at each point depends on data at all the other points•Electrostatic, gravitational force•Solution of elliptic PDEs•Graph partitioning°Seems to require at least O(n2) work, communication°If the dependence on “distant” data can be compressed•Because it gets smaller, smoother, simpler…°Then by compressing data of groups of nearby points, can cut cost (work, communication) at distant points•Apply idea recursively: cost drops to O(n log n) or even O(n)°Examples:•Barnes-Hut or Fast Multipole Method (FMM) for electrostatics/gravity/…•Multigrid for elliptic PDE•Multilevel graph partitioning (METIS, Chaco,…)03/21/2006 CS267 Lecture 19Outline°Motivation•Obvious algorithm for computing gravitational or electrostatic force on N bodies takes O(N2) work°How to reduce the number of particles in the force sum•We must settle for an approximate answer (say 2 decimal digits, or perhaps 16 …)°Basic Data Structures: Quad Trees and Oct Trees°The Barnes-Hut Algorithm (BH)•An O(N log N) approximate algorithm for the N-Body problem°The Fast Multipole Method (FMM)•An O(N) approximate algorithm for the N-Body problem°Parallelizing BH, FMM and related algorithms03/21/2006 CS267 Lecture 19Particle Simulation°f(i) = external_force + nearest_neighbor_force + N-Body_force•External_force is usually embarrassingly parallel and costs O(N) for all particles-external current in Sharks and Fish•Nearest_neighbor_force requires interacting with a few neighbors, so still O(N)-van der Waals, bouncing balls•N-Body_force (gravity or electrostatics) requires all-to-all interactions-f(i) = f(i,k) … f(i,k) = force on i from k -f(i,k) = c*v/||v||3 in 3 dimensions or f(i,k) = c*v/||v||2 in 2 dimensions –v = vector from particle i to particle k , c = product of masses or charges–||v|| = length of v -Obvious algorithm costs O(N2), but we can do better...t = 0while t < t_final for i = 1 to n … n = number of particles compute f(i) = force on particle i for i = 1 to n move particle i under force f(i) for time dt … using F=ma compute interesting properties of particles (energy, etc.) t = t + dtend whilek != i03/21/2006 CS267 Lecture 19Applications°Astrophysics and Celestial Mechanics•Intel Delta = 1992 supercomputer, 512 Intel i860s•17 million particles, 600 time steps, 24 hours elapsed time–M. Warren and J. Salmon–Gordon Bell Prize at Supercomputing 92•Sustained 5.2 Gflops = 44K Flops/particle/time step•1% accuracy•Direct method (17 Flops/particle/time step) at 5.2 Gflops would have taken 18 years, 6570 times longer°Plasma Simulation°Molecular Dynamics°Electron-Beam Lithography Device Simulation°Fluid Dynamics (vortex method)°Good sequential algorithms too!03/21/2006 CS267 Lecture 19Reducing the number of particles in the force sum°All later divide and conquer algorithms use same intuition°Consider computing force on earth due to all celestial bodies•Look at night sky, # terms in force sum >= number of visible stars•Oops! One “star” is really the Andromeda galaxy, which contains billions of real stars-Seems like a lot more work than we thought … °Don’t worry, ok to approximate all stars in Andromeda by a single point at its center of mass (CM) with same total mass•D = size of box containing Andromeda , r = distance of CM to Earth•Require that D/r be “small enough”•Idea not new: Newton approximated earth and falling apple by CMs03/21/2006 CS267 Lecture 19What is new: Using points at CM recursively°From Andromeda’s point of view, Milky Way is also a point mass°Within Andromeda, picture repeats itself•As long as D1/r1 is small enough, stars inside smaller box can be replaced by their CM to compute the force on Vulcan•Boxes nest in boxes recursively03/21/2006 CS267 Lecture 19Outline°Motivation•Obvious algorithm for computing gravitational or electrostatic force on N bodies takes O(N2) work°How to reduce the number of particles in the force sum•We must settle for an approximate answer (say 2 decimal digits, or perhaps 16 …)°Basic Data Structures: Quad Trees and Oct Trees°The Barnes-Hut Algorithm (BH)•An O(N log N) approximate algorithm for the N-Body problem°The Fast Multipole Method (FMM)•An O(N) approximate algorithm for the N-Body
View Full Document