CS 267 Applications of Parallel Computers Lecture 21: Hierarchical Methods for the N-Body problem - IOutlineParticle SimulationApplicationsReducing the number of particles in the force sumWhat is new: Using points at CM recursivelyQuad TreesOct TreesUsing Quad Trees and Oct TreesExample of an Adaptive Quad TreeAdaptive Quad Tree Algorithm (Oct Tree analogous)Barnes-Hut AlgorithmStep 2 of BH: compute CM and total mass of each nodeStep 3 of BH: compute force on each particleComputing force on a particle due to a nodeDetails of Step 3 of BHAnalysis of Step 3 of BHFast Multiple Method (FMM)Gravitational/Electrostatic Potential2D Multipole Expansion (Taylor expansion in 1/z)Error in Truncated 2D Multipole ExpansionOuter(n) and Outer ExpansionOuter_shift: converting Outer(n1) to Outer(n2)Outer_shift: continuedFMM: compute Outer(n) for each node n in QuadTreeInner(n) and Inner ExpansionSummary of FMMCS267 L21 N-Body Problem 1I.1Demmel Sp 1999CS 267 Applications of Parallel ComputersLecture 21: Hierarchical Methods for the N-Body problem - IJames Demmelhttp://www.cs.berkeley.edu/~demmel/cs267_Spr99CS267 L21 N-Body Problem I.2Demmel Sp 1999Outline°Motivation•Obvious algorithm for computing gravitational or electrostatic force on N bodies takes O(N2) work°How to reduce the number of particles in the force sum•We must settle for an approximate answer (say 2 decimal digits, or perhaps 16 …)°Basic Data Structures: Quad Trees and Oct Trees°The Barnes-Hut Algorithm (BH)•An O(N log N) approximate algorithm for the N-Body problem°The Fast Multipole Method (FMM)•An O(N) approximate algorithm for the N-Body problem°Parallelizing BH, FMM and other algorithms°Example applicationsCS267 L21 N-Body Problem I.3Demmel Sp 1999Particle Simulation°f(i) = external_force + nearest_neighbor_force + N-Body_force•External_force is usually embarrassingly parallel and costs O(N) for all particles-external current in Sharks and Fish•Nearest_neighbor_force requires interacting with a few neighbors, so still O(N)-van der Waals, bouncing balls•N-Body_force (gravity or electrostatics) requires all-to-all interactions-f(i) = f(i,k) … f(i,k) = force on i from k -f(i,k) = c*v/||v||3 in 3 dimensions or f(i,k) = c*v/||v||2 in 2 dimensions –v = vector from particle i to particle k , c = product of masses or charges–||v|| = length of v -Obvious algorithm costs O(N2), but we can do better...t = 0while t < t_final for i = 1 to n … n = number of particles compute f(i) = force on particle i for i = 1 to n move particle i under force f(i) for time dt … using F=ma compute interesting properties of particles (energy, etc.) t = t + dtend whilek != iCS267 L21 N-Body Problem I.4Demmel Sp 1999Applications°Astrophysics and Celestial Mechanics•Intel Delta = 1992 supercomputer, 512 Intel i860s•17 million particles, 600 time steps, 24 hours elapsed time–M. Warren and J. Salmon–Gordon Bell Prize at Supercomputing 92•Sustained 5.2 Gflops = 44K Flops/particle/time step•1% accuracy•Direct method (17 Flops/particle/time step) at 5.2 Gflops would have taken 18 years, 6570 times longer°Plasma Simulation°Molecular Dynamics°Electron-Beam Lithography Device Simulation°Fluid Dynamics (vortex method)°Good sequential algorithms too!CS267 L21 N-Body Problem I.5Demmel Sp 1999Reducing the number of particles in the force sum°All later divide and conquer algorithms use same intuition°Consider computing force on earth due to all celestial bodies•Look at night sky, # terms in force sum >= number of visible stars•Oops! One “star” is really the Andromeda galaxy, which contains billions of real stars-Seems like a lot more work than we thought … °Don’t worry, ok to approximate all stars in Andromeda by a single point at its center of mass (CM) with same total mass•D = size of box containing Andromeda , r = distance of CM to Earth•Require that D/r be “small enough”•Idea not new: Newton approximated earth and falling apple by CMsCS267 L21 N-Body Problem I.6Demmel Sp 1999What is new: Using points at CM recursively°From Andromeda’s point of view, Milky Way is also a point mass°Within Andromeda, picture repeats itself•As long as D1/r1 is small enough, stars inside smaller box can be replaced by their CM to compute the force on Vulcan•Boxes nest in boxes recursivelyCS267 L21 N-Body Problem I.7Demmel Sp 1999Quad Trees°Data structure to subdivide the plane•Nodes can contain coordinates of center of box, side length•Eventually also coordinates of CM, total mass, etc.°In a complete quad tree, each nonleaf node has 4 childrenCS267 L21 N-Body Problem I.8Demmel Sp 1999Oct Trees°Similar Data Structure to subdivide spaceCS267 L21 N-Body Problem I.9Demmel Sp 1999Using Quad Trees and Oct Trees°All our algorithms begin by constructing a tree to hold all the particles°Interesting cases have nonuniformly distributed particles•In a complete tree most nodes would be empty, a waste of space and time°Adaptive Quad (Oct) Tree only subdivides space where particles are locatedCS267 L21 N-Body Problem I.10Demmel Sp 1999Example of an Adaptive Quad TreeChild nodes enumerated counterclockwisefrom SW corner, empty ones excludedCS267 L21 N-Body Problem I.11Demmel Sp 1999Adaptive Quad Tree Algorithm (Oct Tree analogous)Procedure QuadTreeBuild QuadTree = {emtpy} for j = 1 to N … loop over all N particles QuadTreeInsert(j, root) … insert particle j in QuadTree endfor … At this point, the QuadTree may have some empty leaves, … whose siblings are not empty Traverse the QuadTree eliminating empty leaves … via, say Breadth First SearchProcedure QuadTreeInsert(j, n) … Try to insert particle j at node n in QuadTree … By construction, each leaf will contain either 1 or 0 particles if the subtree rooted at n contains more than 1 particle … n is an internal node determine which child c of node n contains particle j QuadTreeInsert(j, c) else if the subtree rooted at n contains 1 particle … n is a leaf add n’s 4 children to the QuadTree move the particle already in n into the child containing it let c be the child of n containing j QuadTreeInsert(j, c) else … the subtree rooted at n is an empty leaf store particle j in node n end° Cost <= N * maximum cost of QuadTreeInsert = O(N * maximum depth of QuadTree)° Uniform distribution => depth of QuadTree
View Full Document