04/4/2007 CS267 Lecture 21CS 267 Applications of Parallel ComputersHierarchical Methods for the N-Body problemKathy Yelickwww.cs.berkeley.edu/~yelick/cs267_s0704/4/2007 CS267 Lecture 21Big Idea° Suppose the answer at each point depends on data at all the other points• Electrostatic, gravitational force• Solution of elliptic PDEs• Graph partitioning° Seems to require at least O(n2) work, communication° If the dependence on “distant” data can be compressed• Because it gets smaller, smoother, simpler…° Then by compressing data of groups of nearby points, can cut cost (work, communication) at distant points• Apply idea recursively: cost drops to O(n log n) or even O(n)° Examples:• Barnes-Hut or Fast Multipole Method (FMM) for electrostatics/gravity/…• Multigrid for elliptic PDE• Multilevel graph partitioning (METIS, Chaco,…)04/4/2007 CS267 Lecture 21Outline° Motivation• Obvious algorithm for computing gravitational or electrostatic force on N bodies takes O(N2) work° How to reduce the number of particles in the force sum• We must settle for an approximate answer (say 2 decimal digits, or perhaps 16 …)° Basic Data Structures: Quad Trees and Oct Trees° The Barnes-Hut Algorithm (BH)• An O(N log N) approximate algorithm for the N-Body problem° The Fast Multipole Method (FMM)• An O(N) approximate algorithm for the N-Body problem° Parallelizing BH, FMM and related algorithms04/4/2007 CS267 Lecture 21Particle Simulation° f(i) = external_force + nearest_neighbor_force + N-Body_force• External_force is usually embarrassingly parallel and costs O(N) for all particles- external current in Sharks and Fish• Nearest_neighbor_force requires interacting with a few neighbors, so still O(N)- van der Waals, bouncing balls• N-Body_force (gravity or electrostatics) requires all-to-all interactions- f(i) = Σ f(i,k) … f(i,k) = force on i from k- f(i,k) = c*v/||v||3 in 3 dimensions or f(i,k) = c*v/||v||2 in 2 dimensions– v = vector from particle i to particle k , c = product of masses or charges– ||v|| = length of v - Obvious algorithm costs O(N2), but we can do better...t = 0while t < t_finalfor i = 1 to n … n = number of particlescompute f(i) = force on particle ifor i = 1 to nmove particle i under force f(i) for time dt … using F=macompute interesting properties of particles (energy, etc.)t = t + dtend whilek != i04/4/2007 CS267 Lecture 21Applications° Astrophysics and Celestial Mechanics• Intel Delta = 1992 supercomputer, 512 Intel i860s• 17 million particles, 600 time steps, 24 hours elapsed time– M. Warren and J. Salmon– Gordon Bell Prize at Supercomputing 92• Sustained 5.2 Gflops = 44K Flops/particle/time step• 1% accuracy• Direct method (17 Flops/particle/time step) at 5.2 Gflops would have taken 18 years, 6570 times longer° Plasma Simulation° Molecular Dynamics° Electron-Beam Lithography Device Simulation° Fluid Dynamics (vortex method)° Good sequential algorithms too!04/4/2007 CS267 Lecture 21Reducing the number of particles in the force sum° All later divide and conquer algorithms use same intuition° Consider computing force on earth due to all celestial bodies• Look at night sky, # terms in force sum >= number of visible stars• Oops! One “star” is really the Andromeda galaxy, which contains billions of real stars- Seems like a lot more work than we thought …° Don’t worry, ok to approximate all stars in Andromeda by a single point at its center of mass (CM) with same total mass• D = size of box containing Andromeda , r = distance of CM to Earth• Require that D/r be “small enough”• Idea not new: Newton approximated earth and falling apple by CMs04/4/2007 CS267 Lecture 21What is new: Using points at CM recursively° From Andromeda’s point of view, Milky Way is also a point mass° Within Andromeda, picture repeats itself• As long as D1/r1 is small enough, stars inside smaller box can be replaced by their CM to compute the force on Vulcan• Boxes nest in boxes recursively04/4/2007 CS267 Lecture 21Outline° Motivation• Obvious algorithm for computing gravitational or electrostatic force on N bodies takes O(N2) work° How to reduce the number of particles in the force sum• We must settle for an approximate answer (say 2 decimal digits, or perhaps 16 …)° Basic Data Structures: Quad Trees and Oct Trees° The Barnes-Hut Algorithm (BH)• An O(N log N) approximate algorithm for the N-Body problem° The Fast Multipole Method (FMM)• An O(N) approximate algorithm for the N-Body problem° Parallelizing BH, FMM and related algorithms04/4/2007 CS267 Lecture 21Quad Trees° Data structure to subdivide the plane• Nodes can contain coordinates of center of box, side length• Eventually also coordinates of CM, total mass, etc.° In a complete quad tree, each nonleaf node has 4 children04/4/2007 CS267 Lecture 21Oct Trees° Similar Data Structure to subdivide space04/4/2007 CS267 Lecture 21Using Quad Trees and Oct Trees° All our algorithms begin by constructing a tree to hold all the particles° Interesting cases have nonuniformly distributed particles• In a complete tree most nodes would be empty, a waste of space and time° Adaptive Quad (Oct) Tree only subdivides space where particles are located04/4/2007 CS267 Lecture 21Example of an Adaptive Quad TreeChild nodes enumerated counterclockwisefrom SW corner, empty ones excluded04/4/2007 CS267 Lecture 21Adaptive Quad Tree Algorithm (Oct Tree analogous)Procedure Quad_Tree_Build Quad_Tree = {empty}for j = 1 to N … loop over all N particlesQuad_Tree_Insert(j, root) … insert particle j in QuadTreeendfor… At this point, each leaf of Quad_Tree will have 0 or 1 particles … There will be 0 particles when some sibling has 1Traverse the Quad_Tree eliminating empty leaves … via, say Breadth First SearchProcedure Quad_Tree_Insert(j, n) … Try to insert particle j at node n in Quad_Treeif n an internal node … n has 4 childrendetermine which child c of node n contains particle jQuad_Tree_Insert(j, c)else if n contains 1 particle … n is a leafadd n’s 4 children to the Quad_Treemove the particle already in n into the child containing itlet c be the child of n containing jQuad_Tree_Insert(j, c)else … n empty store particle j in node nend04/4/2007 CS267 Lecture 21Cost of Adaptive Quad Tree Constrution° Cost <= N * maximum
View Full Document