Provable Algorithms for Parallel Sweep Scheduling on Unstructured Meshes

Home> Academic Documents> Provable Algorithms for Parallel Sweep Scheduling on Unstructured Meshes

DOC PREVIEW

This preview shows page 1-2-3 out of 10 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 10 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 10 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 10 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 10 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Provable Algorithms for Parallel Sweep Scheduling on UnstructuredMeshesV.S. Anil Kumar1Madhav V. Marathe2Srinivasan Parthasarathy3Aravind Srinivasan3Sibylle Zust1AbstractWe present provably efficient parallel algorithms forsweep scheduling on unstructured meshes. Sweep schedul-ing is a commonly used technique in Radiation Transportproblems, and involves inverting an operator by iterativelysweeping across a mesh. Each sweep involves solving theoperator locally at each cell. However, each direction in-duces a partial order in which this computation can pro-ceed. On a distributed computing system, the goal is toschedule the computation, so that the length of the sched-ule is minimized.Several heuristics have been proposed for this problem;see [14, 15] and the references therein; but none of theheuristics have worst case performance guarantees. Herewe present a simple, almost linear time randomized al-gorithm which (provably) gives a schedule of length atmost O(log2n) times the optimal schedule for instanceswith n cells, when the communication cost is not con-sidered, and a slight variant, which coupled with a muchmore careful analysis, gives a schedule of (expected) lengthO(log m log log log m) times the optimal schedule for mprocessors. These are the first such provable guaranteesfor this problem. We also design a priority based list sched-ule using these ideas, with the same theoretical guarantee,but much better performance in practice.We complement our theoretical results with extensiveempirical analysis. The results show that (i) our algo-rithm performs very well and has significantly better per-formance guarantee in practice and (ii) the algorithm com-pares favorably with other natural and efficient parallel al-gorithms proposed in the literature [14, 15].1 Basic and Applied Simulation Science (CCS-5) and NationalInfrastructure and Analysis Cen ter (NISAC), Los Alamos Na-tional Laboratory, MS M997, P.O. Box 1663, Los Alamos, NM87545. Email: {anil,zust}@lanl.gov.2 Virgnia Bio-informatics Institute and Dept. of Com-puter Science, Virginia Tech, Blacksburg, VA. Email:[email protected] Department of Computer Science, University of Maryland, Col-lege Park, MD 20742. Email: {sri,srin}@cs.umd.edu. Researc hsupported in part by NSF Award CCR-0208005.1. IntroductionRadiation transport methods are commonly used inthe simulation and analysis of a wide variety of physi-cal phenomena. As described by Pautz [14], these meth-ods often involve inverting an operator4,andafre-quently used technique for inverting this operator iscalled sweep scheduling. During a sweep, the operator islocally solved for each spatial cell in the mesh in a spec-ified order for each direction from a specific set of di-rections (see Figure 1). Thus the computation on eachcell requires knowledge of the incoming fluxes, which aredetermined by either the boundary conditions, or fromupstream cells, solved in the previous iteration. Froma computational standpoint, this means that each di-rection induces a set of precedence constraints capturedby a dependency graph. An example is shown in Fig-ure 1(a): for instance, in direction i, we cannot start thecomputation on cell 6 unless we have completed it oncell 3. In each direction, the dependency graph is differ-ent, but is induced on the same set of mesh elements.As in Pautz [14], without loss of generality, we will as-sume that the dependency graph in each direction is adirected acyclic graph (DAG).Thus, the sweep along any direction is essentiallythe classical precedence constrained scheduling problem(e.g. [8]), where a partial order is defined on the cells(based on the direction), and the computation on thecells must be performed in any order consistent with thepartial order. However, when this problem is solved ina distributed system, a very significant constraint thatmust be satisfied [14] is: each cell must be processedon the same processor along each direction. This con-straint makes this problem different from the standardprecedence constrained scheduling problem; we call thisproblem the Sweep Scheduling Problem. This problemdoes not fall into any of the categories defined by Gra-ham et al. [2] for scheduling problems.Designing efficient algorithms for sweep schedulingon a set of m parallel processors is challenging becauseof the constraints mentioned above. In fact, if all the4 Referred to in [14] as the “streaming-plus-collision”cells in some direction form a chain, the computationhas to proceed sequentially. Thus, one can never expectto obtain linear scaling in the efficiency with increas-ing number of processors, for arbitrary instances. Thereare two important performance measures: the length ofthe schedule while ignoring communication costs (themakespan), and the total communication cost (the costof the messages that must be exchanged between pro-cessors), and the goal is to obtain a schedule that min-imizes both of these. While there are no known algo-rithms with provable performance guarantees for thesweep scheduling problem, a lot of empirical work hasbeen done. Pautz [14] describes some heuristics, suchas Depth First Descendant Seeking (DFDS), which workvery well in practice, but has no worst case guarantees.2. Our ContributionsAnalytical Results: Provable Approximation Al-gorithms. The main focus of this paper is to developalgorithms for sweep scheduling with provable guaran-tees, in contrast to existing heuristics such as DFDS [14],which work well on real meshes, but have no worst caseguarantees. For our theoretical bounds, we only focuson the problem which ignores the communication costsbetween processors - even this simplified version gener-alizes the well known precedence constrained schedul-ing problem [2], and is NP-complete; no provable re-sults are known so far for this simplified case.We design the Random Delay algorithm, and ana-lyze it rigorously to show that it gives an O(log2n)ap-proximation5(n is the number of mesh elements). Wethen show that a modification of this algorithm, cou-pled with an improved analysis actually leads to anO(log m log log log m)–approximation, where m is thenumber of processors. To our knowledge, these are thefirst algorithms with provable performance guarantees;in contrast, for all the heuristics studied by Pautz [14]there are worst case instances, admittedly not mesh like,where the schedule length could be Ω(m) times the op-timal. We then modify these algorithms to an algorithmthat has the same performance


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3 out of 10 pages.

Please select your school