UMD CMSC 714 - Efficient Run-time Support for Irregular Block-Structured Application



Efficient Run-time Support for Irregular Block-Structured Applications
By Stephen J. Fink and Scott B. Baden and Scott R. Kohn
Presented for your delectation by Asad B. Sayeed

Background
● Main type of application considered: scientific numerical methods.
● These applications often use structured irregular representations to improve accuracy.
  – Difficult to implement.
  – Cause unpredictable/irregular communication patterns, impeding performance optimization.
● Goal: assist the programmer in arranging parallelism so that the data layout and distribution best exploit memory arrangements.

Kernel Lattice Parallelism
● KeLP: Kernel Lattice Parallelism.
  – Library of higher-level abstractions for managing data layout and data motion.
  – Applications with dynamic block structures: uniform rectangular data arrays with irregular data motion.
  – Geometric programming abstractions represent data layout and motion patterns.
  – Data orchestration model: separate the description of motion patterns from their interpretation/implementation.
  – Structural abstraction: separate the structure of data from its storage.

Kernel Lattice Parallelism
● System implemented on top of MPI in C++.
● Data orchestration implemented via MotionPlans and Movers.
  – Programmers define MotionPlans to schedule communication via geometric operations on memory structure.
  – Movers interpret the plans so as to conform to the hardware architecture and other application-specific issues.

Programming Model
● Programs begin with a single (logical) control thread.
● for_all loop iterations: each one executes independently on one SPMD process.
● Storage model: distribute each block of data to its own logical address space, one space per processor.
● Little compiler automation, even for consistency.
  – Programmer explicitly describes data decomposition and also data motion (via block copy operations).

Data Layout Abstractions
● Four core data decomposition abstractions: Point, Region, Grid, XArray, inherited from KeLP's predecessor LPARX. KeLP innovation: FloorPlans.
  – Point: represents a point in n-dimensional space.
  – Region: rectangular subset of Points.
  – Grid: array of data indexed by a Region.
  – XArray: array of Grids of different (irregular) shapes.
  – FloorPlan: array of Regions representing processor assignments for an XArray.

Data Layout Abstractions
● Regions are constructed by Region calculus.
● XArrays and FloorPlans:

Data Motion Abstractions
● MotionPlan
  – Data motion pattern defined over XArrays.
  – Specified by the programmer as a set of array copy operations built via Region calculus.
  – Let G, H be Grids; let R, S be Regions. {G on R -> H on S} means copy the data in index region R of G into region S of H.
● Mover: analyzes a MotionPlan and performs the movement.
  – Programmer can extend the Mover class to represent various communication operations.

Data Motion Abstractions
● MotionPlans illustrated:

Data Layout and Data Motion
● Summary of classes:

Simple Data Motion Example
● fillpatch: fills in ghost cells from logically overlapping grids.
● Code and example of irregular XArray.

Bigger Ghost Cells Example
● Elliptic PDE solver:
● Region2, etc. are 2D arrays.

Bigger Ghost Cells Example
● Elliptic PDE FloorPlan:
● for_1 vs for_all -> current thread vs distributed
● The for_1 loop does initial ghost cell padding.
● Sweep: performs iteration, probably in Fortran.

Bigger Ghost Cells Example
● fillGhost function: central to parallelism.
● Mostly just implements fillPatch from before.
  – Last two lines perform the movement.
  – We're recomputing ghost cells each time: not normal.

Implementation Issues
● KeLP predecessor: LPARX.
  – LPARX allowed asynchronous one-sided communication, which requires barriers to synchronize process state globally.
  – KeLP bans copy operations from for_all loops, eliminating this problem; i.e., only for_1 loops perform copies. Each process stores the relevant portion of the movement plan.
● Mover: implemented via nonblocking MPI sends.
  – In and out buffers allocated to each process.
  – Receives data while it waits for sends to finish.

Performance
● Comparison to MPI.
  – Three benchmarks involving heavy matrix computation: NAS-FT, NAS-MG, SUMMA.
  – Very conservatively translated to KeLP from MPI.

Performance
● Adaptive multigrid: lda3d
  – Eigenvalue solver from “ab initio” materials science.
  – Highly irregular communication.

Performance
● jacobi3d: tuning communication performance.
  – Three KeLP versions and one hand-coded version.
  – KeLP versions vary by ghost cell communication arrangements.

Related Work
● KeLP: structural abstraction from LPARX combined with inspector/executor communication analysis.
● Other structural abstraction implementations: BOXLIB, DAGH. More specialized.
● Inspector/executor appears in Multiblock PARTI, which does not allow irregular block decompositions, so it lacks the same level of structural abstraction.
● A number of other related applications.

Conclusion
● Structural abstractions hide some of the dirty work required for efficient communication within irregular block decompositions.
● Despite being a high-level abstraction over MPI, KeLP performs very favorably compared to MPI.
● Inspector/executor paradigm (MotionPlans vs Movers) allows retargeting to various situations.

Kernel Lettuce Decomposition, more widely eaten than
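
The slides describe Region calculus, the {G on R -> H on S} copy notation, and the split between a MotionPlan (what to move) and a Mover (how to move it), but the code figures did not survive in this copy. The following is a minimal Python sketch of those ideas, not KeLP's actual C++ API: the names Region, Copy, fillpatch_plan, points, and move are hypothetical, grids are modeled as plain dictionaries, and block interiors are assumed to be disjoint. It also uses one region per copy because, in a fillpatch, source and destination share the same global index range.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterator, List, Tuple

Point = Tuple[int, ...]  # a point in n-dimensional index space


@dataclass(frozen=True)
class Region:
    """Rectangular set of Points with inclusive lower and upper corners."""
    lo: Point
    hi: Point

    def is_empty(self) -> bool:
        return any(l > h for l, h in zip(self.lo, self.hi))

    def intersect(self, other: "Region") -> "Region":
        lo = tuple(max(a, b) for a, b in zip(self.lo, other.lo))
        hi = tuple(min(a, b) for a, b in zip(self.hi, other.hi))
        return Region(lo, hi)

    def grow(self, width: int) -> "Region":
        """Pad the region by `width` cells on every side (ghost layer)."""
        return Region(tuple(l - width for l in self.lo),
                      tuple(h + width for h in self.hi))


@dataclass(frozen=True)
class Copy:
    """One plan entry: {grid src on region -> grid dst on region}."""
    src: int
    dst: int
    region: Region


def fillpatch_plan(interiors: List[Region], ghost: int) -> List[Copy]:
    """Inspector step: describe ghost-cell copies without moving any data."""
    plan = []
    for dst, dst_interior in enumerate(interiors):
        halo = dst_interior.grow(ghost)
        for src, src_interior in enumerate(interiors):
            if src == dst:
                continue
            overlap = halo.intersect(src_interior)
            if not overlap.is_empty():
                plan.append(Copy(src, dst, overlap))
    return plan


def points(region: Region) -> Iterator[Point]:
    """Enumerate every Point in a Region."""
    return product(*(range(l, h + 1) for l, h in zip(region.lo, region.hi)))


def move(plan: List[Copy], grids: List[Dict[Point, object]]) -> None:
    """Toy executor: carry out each copy in the plan on dict-based grids."""
    for c in plan:
        for p in points(c.region):
            grids[c.dst][p] = grids[c.src][p]
```

For two adjacent 2D blocks of different shapes, Region((0, 0), (3, 3)) and Region((4, 0), (7, 5)), with a ghost width of 1, the plan holds two copies: one column of the second block into the first block's ghost layer, and one column of the first block into the second's. Because the plan is just data, a different executor (for example, one that batches the copies into messages) could interpret the same plan, which is the point of separating plan from mover.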

