Slide 1Problem StatementAnalogy: science of electro-magnetismOrganization of talkSlide 5ExamplesStencil computation: Jacobi iterationDelaunay Mesh RefinementEvent-driven simulationRemarks on algorithmsOrganization of talkRequirementsTraditional abstractionOperator formulation of algorithmsParallelismGalois programming model (PLDI 2007)Slide 17Parameter tool (PPoPP 2009)Example: DMRExamplesSummaryOrganization of talkKey ideaAlgorithm abstractionsMorphsSlide 26Graph partitioning (ASPLOS 2008)Zero-copy implementationDelaunay mesh refinementSurvey propagation on MaverickSlide 31SchedulingOngoing workAcknowledgements (in alphabetical order)Science of Parallel Programming1Keshav PingaliUniversity of Texas, AustinTowards a Science of Parallel Programming2Problem Statement•Community has worked on parallel programming for more than 30 years–programming models–machine models–programming languages–….•However, parallel programming is still a research problem –matrix computations, stencil computations, FFTs etc. are well-understood–each new application is a “new phenomenon”•few insights for irregular applications •Thesis: we need a science of parallel programming –analysis: framework for thinking about parallelism in application–synthesis: produce an efficient parallel implementation of application“The Alchemist” Cornelius Bega (1663)3Analogy: science of electro-magnetism Seemingly unrelated phenomenaUnifying abstractionsSpecialized modelsthat exploit structure4Organization of talk•Seemingly unrelated parallel algorithms and data structures–Stencil codes–Delaunay mesh refinement–Event-driven simulation–Graph reduction of functional languages–…•Unifying abstractions–Operator formulation of algorithms–Amorphous data-parallelism–Galois programming model–Baseline parallel implementation •Specialized implementations that exploit structure–Structure of algorithms–Optimized compiler and runtime system support for different kinds of structure•Ongoing work5Some parallel algorithms6ExamplesApplication/domain AlgorithmMeshing Generation/refinement/partitioningCompilers Iterative and elimination-based dataflow algorithmsFunctional interpreters Graph reduction, static and dynamic dataflowMaxflow Preflow-push, augmenting pathsMinimal spanning trees Prim, Kruskal, BoruvkaEvent-driven simulation Chandy-Misra-Bryant, Jefferson TimewarpAI Message-passing algorithmsStencil computations Jacobi, Gauss-Seidel, red-black orderingSparse linear solvers Sparse MVM, sparse Cholesky factorization7Stencil computation: Jacobi iteration•Finite-difference method for solving PDEs–discrete representation of domain: grid•Values at interior points are updated using values at neighbors–values at boundary points are fixed •Data structure: –dense arrays•Parallelism: –values at all interior points can be computed simultaneously–parallelism is not dependent on input values•Compiler can find the parallelism–spatial loops are DO-ALL loops//Jacobi iteration with 5-point stencil//initialize array Afor time = 1, nsteps for <i,j> in [2,n-1]x[2,n-1] temp(i,j)=0.25*(A(i-1,j)+A(i+1,j)+A(i,j-1)+A(i,j+1)) for <i,j> in [2,n-1]x[2,n-1] A(i,j) = temp(i,j)(i,j)(i-1,j)(i+1,j)(i,j-1) (i,j+1)5-point stencil8Delaunay Mesh Refinement•Iterative refinement to remove badly shaped triangles:while there are bad triangles do { pick a bad triangle; find its cavity; retriangulate cavity; // may create new bad triangles}•Don’t-care non-determinism:–final mesh depends on order in which bad triangles are processed–applications do not care which mesh is produced•Data structure: –graph in which nodes represent triangles and edges represent triangle adjacencies•Parallelism: –bad triangles with cavities that do not overlap can be processed in parallel–parallelism is very “input-dependent”•compilers cannot determine this parallelism –(Miller et al.) at runtime, repeatedly build interference graph and find maximal independent sets for parallel executionMesh m = /* read in mesh */WorkList wl;wl.add(m.badTriangles());while ( !wl.empty() ) {Element e = wl.get();if (e no longer in mesh) continue;Cavity c = new Cavity(e);//determine new cavityc.expand();c.retriangulate();//re-triangulate regionm.update(c);//update meshwl.add(c.badTriangles());}9Event-driven simulation•Stations communicate by sending messages with time-stamps on FIFO channels•Stations have internal state that is updated when a message is processed•Messages must be processed in time-order at each station•Data structure:–Messages in event-queue, sorted in time-order•Parallelism: –conservative: Chandy-Misra-Bryant•station fires when it has messages on all incoming edges and processes earliest message•requires null messages to avoid deadlock–optimistic: Jefferson time-warp•station can fire when it has an incoming message on any edge•requires roll-back if speculative conflict is detected25AB4610Remarks on algorithms•Diverse algorithms and data structures•Exploiting parallelism in irregular algorithms is very complex–Miller et al. DMR implementation: interference graph + maximal independent sets–Jefferson Timewarp algorithm for event-driven simulation•Algorithms:–parallelism can be very input-dependent •DMR, event-driven simulation, graph reduction,….–don’t-care non-determinism•has nothing to do with concurrency•DMR, graph reduction–activities created dynamically may interfere with existing activities•event-driven simulation…•Data structures:–relatively few algorithms use dense arrays–more common: graphs, trees, lists, priority queues,…11Organization of talk•Seemingly unrelated parallel algorithms and data structures–Stencil codes–Delaunay mesh refinement–Event-driven simulation–Graph reduction of functional languages–………•Unifying abstractions–Amorphous data-parallelism–Baseline parallel implementation for exploiting amorphous data-parallelism•Specialized implementations that exploit structure–Structure of algorithms–Optimized compiler and runtime system support for different kinds of structure•Ongoing work12Requirements•Provide a model of parallelism in irregular algorithms•Unified treatment of parallelism in regular and irregular algorithms–parallelism in regular algorithms must emerge as a special case of general model–(cf.) correspondence principles in Physics•Abstractions should
View Full Document