UT CS 378 - Towards a Science of Parallel Programming


Contents: Problem Statement • Analogy: science of electro-magnetism • Organization of talk • Examples • Stencil computation: Jacobi iteration • Delaunay Mesh Refinement • Event-driven simulation • Remarks on algorithms • Requirements • Traditional abstraction • Operator formulation of algorithms • Parallelism • Galois programming model (PLDI 2007) • Parameter tool (PPoPP 2009) • Example: DMR • Summary • Key idea • Algorithm abstractions • Morphs • Graph partitioning (ASPLOS 2008) • Zero-copy implementation • Delaunay mesh refinement • Survey propagation on Maverick • Scheduling • Ongoing work • Acknowledgements (in alphabetical order)

Towards a Science of Parallel Programming
Keshav Pingali, University of Texas, Austin

Problem Statement
• Community has worked on parallel programming for more than 30 years
  – programming models
  – machine models
  – programming languages
  – …
• However, parallel programming is still a research problem
  – matrix computations, stencil computations, FFTs etc. are well understood
  – each new application is a "new phenomenon"
  – few insights for irregular applications
• Thesis: we need a science of parallel programming
  – analysis: a framework for thinking about parallelism in an application
  – synthesis: producing an efficient parallel implementation of an application
[Image: "The Alchemist," Cornelius Bega (1663)]

Analogy: science of electro-magnetism
Seemingly unrelated phenomena → unifying abstractions → specialized models that exploit structure

Organization of talk
• Seemingly unrelated parallel algorithms and data structures
  – stencil codes
  – Delaunay mesh refinement
  – event-driven simulation
  – graph reduction of functional languages
  – …
• Unifying abstractions
  – operator formulation of algorithms
  – amorphous data-parallelism
  – Galois programming model
  – baseline parallel implementation
• Specialized implementations that exploit structure
  – structure of algorithms
  – optimized compiler and runtime system support for different kinds of structure
• Ongoing work

Some parallel algorithms

Examples
  Application/domain       Algorithm
  Meshing                  generation, refinement, partitioning
  Compilers                iterative and elimination-based dataflow algorithms
  Functional interpreters  graph reduction, static and dynamic dataflow
  Maxflow                  preflow-push, augmenting paths
  Minimal spanning trees   Prim, Kruskal, Boruvka
  Event-driven simulation  Chandy-Misra-Bryant, Jefferson Timewarp
  AI                       message-passing algorithms
  Stencil computations     Jacobi, Gauss-Seidel, red-black ordering
  Sparse linear solvers    sparse MVM, sparse Cholesky factorization

Stencil computation: Jacobi iteration
• Finite-difference method for solving PDEs
  – discrete representation of the domain: a grid
• Values at interior points are updated using values at neighbors
  – values at boundary points are fixed
• Data structure: dense arrays
• Parallelism:
  – values at all interior points can be computed simultaneously
  – parallelism is not dependent on input values
• A compiler can find the parallelism: the spatial loops are DO-ALL loops

  // Jacobi iteration with 5-point stencil
  // initialize array A
  for time = 1, nsteps
    for <i,j> in [2,n-1] x [2,n-1]
      temp(i,j) = 0.25 * (A(i-1,j) + A(i+1,j) + A(i,j-1) + A(i,j+1))
    for <i,j> in [2,n-1] x [2,n-1]
      A(i,j) = temp(i,j)

[Figure: 5-point stencil — point (i,j) and its neighbors (i-1,j), (i+1,j), (i,j-1), (i,j+1)]

Delaunay Mesh Refinement
• Iterative refinement to remove badly shaped triangles:

  while there are bad triangles do {
    pick a bad triangle;
    find its cavity;
    retriangulate cavity;  // may create new bad triangles
  }

• Don't-care non-determinism:
  – the final mesh depends on the order in which bad triangles are processed
  – applications do not care which mesh is produced
• Data structure: a graph in which nodes represent triangles and edges represent triangle adjacencies
• Parallelism:
  – bad triangles whose cavities do not overlap can be processed in parallel
  – parallelism is very input-dependent, so compilers cannot determine it
  – (Miller et al.) at runtime, repeatedly build an interference graph and find maximal independent sets for parallel execution

  Mesh m = /* read in mesh */
  WorkList wl;
  wl.add(m.badTriangles());
  while ( !wl.empty() ) {
    Element e = wl.get();
    if (e no longer in mesh) continue;
    Cavity c = new Cavity(e);  // determine new cavity
    c.expand();
    c.retriangulate();         // re-triangulate region
    m.update(c);               // update mesh
    wl.add(c.badTriangles());
  }

Event-driven simulation
• Stations communicate by sending messages with time-stamps on FIFO channels
• Stations have internal state that is updated when a message is processed
• Messages must be processed in time-order at each station
• Data structure: messages in an event queue, sorted in time-order
• Parallelism:
  – conservative (Chandy-Misra-Bryant): a station fires when it has messages on all incoming edges and processes the earliest one; requires null messages to avoid deadlock
  – optimistic (Jefferson Timewarp): a station can fire when it has an incoming message on any edge; requires roll-back if a speculative conflict is detected
[Figure: two stations A and B exchanging time-stamped messages]

Remarks on algorithms
• Diverse algorithms and data structures
• Exploiting parallelism in irregular algorithms is very complex
  – Miller et al. DMR implementation: interference graph + maximal independent sets
  – Jefferson Timewarp algorithm for event-driven simulation
• Algorithms:
  – parallelism can be very input-dependent (DMR, event-driven simulation, graph reduction, …)
  – don't-care non-determinism, which has nothing to do with concurrency (DMR, graph reduction)
  – activities created dynamically may interfere with existing activities (event-driven simulation, …)
• Data structures:
  – relatively few algorithms use dense arrays
  – more common: graphs, trees, lists, priority queues, …

Organization of talk
• Seemingly unrelated parallel algorithms and data structures
  – stencil codes, Delaunay mesh refinement, event-driven simulation, graph reduction of functional languages, …
• Unifying abstractions
  – amorphous data-parallelism
  – baseline parallel implementation for exploiting amorphous data-parallelism
• Specialized implementations that exploit structure
  – structure of algorithms
  – optimized compiler and runtime system support for different kinds of structure
• Ongoing work

Requirements
• Provide a model of parallelism in irregular algorithms
• Unified treatment of parallelism in regular and irregular algorithms
  – parallelism in regular algorithms must emerge as a special case of the general model
  – (cf.) correspondence principles in physics
• Abstractions should
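The Jacobi pseudocode on the "Stencil computation" slide can be turned into a runnable sketch. This is a minimal Python version using plain nested lists (the grid contents and sizes below are illustrative, not from the slides); note the double buffer `temp`, which is what makes every interior update in a step independent — the DO-ALL parallelism the slide points out.

```python
def jacobi(A, nsteps):
    """Jacobi iteration with a 5-point stencil.

    A is an n x n grid (list of lists of floats). Boundary values are
    fixed; each interior point is replaced by the average of its four
    neighbours. All interior updates in one step read only the old
    grid, so the spatial loops are DO-ALL parallel.
    """
    n = len(A)
    for _ in range(nsteps):
        temp = [row[:] for row in A]          # double buffer
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                temp[i][j] = 0.25 * (A[i - 1][j] + A[i + 1][j]
                                     + A[i][j - 1] + A[i][j + 1])
        A = temp
    return A

# Illustrative 4x4 grid: boundary fixed at 1.0, interior starts at 0.0.
grid = [[1.0] * 4,
        [1.0, 0.0, 0.0, 1.0],
        [1.0, 0.0, 0.0, 1.0],
        [1.0] * 4]
result = jacobi(grid, 100)
```

With a constant boundary of 1.0 the interior converges to 1.0, which makes the sketch easy to sanity-check.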
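The DMR worklist loop on the slides follows a general pattern: pop an element, skip it if earlier work invalidated it, rewrite a local region, and push any newly created bad elements. The sketch below exercises exactly that loop shape on a deliberately simple stand-in for the mesh — intervals that are "bad" when longer than a threshold — since a real Delaunay cavity implementation is far beyond a slide example. `refine` and the interval representation are inventions for illustration, not part of the Galois system.

```python
from collections import deque

def refine(intervals, max_len=1.0):
    """Worklist-style refinement in the shape of the DMR loop.

    Intervals (lo, hi) play the role of triangles; an interval is
    'bad' if it is longer than max_len. Processing a bad element
    removes it from the 'mesh' and may add new bad elements, just as
    retriangulating a cavity can create new bad triangles.
    """
    mesh = set(intervals)
    wl = deque(x for x in mesh if x[1] - x[0] > max_len)
    while wl:
        e = wl.popleft()
        if e not in mesh:                 # element no longer in mesh
            continue
        lo, hi = e
        mid = (lo + hi) / 2.0
        mesh.remove(e)                    # 'retriangulate the cavity'
        for new in ((lo, mid), (mid, hi)):
            mesh.add(new)
            if new[1] - new[0] > max_len:
                wl.append(new)            # newly created bad elements
    return mesh

result = refine([(0.0, 8.0)], max_len=1.0)
```

The don't-care non-determinism from the slide is visible here too: popping from either end of the deque produces a valid final result, just via different intermediate states.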
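The event-driven simulation slide describes messages processed in time order at each station. A minimal sequential kernel makes the baseline concrete: a priority queue pops events in global time order, and each handler updates station state and may emit new time-stamped messages. The `simulate` function, handler signature, and toy two-station network are all assumptions for illustration; the slide's Chandy-Misra-Bryant and Jefferson Timewarp schemes are strategies for safely relaxing exactly this total order across stations.

```python
import heapq

def simulate(events, handlers):
    """Sequential event-driven simulation kernel.

    events: list of (timestamp, station, payload) tuples.
    handlers: maps a station name to a function
              (state, payload, t) -> (new_state, new_events).
    Events are processed in global time order via a min-heap.
    """
    pq = list(events)
    heapq.heapify(pq)
    state = {s: 0 for s in handlers}
    log = []
    while pq:
        t, station, payload = heapq.heappop(pq)
        state[station], new = handlers[station](state[station], payload, t)
        log.append((t, station))
        for ev in new:
            heapq.heappush(pq, ev)        # may schedule future messages
    return state, log

# Toy network: station A accumulates each payload and forwards it to
# station B two ticks later; B just accumulates.
handlers = {
    "A": lambda s, p, t: (s + p, [(t + 2, "B", p)]),
    "B": lambda s, p, t: (s + p, []),
}
state, log = simulate([(1, "A", 10), (3, "A", 5)], handlers)
```

The log shows the total order the sequential kernel enforces; a parallel implementation only has to preserve the per-station time order, which is where the conservative and optimistic schemes diverge.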

