ME964 High Performance Computing for Engineering Applications
Parallel Programming Patterns
Discussion of Midterm Project
Oct. 21, 2008

Before we get started…
- Last time: Parallel programming patterns
  - The "Finding Concurrency Design Space"
  - The "Algorithm Structure Design Space"
- Today: Parallel programming patterns
  - Discuss the "Supporting Structures Design Space"
  - Discussion of the Midterm Project: assigned today, due on 11/18 or 11/09 (based on level of difficulty)
- Other issues: I will be in Milwaukee on Th; Mik will be the lecturer, tracing the evolution of hardware support for the graphics pipeline

Key Parallel Programming Steps
1) Find the concurrency in the problem
2) Structure the algorithm so that concurrency can be exploited
3) Implement the algorithm in a suitable programming environment
4) Execute and tune the performance of the code on a parallel system

What Comes Next?
- Focus on this (the "Supporting Structures" design space, per the slide figure) for a while

Before We Dive In…
- You hover in this design space when you ponder whether MPI, SPMD, or an event-driven simulation environment is what it takes to optimally design your algorithm
- We are not going to focus on this process too much, since it's clear that we are stuck with SPMD
- All that we'll be doing is comparing SPMD with other parallel programming paradigms to understand when SPMD is good and when it is bad
- Also, we will not cover the "Implementation Mechanisms" design space, since we don't need to wrestle with how to spawn and join threads, how to do synchronization, etc. (this would be Chapter 6 of the book)

Supporting Structures ~ Program and Data Models ~
- Program Models: SPMD, Master/Worker, Loop Parallelism, Fork/Join
- Data Models: Shared Data, Shared Queue, Distributed Array
- These are not necessarily mutually exclusive.

Program Structuring Patterns
- SPMD (Single Program, Multiple Data)
  - All PEs (Processing Elements) execute the same program in parallel
  - Each PE has its own data
  - Each PE uses a unique ID to access its portion of the data
  - Different PEs can follow different paths through the same code
  - This is essentially the CUDA grid model
- Master/Worker
- Loop Parallelism
- Fork/Join

Program Structuring Patterns (the rest of the pack)
- Master/Worker
  - A master thread sets up a pool of worker threads and a bag of tasks
  - Workers execute concurrently, removing tasks until done
- Loop Parallelism
  - Loop iterations execute in parallel
  - FORTRAN do-all (truly parallel), do-across (with dependence)
  - Very common in OpenMP
- Fork/Join
  - The most general way of creating threads (the POSIX standard)
  - Can be regarded as a very low-level approach in which you use the OS to manage parallelism

Review: Algorithm Structure
- Organize by Task: Linear → Task Parallelism; Recursive → Divide and Conquer
- Organize by Data: Linear → Geometric Decomposition; Recursive → Recursive Data
- Organize by Data Flow: Regular → Pipeline; Irregular → Event Driven

Algorithm Structures vs. Supporting Structures
(four smilies is the best; source: Mattson, et al.)

                 Task Parallel  Divide/Conquer  Geometric Decomp.  Recursive Data  Pipeline  Event-based
  SPMD           ☺☺☺☺          ☺☺☺            ☺☺☺☺              ☺☺             ☺☺☺      ☺☺
  Loop Parallel  ☺☺☺☺          ☺☺             ☺☺☺
  Master/Worker  ☺☺☺☺          ☺☺             ☺                  ☺              ☺        ☺
  Fork/Join      ☺☺            ☺☺☺☺           ☺☺                                ☺☺☺☺     ☺☺☺☺

Supporting Structures vs. Program Models

                 OpenMP  MPI   CUDA
  SPMD           ☺☺☺    ☺☺☺☺  ☺☺☺☺
  Loop Parallel  ☺☺☺☺   ☺
  Master/Slave   ☺☺     ☺☺☺
  Fork/Join      ☺☺☺

More on SPMD
- The dominant coding style of scalable parallel computing
  - MPI code is mostly developed in SPMD style
  - Much OpenMP code is also SPMD (next to loop parallelism)
- Particularly suitable for algorithms based on data parallelism, geometric decomposition, and divide and conquer
- Main advantage: tasks and their interactions are visible in one piece of source code; no need to correlate multiple sources
- SPMD is by far the most commonly used pattern for structuring parallel programs.
Typical SPMD Program Phases
- Initialize: establish localized data structures and communication channels
- Obtain a unique identifier: each thread acquires a unique identifier, typically in the range 0 to N-1, where N is the number of threads. Both OpenMP and CUDA have built-in support for this.
- Distribute data: decompose global data into chunks and localize them, or share/replicate the major data structures, using the thread ID to associate a subset of the data with each thread
- Run the core computation: more details on the next slide…
- Finalize: reconcile the global data structure, prepare for the next major iteration

Core Computation Phase
- Thread IDs are used to differentiate the behavior of threads:
  - Use the thread ID in loop index calculations to split loop iterations among threads
  - Use conditions based on the thread ID to branch to thread-specific actions
- The two approaches can have very different performance results and code complexity depending on how they are done.

A Simple Example
- Assume the computation being parallelized has 1,000,000 iterations.
- Sequential code:

  num_steps = 1000000;
  for (i = 0; i < num_steps; i++) { ... }

SPMD Code Version 1
- Assign a chunk of iterations to each thread
- The last thread also finishes up the remainder iterations

  num_steps = 1000000;
  i_start = my_id * (num_steps/num_threads);
  i_end =