
6.189 IAP 2007, Lecture 7
Design Patterns for Parallel Programming II
Dr. Rodric Rabbah, IBM

Recap: Common Steps to Parallelization
[Figure: a sequential computation is partitioned into tasks, tasks are assigned to execution units, and execution units are mapped onto processors P0–P3, via four steps: decomposition, assignment, orchestration, and mapping.]

Recap: Decomposing for Concurrency
[Figure: MPEG decoder. The MPEG bit stream is split by VLD into frequency-encoded macroblocks and differentially coded motion vectors; macroblocks pass through ZigZag, IQuantization, IDCT, and Saturation, while motion vectors pass through Motion Vector Decode and Repeat; the joined streams feed Motion Compensation, Picture Reorder, Color Conversion, and Display, producing the recovered picture.]
● Task decomposition
 – Parallelism in the application
● Data decomposition
 – Same computation, many data
● Pipeline decomposition
 – Data assembly lines
 – Producer-consumer chains

Dependence Analysis
● Given two tasks, how do we determine whether they can safely run in parallel?

Bernstein’s Condition
● Ri: set of memory locations read (input) by task Ti
● Wj: set of memory locations written (output) by task Tj
● Two tasks T1 and T2 can run in parallel if
 – the input to T1 is not part of the output from T2
 – the input to T2 is not part of the output from T1
 – the outputs from T1 and T2 do not overlap

Example
T1: a = x + y
T2: b = x + z
R1 = { x, y }   W1 = { a }
R2 = { x, z }   W2 = { b }
R1 ∩ W2 = ∅,  R2 ∩ W1 = ∅,  W1 ∩ W2 = ∅
All three intersections are empty, so T1 and T2 can run in parallel.

Patterns for Parallelizing Programs: 4 Design Spaces
Algorithm Expression
● Finding Concurrency
 – Expose concurrent tasks
● Algorithm Structure
 – Map tasks to units of execution to exploit the parallel architecture
Software Construction
● Supporting Structures
 – Code and data structuring patterns
● Implementation Mechanisms
 – Low-level mechanisms used to write parallel programs
(Patterns for Parallel Programming. Mattson, Sanders, and Massingill, 2005.)
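Bernstein’s condition above can be checked mechanically. The sketch below is mine, not the lecture’s: it represents each read/write set as an array of location names, and the helper names `disjoint` and `bernstein` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the two sets of location names share no element. */
static int disjoint(const char **a, size_t na, const char **b, size_t nb) {
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (strcmp(a[i], b[j]) == 0)
                return 0;
    return 1;
}

/* Bernstein's condition: R1 ∩ W2, R2 ∩ W1, and W1 ∩ W2 are all empty. */
int bernstein(const char **r1, size_t nr1, const char **w1, size_t nw1,
              const char **r2, size_t nr2, const char **w2, size_t nw2) {
    return disjoint(r1, nr1, w2, nw2)
        && disjoint(r2, nr2, w1, nw1)
        && disjoint(w1, nw1, w2, nw2);
}
```

For the example on the slide (T1: a = x + y, T2: b = x + z), the sets R1 = {x, y}, W1 = {a}, R2 = {x, z}, W2 = {b} pass all three disjointness checks, so the two tasks may run in parallel.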
Algorithm Structure Design Space
● Given a collection of concurrent tasks, what’s the next step?
● Map tasks to units of execution (e.g., threads)
● Important considerations
 – The number of execution units the platform will support
 – The cost of sharing information among execution units
 – Avoid the tendency to over-constrain the implementation
  – Work well on the intended platform
  – Flexible enough to easily adapt to different architectures

Major Organizing Principle
● How to determine the algorithm structure that represents the mapping of tasks to units of execution?
● The concurrency usually implies a major organizing principle
 – Organize by tasks
 – Organize by data decomposition
 – Organize by flow of data

Organize by Tasks
[Figure: decision tree. Recursive? no → Task Parallelism; yes → Divide and Conquer.]

Task Parallelism
● Ray tracing
 – The computation for each ray is separate and independent
● Molecular dynamics
 – Non-bonded force calculations, some dependencies
● Common factors
 – Tasks are associated with iterations of a loop
 – Tasks are largely known at the start of the computation
 – All tasks may not need to complete to arrive at a solution

Divide and Conquer
● For recursive programs: divide and conquer
 – Subproblems may not be uniform
 – May require dynamic load balancing
[Figure: a problem is recursively split into subproblems, each subproblem is computed, and the partial solutions are joined into the final solution.]

Organize by Data
[Figure: decision tree. Recursive? no → Geometric Decomposition; yes → Recursive Data.]
● Operations on a central data structure
 – Arrays and linear data structures
 – Recursive data structures
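The split/compute/join structure on the divide-and-conquer slide above can be sketched with a minimal example of mine (not from the lecture): a recursive array sum. In a parallel implementation the two recursive calls would run concurrently, e.g., as forked tasks; this sequential version only illustrates the shape of the pattern.

```c
#include <stddef.h>

/* Divide-and-conquer sum: split the array, solve the subproblems, join. */
long dc_sum(const int *a, size_t n) {
    if (n == 0) return 0;                 /* base case: trivial subproblem */
    if (n == 1) return a[0];
    size_t half = n / 2;                  /* split */
    long left  = dc_sum(a, half);         /* compute left subproblem  */
    long right = dc_sum(a + half, n - half); /* compute right subproblem */
    return left + right;                  /* join */
}
```

Note the load-balancing caveat from the slide: here the halves are uniform, but in general subproblems are not, and a parallel runtime may need to rebalance work dynamically.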
Geometric Decomposition
● Gravitational body simulator
 – Calculate the force between pairs of bodies and update their accelerations

VEC3D acc[NUM_BODIES] = 0;
for (i = 0; i < NUM_BODIES - 1; i++) {
    for (j = i + 1; j < NUM_BODIES; j++) {
        // Displacement vector
        VEC3D d = pos[j] - pos[i];
        // Force
        t = 1 / sqr(length(d));
        // Components of force along displacement
        d = t * (d / length(d));
        acc[i] += d * mass[j];
        acc[j] += -d * mass[i];
    }
}

[Figure: the pos and vel arrays are divided into contiguous chunks, one per execution unit.]

Recursive Data
● Computation on a list, tree, or graph
 – It often appears that the only way to solve the problem is to move sequentially through the data structure
● There are, however, opportunities to reshape the operations in a way that exposes concurrency

Recursive Data Example: Find the Root
[Figure: a forest of seven nodes; in each of three steps, every node’s pointer jumps to its successor’s successor, until every node points directly at its root.]
● Given a forest of rooted directed trees, for each node, find the root of the tree containing that node
 – Parallel approach: for each node, find its successor’s successor; repeat until no changes
 – O(log n) steps vs. O(n) for the sequential approach

Work vs. Concurrency Tradeoff
● Parallel restructuring of the find-the-root algorithm leads to O(n log n) work, vs. O(n) with the sequential approach
● Most strategies based on this pattern similarly trade an increase in total work for a decrease in execution time due to concurrency

Organize by Flow of Data
[Figure: decision tree. Regular? yes → Pipeline; no → Event-based Coordination.]
● In some application domains, the flow of data imposes an ordering on the tasks
 – Regular, one-way, mostly stable data flow
 – Irregular, dynamic, or unpredictable data flow

Pipeline Throughput vs. Latency
● The amount of concurrency in a pipeline is limited by the number of stages
● Works best if the time to fill and drain the pipeline is small compared to the overall running time
● The performance metric is usually throughput
 – The rate at which data appear at the end of the pipeline per unit time (e.g., frames per second)
● Pipeline latency matters for real-time applications
 – The time interval from data input to the pipeline until data output

Event-Based Coordination
● In this pattern, the interactions of tasks to process data can vary over unpredictable intervals
● Deadlocks are a likely problem for applications that use this pattern

Supporting Structures
 – SPMD
 – Loop parallelism
 – Master/Worker
 – Fork/Join
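The successor-doubling idea on the “Find the Root” slide above can be sketched as follows. This is my sketch, not the lecture’s code: `succ[i]` holds node i’s parent, roots point to themselves, and each round replaces every node’s successor with its successor’s successor. Run in parallel over all nodes, this converges in O(log n) rounds; the sequential loop below only demonstrates the pointer-jumping step.

```c
#include <stddef.h>

/* Pointer jumping: succ[i] is node i's parent; a root has succ[i] == i.
   After the loop, succ[i] is the root of the tree containing node i. */
void find_roots(size_t *succ, size_t n) {
    int changed = 1;
    while (changed) {
        changed = 0;
        /* In a parallel version, this loop runs concurrently over all nodes. */
        for (size_t i = 0; i < n; i++) {
            size_t s2 = succ[succ[i]];  /* successor's successor */
            if (s2 != succ[i]) {
                succ[i] = s2;
                changed = 1;
            }
        }
    }
}
```

As the “Work vs. Concurrency Tradeoff” slide notes, every node keeps jumping until it reaches a root, so the parallel formulation performs O(n log n) total work to cut the depth from O(n) to O(log n).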

