COSC 6374 Parallel Computation Parallel Design Patterns II Algorithm structure and Supporting Structures Edgar Gabriel Fall 2010 Edgar Gabriel Parallelization Strategy Finding Concurrency Structure the problem to expose exploitable concurrency Algorithm Structure Structure the algorithm to take advantage of concurrency Supporting Structure Implementation Mechanism Intermediate stage between Algorithm Structure and Implementation program structuring definition of shared data structures Mapping of the higher level patterns onto a programming environment COSC 6374 Parallel Computation Edgar Gabriel 1 Finding concurrency Result A task decomposition that identifies tasks that can execute concurrently A data decomposition that identifies data local to each task A way of grouping tasks and ordering them according to temporal constraints COSC 6374 Parallel Computation Edgar Gabriel Algorithm structure Organize by tasks Task Parallelism Divide and Conquer Organize by data decomposition Organize by flow of data Geometric decomposition Pipeline Recursive data Event based coordination COSC 6374 Parallel Computation Edgar Gabriel 2 Task parallelism I Problem can be decomposed into a collection of tasks that can execute concurrently Tasks can be completely independent embarrassingly parallel or can have dependencies among them All tasks might be known at the beginning or might be generated dynamically COSC 6374 Parallel Computation Edgar Gabriel Task parallelism II Tasks There should be at least as many tasks as UEs typically many many more Computation associated with each task should be large enough to offset the overhead associated with managing tasks and handling dependencies Dependencies Ordering constraints sequential composition of taskparallel computations Shared data dependencies several tasks have to access the same data structure COSC 6374 Parallel Computation Edgar Gabriel 3 Shared data dependencies Shared data dependencies can be categorized as Removable dependencies an apparent dependency that can be removed by code transformation int i ii 0 jj 0 for i 0 i N i ii ii 1 d ii big time consuming work ii jj jj ii a jj other big time consuming work jj for i 0 i N i d i big time consuming work i a i i i 2 other big time consuming work i i i 2 6374 Parallel Computation COSC Edgar Gabriel 7 Shared data dependencies II Separable dependencies replicate the shared data structure and combine the copies into a single structure at the end Remember the matrix vector multiply using columnwise block distribution in the first MPI lecture Other dependencies non resolvable have to be followed COSC 6374 Parallel Computation Edgar Gabriel 4 Task scheduling Schedule the way in which tasks are assigned to UEs for execution Goal load balance minimize the overall execution of all tasks Two classes of schedule Static schedule distribution of tasks to UEs is determined at the start of the computation and not changed anymore Dynamic schedule the distribution of tasks to UEs changes as the computation proceeds COSC 6374 Parallel Computation Edgar Gabriel Task scheduling example Independent tasks B A F D A E Poor mapping to 4 UEs B C C F Good mapping to 4 UEs D B A D C E F E COSC 6374 Parallel Computation Edgar Gabriel 5 Static schedule Tasks are associated into blocks Blocks are assigned to UEs Each UE should take approximately same amount of time to complete task Static schedule usually used when Availability of computational resources is predictable e g dedicated usage of nodes UEs are identical e g homogeneous parallel computer Size of each task is nearly identical COSC 6374 Parallel Computation Edgar Gabriel Dynamic scheduling Used when Effort associated with each task varies widely is unpredictable Capabilities of UEs vary widely heterogeneous parallel machine Common implementations usage of task queues if a UE finishes current task it removes the next task from the task queue Work stealing each UE has its own work queue once its queue is empty a UE steals work from the task queue of another UE COSC 6374 Parallel Computation Edgar Gabriel 6 Dynamic scheduling Trade offs Fine grained shorter smaller tasks allow for better load balance Fine grained task have higher costs for task management and dependency management COSC 6374 Parallel Computation Edgar Gabriel Divide and Conquer algorithms split split split Base solve Base solve Merge Merge Merge COSC 6374 Parallel Computation Edgar Gabriel 7 Divide and Conquer A problem is split into a number of smaller subproblems Each sub problem is solved independently Sub solutions of each sub problem will be merged to the solution of the final problem Problems of Divide and Conquer for Parallel Computing Amount of exploitable concurrency decreases over the lifetime Trivial parallel implementation each function call to solve is a task on its own For small problems no new task should be generated but the baseSolve should be applied COSC 6374 Parallel Computation Edgar Gabriel Divide and Conquer Implementation On shared memory machines a divide and conquer algorithm can easily be mapped to a fork join model A new task is forked created After this task is done it joins the original task destroyed On distributed memory machines task queues Often implemented using the Master Worker framework discussed later COSC 6374 Parallel Computation Edgar Gabriel 8 Divide and Conquer int solve Problem P int solution Check whether we can further partition the problem if baseCase P solution baseSolve P No we can t else yes we can Problem subproblems N int subsolutions N subproblems split P Partition the problem for i 0 i N i subsolutions i solve subproblems i solution merge subsolutions return solution 17 COSC 6374 Parallel Computation Edgar Gabriel Task Parallelism using Master Worker framework Master Process Result queue Worker Process 1 Worker Process 2 Task queue COSC 6374 Parallel Computation Edgar Gabriel 9 Task Parallelism using work stealing Worker Process 1 Worker Process 2 COSC 6374 Parallel Computation Edgar Gabriel Geometric decomposition For all applications relying on data decomposition All processes should apply the same operations on different data items Key elements Data decomposition Exchange and update operation Data distribution and task scheduling COSC 6374 Parallel Computation Edgar Gabriel 10 2 D Example Laplace equation Parallel domain decomposition Data exchange at process boundaries required Halo cells Ghost cells Copy of the last row column of data from the
View Full Document
Unlocking...