Basic Techniques of Parallel Programming Fundamental or Common Examples Problems with a very large degree of data parallelism PP ch 3 Image Transformations Shifting Rotation Clipping etc Pixel level Image Processing PP ch 12 Data parallelism DOP scale well with size of problem Divide and conquer Problem Partitioning pp ch 4 Parallel Bucket Sort Numerical Integration Divide problem is into smaller parallel problems of the same type as the larger problem then combine results Trapezoidal method using static assignment Adaptive Quadrature using dynamic assignment Gravitational N Body Problem Barnes Hut Algorithm Pipelined Computation pp ch 5 To improve throughput of a number of instances of the same problem Pipelined Addition Pipelined Insertion Sort Pipelined Solution of A Set of Upper Triangular Linear Equations Parallel Programming PP book Chapters 3 7 12 EECC756 Shaaban 1 lec 7 Spring2008 4 17 2008 Basic Techniques of Parallel Programming Examples Synchronous Iteration Synchronous Parallelism Barriers Implementations 1 Counter Barrier Implementation 2 Tree Barrier Implementation PP ch 6 Similar to 2 d grid ocean example lecture 4 3 Butterfly Connection Pattern Message Passing Barrier Synchronous Iteration Program Example Iterative Solution of Linear Equations Jacobi iteration Dynamic Load Balancing PP ch 7 For problems with unpredictable computations tasks Centralized Dynamic Load Balancing Decentralized Dynamic Load Balancing Distributed Work Pool Using Divide And Conquer Distributed Work Pool With Local Queues In Slaves Termination Detection for Decentralized Dynamic Load Balancing Example Shortest Path Problem Moore s Algorithm EECC756 Shaaban 2 lec 7 Spring2008 4 17 2008 Also low level pixel based image processing Problems with large degree of data parallelism Example Image Transformations Common Pixel Level Image Transformations Shifting The coordinates of a two dimensional object shifted by x in the x direction and y in the y dimension are given by x x x y y y where x and y are the original and x and y are the new coordinates Scaling The coordinates of an object magnified by a factor Sx in the x direction and Sy in the y direction are given by x xSx y ySy where Sx and Sy are greater than 1 The object is reduced in size if Sx and Sy are between 0 and 1 The magnification or reduction need not be the same in both x and y directions Rotation The coordinates of an object rotated through an angle about the origin of the coordinate system are given by x x cos y sin DOP O n2 y x sin y cos Clipping Deletes from the displayed picture those points outside a defined rectangular area If the lowest values of x y in the area to be display are x1 y1 and the highest values of x y are xh yh then x1 x xh y1 y yh needs to be true for the point x y to be displayed otherwise x y is not displayed Parallel Programming book Chapter 3 Chapter 12 pixel based image processing EECC756 Shaaban 3 lec 7 Spring2008 4 17 2008 Block Assignment Domain decomposition partitioning used similar to 2 d grid ocean example 80x80 blocks Possible Static Image Partitionings n2 4n Computation Communication p p 4 p C to C n Pixel based image processing Example Sharpening Filter Strip Assignment X0 X1 X2 1 1 1 X3 X4 X5 1 8 1 X7 X8 1 1 1 X6 Weight or Filter coefficient Sharpening Filter Mask Updated X4 8X4 X0 X1 X2 X3 X5 X6 X7 X8 9 10x640 strips of image rows Communication 2n Computation n2 p c to c ratio 2p n Image size 640x480 To be copied into array map from image file To be processed by 48 Processes or Tasks More on pixel based image processing Parallel Programming book Chapters 12 EECC756 Shaaban 4 lec 7 Spring2008 4 17 2008 Message Passing Image Shift Pseudocode Example 48 10x640 strip partitions Master for i 0 i 8 i for each 48 processes for j 0 j 6 j p i 80 bit map starting coordinates q j 80 for i 0 i 80 i load coordinates into array x y for j 0 j 80 j x i p i y i q j z j 8 i process number Send send Pz x 0 y 0 x 1 y 1 x 6399 y 6399 Data send coords to slave for i 0 i 8 i for each 48 processes for j 0 j 6 j accept new coordinates z j 8 i process number recv Pz a 0 b 0 a 1 b 1 a 6399 b 6399 receive new coords for i 0 i 6400 i 2 update bit map map a i b i map x i y i Get Results Update coordinates EECC756 Shaaban 5 lec 7 Spring2008 4 17 2008 Message Passing Image Shift Pseudocode Example 48 10x640 strip partitions Slave process i recv Pmaster c 0 c 6400 i e Get pixel coordinates to work on from master process receive block of pixels to process for i 0 i 6400 i 2 transform pixels c i c i delta x shift in x direction Update points data parallel comp c i 1 c i 1 delta y shift in y direction send Pmaster c 0 c 6399 send transformed pixels to master Send results to master process Or other pixel based computation More on pixel based image processing Parallel Programming book Chapters 12 EECC756 Shaaban 6 lec 7 Spring2008 4 17 2008 Image Transformation Performance Analysis Suppose each pixel requires one computational step and there are n x n pixels If the transformations are done sequentially there would be n x n steps so that ts n2 and a time complexity of O n2 Suppose we have p processors The parallel implementation column row or square rectangular divides the region into groups of n2 p pixels The parallel computation time is given by n x n image tcomp n2 p Computation P number of processes which has a time complexity of O n2 p Before the computation starts the bit map must be sent to the processes If sending each group cannot be overlapped in time essentially we need to broadcast all pixels which may be most efficiently done with a single bcast routine The individual processes have to send back the transformed coordinates of their group of pixels requiring individual send s or a gather routine Hence the communication time is tcomm O n2 Communication So that the overall execution time is given by tp tcomp tcomm O n2 p O n2 C to C Ratio p Accounting for initial data distribution EECC756 Shaaban 7 lec 7 Spring2008 4 17 2008 Divide and Conquer Divide Problem Initial large Problem tree Construction Combine Results One of the most fundamental techniques in parallel programming The problem is simply divided into separate smaller subproblems usually of the same form as the larger problem and each part is computed separately Further divisions done by recursion Once the simple tasks are performed the results are combined leading to larger and fewer tasks M ary or M way Divide and conquer A task is divided into M parts at each
View Full Document
Unlocking...