Synchronous Iteration Iteration based computation is a powerful method for solving numerical and some non numerical problems For numerical problems a calculation is repeated and each time a result is obtained which is used on the next execution The process is repeated until the desired results are obtained Though iterative methods are is sequential in nature parallel implementation can be successfully employed when there are multiple independent instances of the iteration In some cases this is part of the problem specification and sometimes one must rearrange the problem to obtain multiple independent instances The term synchronous iteration is used to describe solving a problem by iteration where different tasks may be performing separate iterations but the iterations must be synchronized using point to point synchronization barriers or other synchronization mechanisms EECC756 Shaaban 1 lec 8 Spring2000 4 6 2000 Barriers A synchronization mechanism applicable to shared memory as well as message passing where each process must wait until all members of a specific process group reach a specific reference point in their computation Possible Implementations A library call possibly implemented using a counter Using individual point to point synchronization forming A tree Butterfly connection pattern EECC756 Shaaban 2 lec 8 Spring2000 4 6 2000 Processes Reaching A Barrier At Different Times EECC756 Shaaban 3 lec 8 Spring2000 4 6 2000 Message Passing Counter Implementation of Barriers If the master process maintains the barrier counter It counts the messages received from slave processes as they reach their barrier during arrival phase Release slaves processes during departure phase after all the processes have arrived for i 0 i n i count slaves as they reach their barrier recv Pany for i 0 i n i release slaves send Pi EECC756 Shaaban 4 lec 8 Spring2000 4 6 2000 Tree Barrier Implementation EECC756 Shaaban 5 lec 8 Spring2000 4 6 2000 Butterfly Connection Pattern Message Passing Barrier EECC756 Shaaban 6 lec 8 Spring2000 4 6 2000 Message Passing Local Synchronization Process Pi 1 Process Pi Process Pi 1 recv Pi send Pi send Pi 1 send Pi 1 recv Pi 1 recv pi 1 recv Pi send Pi EECC756 Shaaban 7 lec 8 Spring2000 4 6 2000 Synchronous Iteration Program Example Iterative Solution of Linear Equations Given n linear equations with n unknowns an 1 0 x0 an 1 1 x1 a n 1 2 x2 an 1 n 1xn 1 bn 1 a1 0 x0 a1 1 x1 a1 2 x2 a1 n 1x n 1 b1 a0 0 x0 a0 1 x1 a0 2 x2 a0 n 1 xn 1 b0 By rearranging the ith equation ai 0 x0 ai 1 x1 ai 2 x2 ai n 1 xn 1 bi to xi 1 a i i bi ai 0 x0 ai 1 x1 a i 2 x2 ai i 1 xi 1 ai i 1 xi 1 or ai n 1 xn 1 This equation can be used as an iteration formula for each of the unknowns to obtain a better approximation Jacobi Iteration All the values of x are updated at once EECC756 Shaaban 8 lec 8 Spring2000 4 6 2000 Iterative Solution of Linear Equations Jacobi Iteration Sequential Code Given the arrays a and b holding the constants in the equations x provided to hold the unknowns and a fixed number of iterations the code might look like for i 0 i n i x i b i initialize unknowns for iteration 0 iteration limit iteration for i 0 i n i sum 0 for j 0 j n j compute summation of a x if i j sum sum a i j x j new x i b i sum a i i Update unknown for i 0 i n i update values x i new x i EECC756 Shaaban 9 lec 8 Spring2000 4 6 2000 Iterative Solution of Linear Equations Jacobi Iteration Parallel Code In the sequential code the for loop is a natural barrier between iterations In parallel code we have to insert a specific barrier Also all the newly computed values of the unknowns need to be broadcast to all the other processes Process Pi could be of the form x i b i initialize values for iteration 0 iteration limit iteration sum a i i x i for j 1 j n j compute summation of a x sum sum a i j x j new x i b i sum a i i compute unknown broadcast receive new x i broadcast values global barrier wait for all processes The broadcast routine broadcast receive sends the newly computed value of x i from process i to other processes and collects data broadcast from other processes EECC756 Shaaban 10 lec 8 Spring2000 4 6 2000 Jacobi Iteration Analysis Sequential Time equals iteration time number of iterations O n2 for each iteration Parallel execution time is the time of one processor each operating over n p unknowns Computation for iterations Inner loop with n iterations outer loop with n p iterations Inner loop a multiplication and an addition Outer loop a multiplication and a subtraction before inner loop and a subtraction and division after inner loop tcomp n p 2n 4 Time complexity O n2 p Communication Occurs at the end of each iteration multiple broadcasts p broadcasts each of size n p require tdata to send each item tcomm p tstartup n p tdata ptstartup ntdata Overall Time tp n p 2n 4 ptstartup ntdata EECC756 Shaaban 11 lec 8 Spring2000 4 6 2000 Effects of Computation And Communication in Jacobi Iteration For one iteration tp n p 2n 4 ptstartup ntdata Given n tstartup 10000 tdata 50 integer n p Minimum execution time occurs when p 16 EECC756 Shaaban 12 lec 8 Spring2000 4 6 2000 Dynamic Load Balancing To achieve best performance of a parallel computing system running a parallel problem it s essential to maximize processor utilization by distributing the computation load evenly or balancing the load among the available processors Optimal static load balancing optimal mapping or scheduling is an intractable NP complete problem except for specific problems on specific networks Hence heuristics are usually used to select processors for processes Even the best static mapping may offer the best execution time due to changing conditions at runtime and the process may need to done dynamically The methods used for balancing the computational load dynamically among processors can be broadly classified as 1 Centralized dynamic load balancing 2 Decentralized dynamic load balancing EECC756 Shaaban 13 lec 8 Spring2000 4 6 2000 Processor Load Balance Performance EECC756 Shaaban 14 lec 8 Spring2000 4 6 2000 Centralized Dynamic Load Balancing Advantage of centralized approach for computation termination The master process terminates the computation when 1 The task queue is empty and 2 Every process has made a request for more tasks without any new tasks been generated EECC756 Shaaban 15 lec 8 Spring2000 4 6 2000 Decentralized Dynamic Load Balancing Distributed Work Pool Using Divide And Conquer EECC756 Shaaban 16 lec 8
View Full Document
Unlocking...