RIT EECC 756 - Parallelization

Slide titles:
Parallelization of An Example Program
Grid Solver Example
Slide 3
Decomposition
Exploit Application Knowledge
Decomposition Only
Assignment
Data Parallel Solver
Shared Address Space Solver
Slide 10
Notes on SAS Program
Need for Mutual Exclusion
Mutual Exclusion
Global Event Synchronization
Point-to-point Event Synch (Not Used Here)
Group Event Synchronization
Message Passing Grid Solver
Slide 18
Notes on Message Passing Program
Send and Receive Alternatives
Orchestration: Summary
Correctness in Grid Solver Program

Slide 1: Parallelization of An Example Program
• Examine a simplified version of a piece of Ocean simulation
  – Iterative equation solver
• Illustrate parallel program in low-level parallel language
  – C-like pseudocode with simple extensions for parallelism
  – Expose basic communication and synch. primitives that must be supported

Slide 2: Grid Solver Example
• Simplified version of solver in Ocean simulation
• Gauss-Seidel (near-neighbor) sweeps to convergence
  – interior n-by-n points of the (n+2)-by-(n+2) grid updated in each sweep
  – updates done in place in the grid, and the difference from the previous value computed
  – accumulate partial diffs into a global diff at the end of every sweep
  – check if the error has converged (to within a tolerance parameter)
  – if so, exit solver; if not, do another sweep

Expression for updating each interior point:
  A[i,j] = 0.2 × (A[i,j] + A[i,j-1] + A[i-1,j] + A[i,j+1] + A[i+1,j])

Slide 3 (sequential pseudocode)

1.  int n;                          /* size of matrix: (n+2)-by-(n+2) elements */
2.  float **A, diff = 0;
3.  main()
4.  begin
5.    read(n);                      /* read input parameter: matrix size */
6.    A ← malloc(a 2-d array of size n+2 by n+2 doubles);
7.    initialize(A);                /* initialize the matrix A somehow */
8.    Solve(A);                     /* call the routine to solve equation */
9.  end main
10. procedure Solve(A)              /* solve the equation system */
11.   float **A;                    /* A is an (n+2)-by-(n+2) array */
12. begin
13.   int i, j, done = 0;
14.   float diff = 0, temp;
15.   while (!done) do              /* outermost loop over sweeps */
16.     diff = 0;                   /* initialize maximum difference to 0 */
17.     for i ← 1 to n do           /* sweep over non-border points of grid */
18.       for j ← 1 to n do
19.         temp = A[i,j];          /* save old value of element */
20.         A[i,j] ← 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] +
21.                         A[i,j+1] + A[i+1,j]);        /* compute average */
22.         diff += abs(A[i,j] - temp);
23.       end for
24.     end for
25.     if (diff/(n*n) < TOL) then done = 1;
26.   end while
27. end procedure
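For concreteness, here is a minimal runnable C translation of the sequential pseudocode above. The grid size, the tolerance TOL, the boundary initialization, and the helper names are illustrative choices of this sketch, not values given in the slides.

    /* Minimal runnable C translation of the sequential grid solver pseudocode.
       The grid size, TOL, the boundary initialization, and the helper names
       are illustrative choices, not values given in the slides. */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define TOL 1e-3f

    static float **alloc_grid(int n)
    {
        /* (n+2)-by-(n+2) grid, including the fixed boundary rows and columns */
        float **A = malloc((n + 2) * sizeof *A);
        for (int i = 0; i < n + 2; i++)
            A[i] = calloc(n + 2, sizeof **A);
        return A;
    }

    static void solve(float **A, int n)
    {
        int done = 0;
        while (!done) {                          /* outermost loop over sweeps */
            float diff = 0.0f;                   /* difference accumulated this sweep */
            for (int i = 1; i <= n; i++)         /* sweep over non-border points */
                for (int j = 1; j <= n; j++) {
                    float temp = A[i][j];        /* save old value of element */
                    A[i][j] = 0.2f * (A[i][j] + A[i][j-1] + A[i-1][j] +
                                      A[i][j+1] + A[i+1][j]);
                    diff += fabsf(A[i][j] - temp);
                }
            if (diff / (n * n) < TOL) done = 1;  /* convergence test */
        }
    }

    int main(void)
    {
        int n = 256;                             /* matrix size; illustrative */
        float **A = alloc_grid(n);
        for (int j = 0; j < n + 2; j++)
            A[0][j] = 1.0f;                      /* simple boundary condition on one edge */
        solve(A, n);
        printf("A[1][1] = %f\n", A[1][1]);
        return 0;
    }

Because the updates are done in place, a point computed later in a sweep already sees the new values of its left and upper neighbors (the Gauss-Seidel property); this is the dependence structure the decomposition discussion below has to work around.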
Slide 4: Decomposition
• Simple way to identify concurrency is to look at loop iterations
  – dependence analysis; if not enough concurrency, then look further
• Not much concurrency here at this level (all loops sequential)
• Examine fundamental dependences, ignoring loop structure
• Concurrency O(n) along anti-diagonals, serialization O(n) along the diagonal
• Retain loop structure, use point-to-point synch; problem: too many synch operations
• Restructure loops, use global synch; imbalance and too much synch

Slide 5: Exploit Application Knowledge
• Reorder grid traversal: red-black ordering
  [Figure: grid points colored in a red-black checkerboard pattern; legend: red point, black point]
• Different ordering of updates: may converge quicker or slower
• Red sweep and black sweep are each fully parallel
• Global synch between them (conservative but convenient)
• Ocean uses red-black; we use a simpler, asynchronous one to illustrate
  – no red-black, simply ignore dependences within a sweep
  – sequential order same as original; parallel program nondeterministic

Slide 6: Decomposition Only
• Decomposition into elements: degree of concurrency n²
• To decompose into rows, make the line 18 loop sequential; degree of concurrency n
• for_all leaves assignment to the system
  – but implicit global synch. at end of for_all loop

15.   while (!done) do              /* a sequential loop */
16.     diff = 0;
17.     for_all i ← 1 to n do       /* a parallel loop nest */
18.       for_all j ← 1 to n do
19.         temp = A[i,j];
20.         A[i,j] ← 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] +
21.                         A[i,j+1] + A[i+1,j]);
22.         diff += abs(A[i,j] - temp);
23.       end for_all
24.     end for_all
25.     if (diff/(n*n) < TOL) then done = 1;
26.   end while

Slide 7: Assignment
• Static assignments (given decomposition into rows)
  – block assignment of rows: row i is assigned to process ⌊i/(n/p)⌋
  – cyclic assignment of rows: process i is assigned rows i, i+p, and so on
• Dynamic assignment
  – get a row index, work on the row, get a new row, and so on
• Static assignment into rows reduces concurrency (from n to p)
  – block assignment reduces communication by keeping adjacent rows together
• Let's dig into orchestration under three programming models
  [Figure: grid rows partitioned into contiguous blocks, one block per process (P0, P1, P2, P4 labeled); i indexes rows, p is the number of processes]

Slide 8: Data Parallel Solver

1.  int n, nprocs;                  /* grid size (n+2)-by-(n+2) and number of processes */
2.  float **A, diff = 0;
3.  main()
4.  begin
5.    read(n); read(nprocs);        /* read input grid size and number of processes */
6.    A ← G_MALLOC(a 2-d array of size n+2 by n+2 doubles);
7.    initialize(A);                /* initialize the matrix A somehow */
8.    Solve(A);                     /* call the routine to solve equation */
9.  end main
10. procedure Solve(A)              /* solve the equation system */
11.   float **A;                    /* A is an (n+2)-by-(n+2) array */
12. begin
13.   int i, j, done = 0;
14.   float mydiff = 0, temp;
14a.  DECOMP A[BLOCK,*, nprocs];
15.   while (!done) do              /* outermost loop over sweeps */
16.     mydiff = 0;                 /* initialize maximum difference to 0 */
17.     for_all i ← 1 to n do       /* sweep over non-border points of grid */
18.       for_all j ← 1 to n do
19.         temp = A[i,j];          /* save old value of element */
20.         A[i,j] ← 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] +
21.                         A[i,j+1] + A[i+1,j]);        /* compute average */
22.         mydiff += abs(A[i,j] - temp);
23.       end for_all
24.     end for_all
24a.    REDUCE(mydiff, diff, ADD);
25.     if (diff/(n*n) < TOL) then done = 1;
26.   end while
27. end procedure
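DECOMP and REDUCE above are pseudocode primitives, not a real API. As a hedged illustration of the same idea on an actual shared-memory system, the parallel loop nest and the reduction of each process's mydiff into diff can be written with OpenMP roughly as below; the routine name solve_omp and the use of a static schedule to obtain a block-of-rows assignment are choices of this sketch, not something the slides prescribe.

    /* OpenMP analogue of the data-parallel pseudocode: schedule(static) hands
       each thread a contiguous block of rows (roughly DECOMP A[BLOCK,*]), and
       reduction(+:diff) plays the role of REDUCE(mydiff, diff, ADD). */
    #include <math.h>

    #define TOL 1e-3f

    void solve_omp(float **A, int n)
    {
        int done = 0;
        while (!done) {                               /* sequential loop over sweeps */
            float diff = 0.0f;
            #pragma omp parallel for schedule(static) reduction(+:diff)
            for (int i = 1; i <= n; i++) {            /* rows divided among threads */
                for (int j = 1; j <= n; j++) {
                    float temp = A[i][j];
                    A[i][j] = 0.2f * (A[i][j] + A[i][j-1] + A[i-1][j] +
                                      A[i][j+1] + A[i+1][j]);
                    diff += fabsf(A[i][j] - temp);    /* per-thread partial sum */
                }
            }
            if (diff / (n * n) < TOL) done = 1;       /* after the loop's implicit barrier */
        }
    }

As in the asynchronous method on the Exploit Application Knowledge slide, the in-place updates ignore within-sweep dependences, so the exact result is nondeterministic but the solver still converges. Compiled with -fopenmp, this routine could stand in for the sequential solve() sketched earlier.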
Slide 9: Shared Address Space Solver
• Assignment controlled by values of variables used as loop bounds
  [Figure: Single Program Multiple Data (SPMD) structure; the processes each execute Solve, performing a sweep and then a convergence test]

Slide 10: Shared Address Space Solver code (the preview ends partway through this listing)

1.  int n, nprocs;                  /* matrix dimension and number of processors to be used */
2a. float **A, diff;                /* A is global (shared) array representing the grid */
                                    /* diff is global (shared) maximum difference in current sweep */
2b. LOCKDEC(diff_lock);             /* declaration of lock to enforce mutual exclusion */
2c. BARDEC(bar1);                   /* barrier declaration for global synchronization between sweeps */
3.  main()
4.  begin
5.    read(n); read(nprocs);        /* read input matrix size and number of processes */
6.    A ← G_MALLOC(a 2-d array of size n+2 by n+2 doubles);
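LOCKDEC/BARDEC and the LOCK/UNLOCK and BARRIER operations they support are generic shared-address-space primitives, not a particular library's API. Because the preview cuts off before the body of this Solve, the following is only a hedged sketch of how such an SPMD solver is typically written with POSIX threads: a pthread_mutex_t stands in for diff_lock when accumulating each process's mydiff into the shared diff, and a pthread_barrier_t stands in for bar1 around the convergence test. The block row partitioning, the diff reset protocol, and every name below are assumptions of the sketch.

    /* SPMD shared-address-space sketch with POSIX threads.  pthread_mutex_t
       plays the role of LOCKDEC(diff_lock), pthread_barrier_t of BARDEC(bar1).
       Grid size, thread count, row partitioning, and the diff reset protocol
       are assumptions of this sketch, not taken from the slides. */
    #include <math.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define TOL    1e-3f
    #define N      256                       /* grid size; illustrative */
    #define NPROCS 4                         /* number of processes (threads) */

    static float **A;                        /* shared (N+2)-by-(N+2) grid */
    static float diff = 0.0f;                /* shared difference accumulated per sweep */
    static pthread_mutex_t diff_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_barrier_t bar1;

    static void *solve_worker(void *arg)
    {
        int pid   = (int)(long)arg;
        int rows  = N / NPROCS;              /* assume NPROCS divides N evenly */
        int mymin = 1 + pid * rows;          /* block assignment of rows */
        int mymax = mymin + rows - 1;
        int done  = 0;

        while (!done) {
            float mydiff = 0.0f;
            if (pid == 0) diff = 0.0f;       /* one process resets the shared sum */
            pthread_barrier_wait(&bar1);     /* reset visible before anyone accumulates */

            for (int i = mymin; i <= mymax; i++)     /* sweep over my block of rows */
                for (int j = 1; j <= N; j++) {
                    /* Neighbor rows owned by other processes may hold old or new
                       values; this mirrors the slides' asynchronous method. */
                    float temp = A[i][j];
                    A[i][j] = 0.2f * (A[i][j] + A[i][j-1] + A[i-1][j] +
                                      A[i][j+1] + A[i+1][j]);
                    mydiff += fabsf(A[i][j] - temp);
                }

            pthread_mutex_lock(&diff_lock);  /* mutual exclusion for the shared sum */
            diff += mydiff;
            pthread_mutex_unlock(&diff_lock);

            pthread_barrier_wait(&bar1);     /* all contributions in before the test */
            if (diff / (N * N) < TOL) done = 1;  /* every process reaches the same decision */
            pthread_barrier_wait(&bar1);     /* all have read diff before the next reset */
        }
        return NULL;
    }

    int main(void)
    {
        A = malloc((N + 2) * sizeof *A);     /* allocate and zero the shared grid */
        for (int i = 0; i < N + 2; i++)
            A[i] = calloc(N + 2, sizeof **A);
        for (int j = 0; j < N + 2; j++)
            A[0][j] = 1.0f;                  /* simple boundary condition on the top edge */

        pthread_t tid[NPROCS];
        pthread_barrier_init(&bar1, NULL, NPROCS);
        for (long p = 0; p < NPROCS; p++)
            pthread_create(&tid[p], NULL, solve_worker, (void *)p);
        for (int p = 0; p < NPROCS; p++)
            pthread_join(tid[p], NULL);
        printf("A[1][1] = %f\n", A[1][1]);
        return 0;
    }

Three barrier crossings per sweep keep the reset of diff, the accumulation into it, and the convergence test from racing with one another; the mutual-exclusion and global-event-synchronization slides listed in the outline above address exactly these two mechanisms.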

