CMSC 714, Lecture 7: OpenMP and UPC
Alan Sussman

CMSC 714, Fall05 - Alan Sussman & Jeffrey K. Hollingsworth

Notes
- First programming assignment coming soon
  - Anyone still need an account?
  - Account problems?
- More questions on PVM and/or MPI?

OpenMP
- Supports parallelism for SMPs
  - provides a simple, portable model
  - allows both shared and private data
  - provides parallel do loops
- Includes
  - automatic support for fork/join parallelism
  - reduction variables
  - atomic statement
    - only one process executes it at a time
  - single statement
    - only one process runs this code (the first thread to reach it)

OpenMP
- Characteristics
  - Both local & shared memory (depending on directives)
  - Parallelism: directives for parallel loops, functions
  - Compilers convert programs into multi-threaded code (i.e., pthreads)
  - Not available on clusters
- Example

      #pragma omp parallel for private(i)
      for (i = 0; i < NUPDATE; i++) {
          int ran = random();
          table[ran & (TABSIZE-1)] ^= stable[ran >> (64-LSTSIZE)];
      }

More on OpenMP
- Characteristics
  - Not a full parallel language, but a language extension
  - A set of standard compiler directives and library routines
  - Used to create parallel Fortran, C, and C++ programs
  - Usually used to parallelize loops
  - Standardizes the last 15 years of SMP practice
- Implementation
  - Compiler directives using #pragma omp <directive>
  - Parallelism can be specified for regions & loops
  - Data can be
    - Private: each processor has a local copy
    - Shared: a single copy for all processors

OpenMP – Programming Model
- Fork-join parallelism (a restricted form of MIMD)
  - Normally a single thread of control (the master)
  - Worker threads are spawned when a parallel region is encountered
  - Barrier synchronization is required at the end of a parallel region
- [Figure: the master thread forks a team of worker threads at each parallel region and joins them at its end]
OpenMP – Example Parallel Region
- Task-level parallelism: #pragma omp parallel { ... }
- Source code:

      double a[1000];
      omp_set_num_threads(4);
      #pragma omp parallel
      {
          int id = omp_get_thread_num();
          foo(id, a);
      }
      printf("all done\n");

- What the OpenMP compiler produces, conceptually: after omp_set_num_threads(4), four threads each execute the region body, so foo(0,a), foo(1,a), foo(2,a), and foo(3,a) run concurrently; printf("all done\n") executes only after all threads join at the implicit barrier.

OpenMP – Example Parallel Loop
- Loop-level parallelism: #pragma omp parallel for
- Loop iterations are assigned to threads, invoked as functions
- Source code:

      #pragma omp parallel for
      for (i = 0; i < N; i++) {
          foo(i);
      }

- What the OpenMP compiler produces, conceptually:

      #pragma omp parallel
      {
          int id, i, nthreads, start, end;
          id = omp_get_thread_num();
          nthreads = omp_get_num_threads();
          start = id * N / nthreads;      // assigning
          end = (id+1) * N / nthreads;    // work
          for (i = start; i < end; i++) {
              foo(i);
          }
      }

Sample Fortran77 OpenMP Code

      program compute_pi
      integer n, i
      double precision w, x, sum, pi, f, a
c     function to integrate
      f(a) = 4.d0 / (1.d0 + a*a)
      print *, 'Enter number of intervals: '
      read *, n
c     calculate the interval size
      w = 1.0d0/n
      sum = 0.0d0
!$OMP PARALLEL DO PRIVATE(x), SHARED(w)
!$OMP& REDUCTION(+: sum)
      do i = 1, n
         x = w * (i - 0.5d0)
         sum = sum + f(x)
      enddo
      pi = w * sum
      print *, 'computed pi = ', pi
      stop
      end

UPC
- An extension to C for parallel computing
- Target environment
  - Distributed-memory machines
  - Cache-coherent multiprocessors
- Features
  - Explicit control of data distribution
  - Includes a parallel for statement
UPC
- Characteristics
  - Local memory, shared arrays accessed by global pointers
  - Parallelism: single program on multiple nodes (SPMD)
  - Provides the illusion of shared one-dimensional arrays
  - Features
    - Data distribution declarations for arrays
    - Cast global pointers to local pointers for efficiency
    - One-sided communication routines (memput / memget)
  - Compilers translate global pointers, generate communication
- Example

      shared int *x, *y, z[100];
      upc_forall (i = 0; i < 100; i++; &z[i]) {
          z[i] = *x++ * *y++;
      }

UPC Execution Model
- SPMD-based
  - One thread per processor
  - Each thread starts with the same entry to main
- Different consistency models possible
  - "strict" model is based on sequential consistency
  - "relaxed" model is based on release consistency

Forall Loop
- Forms the basis of parallelism
- Adds a fourth parameter to the for loop, "affinity"
  - Where code is executed is based on "affinity"
- Lacks an explicit barrier before/after execution
  - Differs from OpenMP
- Supports nested forall loops

Split-phase Barriers
- Traditional barriers
  - Once a thread enters the barrier, it busy-waits until all threads arrive
- Split-phase
  - Announce intention to enter the barrier (upc_notify)
  - Perform some local operations
  - Wait for the other threads (upc_wait)
- Advantage
  - Allows work while waiting for other threads to arrive
- Disadvantage
  - Must find work to do
  - Takes time to communicate both notify and wait


