COSC 6374 Parallel Computation
Introduction to OpenMP (II)
Material is based on slides by Barbara Chapman (UH) and Tim Mattson (Intel)
Edgar Gabriel, Fall 2011


Agenda

- Parallel computing, threads, and OpenMP
- The core elements of OpenMP
    - Thread creation
    - Workshare constructs
    - Managing the data environment
    - Synchronization
    - The runtime library and environment variables
- Recapitulation


What are Threads?

- Thread: an independent flow of control
    - A runtime entity created to execute a sequence of instructions
- Threads require
    - A program counter
    - A register state
    - An area in memory, including a call stack
    - A thread id
- A process is executed by one or more threads that share
    - The address space
    - Attributes such as user ID, open files, working directory, etc.


OpenMP

- Provides a thread programming model at a high level
- The user does not need to specify all the details, especially with respect to
    - The assignment of work to threads
    - The creation of threads
- The user makes the strategic decisions
- The compiler figures out the details


OpenMP Overview: How do threads interact?

- OpenMP is a shared memory model
    - Threads communicate by sharing variables
- Unintended sharing of data causes race conditions
    - Race condition: the program's outcome changes as the threads are scheduled differently
- To control race conditions
    - Use synchronization to protect against data conflicts
- Synchronization is expensive, so
    - Change how data is accessed to minimize the need for synchronization


Syntax details

- Most of the constructs in OpenMP are compiler directives
- For C and C++
    - Directives are pragmas of the form
          #pragma omp construct [clause [clause]...]
    - Include file
          #include <omp.h>
- For Fortran, the directives are comments and take one of the forms
    - Fixed form
          C$OMP construct [clause [clause]...]
    - Free form (but works for fixed form too)
          !$OMP construct [clause [clause]...]
    - The OpenMP library module
          use omp_lib


Structured blocks (C/C++)

- Most OpenMP constructs apply to structured blocks
    - Structured block: a block with one point of entry at the top and one point of exit at the bottom
    - The only branches allowed out of the block are STOP statements in Fortran and exit() in C/C++
- In C/C++, a block is a single statement or a group of statements between braces

    #pragma omp parallel
    {
        id = omp_get_thread_num();
        res[id] = do_work(id);
    }

    #pragma omp for
    for (i = 0; i < N; i++) {
        res[i] = big_calc(i);
        A[i] = B[i] + res[i];
    }


Structured Block Boundaries

A structured block (the goto stays inside the block):

    #pragma omp parallel
    {
        int id = omp_get_thread_num();
    more:
        res[id] = do_big_job(id);
        if (!conv(res[id])) goto more;
    }
    printf("All done\n");

Not a structured block (control branches out of the block and back into it):

    #pragma omp parallel
    {
        int id = omp_get_thread_num();
    more:
        res[id] = do_big_job(id);
        if (conv(res[id])) goto done;
        goto more;
    }
    done:
        if (!really_done()) goto more;


Parallel Regions

- Threads are created using the omp parallel pragma
- Each thread executes a copy of the code within the structured block
- How many threads are created?
    - Environment variable to set the number of threads (OMP_NUM_THREADS)
    - Runtime function (omp_set_num_threads())
    - The system default is used if no additional information is given

    double A[1000];
    #pragma omp parallel
    {
        do_some_work();
    }
    printf("all done\n");


Parallel Regions: the fork-join model of OpenMP

- Threads are (conceptually) created at the beginning of a parallel region and destroyed at the end of the parallel region

    Sequential execution:   double A[1000];
    Parallel execution:     pooh(0,A)   pooh(1,A)   pooh(2,A)   pooh(3,A)
    Sequential execution:   printf("all done\n");

- At the end of the parallel region, threads wait for all threads to finish before proceeding (i.e., a barrier)


A multi-threaded "Hello world" program

Starting point: sequential hello world

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int ID = 0;
        printf(" hello(%d) ", ID);
        printf(" world(%d)\n", ID);
        return 0;
    }

Multi-threaded version

    #include <stdio.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        #pragma omp parallel
        {
            /* ID is declared inside the region, so each thread has its own copy */
            int ID = omp_get_thread_num();
            printf(" hello(%d) ", ID);
            printf(" world(%d)\n", ID);
        }
        return 0;
    }

Sample output (the interleaving varies from run to run):

    hello(1) hello(0) world(1)
    world(0)
    hello(3) hello(2) world(3)
    world(2)


OpenMP Library routines

- Modify/check the number of threads
    - omp_set_num_threads()
    - omp_get_num_threads()
    - omp_get_thread_num()
    - omp_get_max_threads()
- Are we in a parallel region?
    - omp_in_parallel()
- How many processors are in the system?
    - omp_get_num_procs()


Example: vector add operation

Sequential code

    for (i = 0; i < N; i++) {
        a[i] = a[i] + b[i];
    }

OpenMP parallel version

    #pragma omp parallel
    {
        int id     = omp_get_thread_num();
        int Nthrds = omp_get_num_threads();
        int istart = id * N / Nthrds;
        int iend   = (id + 1) * N / Nthrds;
        /* the loop variable is declared inside the region so that it is private */
        for (int i = istart; i < iend; i++) {
            a[i] = a[i] + b[i];
        }
    }

- All variables declared inside the parallel region are private to each thread
    - Each thread has its own copy of the variable
- Variables declared outside a parallel region are shared among the threads
    - Unless explicitly changed by the user
- If istart and iend are not chosen carefully, the partitioning can lead to cache coherence problems (false sharing)
    - E.g., the last elements written by the thread with id x and the first elements written by the thread with id x+1 may lie in the same cache line


OpenMP work-sharing constructs

- The for work-sharing construct splits up loop iterations among the threads

    #pragma omp parallel
    #pragma omp for
    for (i = 0; i < N; i++) {
        neat_stuff(i);
    }

- By default, there is a barrier at the end of the omp for
- Use the nowait clause to turn off the barrier, e.g.
      #pragma omp for nowait
- nowait is useful between two consecutive, independent omp for loops


Example: vector add operation with omp for

    #pragma omp parallel
    #pragma omp for
    for (i = 0; i < N; i++) {
        a[i] = a[i] + b[i];
    }

- Much simpler code than the previous OpenMP version
- The loop variable i is automatically made private on each thread
- The code does not define how the loop iterations are distributed among the threads
    - Use the schedule clause to influence the work distribution, e.g.
          #pragma omp for schedule(static)


OpenMP for construct

The schedule clause affects how loop iterations are mapped onto threads:

- schedule(static [,chunk])
    - Deal out blocks of iterations of size chunk to each thread
- schedule(dynamic [,chunk])
    - Each thread grabs chunk iterations off a queue until all iterations have been handled
- schedule(guided [,chunk])
    - Threads dynamically grab blocks of iterations. The size of the block starts large and shrinks down to size chunk as the calculation proceeds
- schedule(runtime)
    - Schedule and chunk size are taken from the OMP_SCHEDULE environment variable

