ME964 — High Performance Computing for Engineering Applications
Parallel Computing using OpenMP [Part 1 of 2]
March 31, 2011
© Dan Negrut, 2011, ME964 UW-Madison

"The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague." — Edsger W. Dijkstra

Contents
- Before We Get Started…
- Acknowledgements
- Data vs. Task Parallelism
- Objectives
- Work Plan
- OpenMP: Target Hardware
- OpenMP: What to Expect
- OpenMP: What Is It?
- "pthreads": An OpenMP Precursor
- "pthreads": Example
- "pthreads": Moving Away…
- OpenMP Programming Model
- OpenMP: 20+ Library Routines
- A Few Syntax Details to Get Started
- Why Compiler Directives and/or Pragmas?
- Parallel Region & Structured Blocks (C/C++)
- Example: Hello World on my Machine
- OpenMP: Important Remark
- Work Sharing
- "omp for" Construct
- Combining Constructs
- The private Clause
- Example: Parallel Mandelbrot
- The schedule Clause
- schedule Clause Example
- Function-Level Parallelism
- omp sections
- Function-Level Parallelism Using omp sections
- Advantage of Parallel Sections
- New Addition to OpenMP
- Tasks: What Are They?
- Simple Task Example
- Task Construct — Explicit Task View
- Why Are Tasks Useful?
- Tasks: Synchronization Issues
- Task Completion Example

Before We Get Started…
- Last time:
  - Learned how to run an MPI executable on Newton
  - Point-to-point communication with MPI
  - Collective communication in MPI
- Today:
  - Parallel computing using OpenMP, part 1 of 2
- Other issues:
  - Assignment 7 was posted on the class website, due on April 7
  - The class website includes a link to the OpenMP 3.0 Application Programming Interface: http://www.openmp.org/mp-documents/spec30.pdf

Acknowledgements
- The overwhelming majority of slides used for discussing OpenMP issues are from Intel's library of presentations for promoting OpenMP; the slides are used herein with permission
- Credit is given where due by a "Credit: IOMPP" or "Includes material from IOMPP" note at the bottom of the slide
- IOMPP stands for "Intel OpenMP Presentation"

Data vs. Task Parallelism
- Data parallelism:
  - You have a large number of data elements, and each element (or possibly a subset of elements) needs to be processed to produce a result
  - When this processing can be done in parallel, we have data parallelism
  - Example: adding two long arrays of doubles to produce yet another array of doubles
- Task parallelism:
  - You have a collection of tasks that need to be completed
  - If these tasks can be performed in parallel, you are faced with a task-parallel job
  - Examples: reading the newspaper, drinking coffee, and scratching your back; the breathing of your lungs, the beating of your heart, liver function, control of swallowing, etc.

Objectives
- Understand OpenMP at the level where you can:
  - Implement data parallelism
  - Implement task parallelism
Credit: IOMPP

Work Plan
- What is OpenMP?
- Parallel regions
- Work sharing
- Data environment
- Synchronization
- Advanced topics
Credit: IOMPP

OpenMP: Target Hardware
- CUDA: targets parallelism on the GPU
- MPI: targets parallelism on a cluster (distributed computing)
  - Note that an MPI implementation can transparently handle an SMP architecture, such as a workstation with two hex-core CPUs that use a large amount of
shared memory
- OpenMP: targets parallelism on SMP architectures
- Handy when:
  - You have a machine with 12 cores, probably 24 if HTT is accounted for
  - You have a large amount of shared memory that is backed by a 64-bit OS

OpenMP: What to Expect
- If you have 12 cores available to you, it is *highly* unlikely you will get a speedup of more than 12 (superlinear)
- Recall the trick that helped the GPU hide latency: overcommitting the SPs and hiding memory-access latency with warp execution
- This mechanism of hiding latency by overcommitment does not *explicitly* exist for parallel computing under OpenMP, beyond what HTT offers

OpenMP: What Is It?
- Portable, shared-memory threading API for Fortran, C, and C++
- Multi-vendor support, for both Linux and Windows
- Standardizes task- and loop-level parallelism
- Supports coarse-grained parallelism
- Combines serial and parallel code in a single source
- Standardizes ~20 years of compiler-directed threading experience
- Current spec is OpenMP 3.0 (http://www.openmp.org, 318 pages)
Credit: IOMPP

"pthreads": An OpenMP Precursor
- Before there was OpenMP, a common approach to supporting parallel programming was the use of pthreads
  - "pthread": POSIX thread
  - POSIX: Portable Operating System Interface [for Unix]
- pthreads were originally available under Unix and Linux; Windows ports are also available, some as open-source projects
- Parallel programming with pthreads is relatively cumbersome, prone to mistakes, and hard to maintain/scale/expand
- Moreover, it was not envisioned as a mechanism for writing scientific computing software

"pthreads": Example

```c
int main(int argc, char *argv[]) {
    parm *arg;
    pthread_t *threads;
    pthread_attr_t pthread_custom_attr;

    int n = atoi(argv[1]);
    threads = (pthread_t *) malloc(n * sizeof(*threads));
    pthread_attr_init(&pthread_custom_attr);

    barrier_init(&barrier1);  /* set up barrier */
    /* allocate space for the final result */
    finals = (double *) malloc(n * sizeof(double));

    arg = (parm *) malloc(sizeof(parm) * n);
    for (int i = 0; i < n; i++) {  /* spawn thread */
        arg[i].id = i;
        arg[i].noproc = n;
        pthread_create(&threads[i], &pthread_custom_attr, cpi, (void *)(arg + i));
    }

    for (int i = 0; i < n; i++)  /* synchronize the completion of each thread */
        pthread_join(threads[i], NULL);

    free(arg);
    return 0;
}
```

```c
#include <stdio.h>
#include <math.h>
#include <time.h>
#include <sys/types.h>
#include <pthread.h>
#include <sys/time.h>

#define SOLARIS 1
#define ORIGIN  2
#define OS SOLARIS

typedef struct {
    int id;
    int noproc;
    int dim;
} parm;

typedef struct {
    int             cur_count;
    pthread_mutex_t barrier_mutex;
    pthread_cond_t  barrier_cond;
} barrier_t;

/* barrier setup; must run before spawning the threads */
void barrier_init(barrier_t *mybarrier) {
    pthread_mutexattr_t attr;
#if (OS == ORIGIN)
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutexattr_setprioceiling(&attr, 0);
    pthread_mutex_init(&(mybarrier->barrier_mutex), &attr);
#elif (OS == SOLARIS)
    pthread_mutex_init(&(mybarrier->barrier_mutex), NULL);
#else
#error "undefined OS"
#endif
    pthread_cond_init(&(mybarrier->barrier_cond), NULL);
    mybarrier->cur_count = 0;
}
```