UW-Madison ME 964 - Lecture Notes
ME964 - High Performance Computing for Engineering Applications
Parallel Computing using OpenMP [Part 1 of 2]
March 31, 2011
© Dan Negrut, 2011, ME964 UW-Madison

"The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague." Edsger W. Dijkstra

Before We Get Started

- Last time:
  - How to run an MPI executable on Newton
  - Point-to-point communication with MPI
  - Collective communication in MPI
- Today: parallel computing using OpenMP, part 1 of 2
- Other issues:
  - Assignment 7 was posted on the class website; it is due on April 7
  - The class website includes a link to the OpenMP 3.0 Application Programming Interface: http://www.openmp.org/mp-documents/spec30.pdf

Acknowledgements

- The overwhelming majority of the slides discussing OpenMP are from Intel's library of presentations promoting OpenMP; they are used here with permission
- Credit is given where due by a "Credit: IOMPP" or "Includes material from IOMPP" note; IOMPP stands for "Intel OpenMP Presentation"

Data vs. Task Parallelism

- Data parallelism: you have a large number of data elements, and each element (or possibly a subset of elements) needs to be processed to produce a result; when this processing can be done in parallel, you have data parallelism
  - Example: adding two long arrays of doubles to produce yet another array of doubles
- Task parallelism: you have a collection of tasks that need to be completed; if these tasks can be performed in parallel, you are faced with a task-parallel job
  - Examples: reading the newspaper, drinking coffee, and scratching your back; or the body's concurrent work of breathing, heartbeat, liver function, and controlling swallowing

Objectives

- Understand OpenMP at the level where you can implement data parallelism and implement task parallelism

Work Plan [Credit: IOMPP]

- What is OpenMP?
- Parallel regions
- Work sharing
- Data environment
- Synchronization
- Advanced topics

OpenMP: Target Hardware

- CUDA targets parallelism on the GPU; MPI targets parallelism on a cluster (distributed computing)
  - Note that an MPI implementation can transparently handle an SMP architecture, such as a workstation with two hex-core CPUs sharing a large amount of memory
- OpenMP targets parallelism on SMP architectures; it is handy when
  - you have a machine with, say, 12 cores (probably 24 hardware threads if HTT is accounted for), and
  - you have a large amount of shared memory backed by a 64-bit OS

OpenMP: What to Expect

- If you have 12 cores available to you, it is highly unlikely that you will get a speedup of more than 12 (i.e., superlinear speedup)
- Recall the trick that helped the GPU hide latency: overcommitting the SPs and hiding memory-access latency with warp execution
- This mechanism of hiding latency by overcommitment does not explicitly exist for parallel computing under OpenMP, beyond what HTT offers

OpenMP: What Is It? [Credit: IOMPP]

- A portable, shared-memory threading API for Fortran, C, and C++, with multi-vendor support for both Linux and Windows
- Standardizes task-level and loop-level parallelism, and supports coarse-grained parallelism
- Combines serial and parallel code in a single source
- Standardizes roughly 20 years of compiler-directed threading experience
- The current spec is OpenMP 3.0 (318 pages): http://www.openmp.org

"pthreads": An OpenMP Precursor

- Before there was OpenMP, a common approach to supporting parallel programming was the use of pthreads
  - "pthread" = POSIX thread; POSIX = Portable Operating System Interface [for Unix]
- pthreads were originally available under Unix and Linux; Windows ports are also available, some as open-source projects
- Parallel programming with pthreads is relatively cumbersome, prone to mistakes, and hard to maintain/scale/expand
- Moreover, pthreads were not envisioned as a mechanism for writing scientific computing software

"pthreads": Example

The slide listing is reassembled below into a single compilable file (includes first, main last). The globals barrier1, finals, and rootn, and the integrand f, are referenced by the slide's code but defined outside the preview, so their definitions here are reconstructed; the final printf, truncated in the preview, is completed along the lines of the classic cpi example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <sys/types.h>
#include <pthread.h>
#include <sys/time.h>

#define SOLARIS 1
#define ORIGIN  2
#define OS SOLARIS

typedef struct {
    int id;      /* this thread's rank */
    int noproc;  /* total number of threads */
    int dim;
} parm;

typedef struct {
    int cur_count;
    pthread_mutex_t barrier_mutex;
    pthread_cond_t barrier_cond;
} barrier_t;

/* Globals referenced by the slide's code; their definitions fall outside
   the preview, so they are reconstructed here. */
barrier_t barrier1;
double *finals;        /* per-thread partial results */
int rootn = 10000;     /* number of integration intervals (value assumed) */

/* Integrand: pi = integral over [0,1] of 4/(1+x^2) dx */
double f(double x) { return 4.0 / (1.0 + x * x); }

void barrier_init(barrier_t *mybarrier) {
    /* must run before spawning the threads */
    pthread_mutexattr_t attr;
#if (OS == ORIGIN)
    pthread_mutexattr_init(&attr);  /* added: attr must be initialized before use */
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutexattr_setprioceiling(&attr, 0);
    pthread_mutex_init(&(mybarrier->barrier_mutex), &attr);
#elif (OS == SOLARIS)
    pthread_mutex_init(&(mybarrier->barrier_mutex), NULL);
#else
#error "undefined OS"
#endif
    pthread_cond_init(&(mybarrier->barrier_cond), NULL);
    mybarrier->cur_count = 0;
}

void barrier(int numproc, barrier_t *mybarrier) {
    pthread_mutex_lock(&(mybarrier->barrier_mutex));
    mybarrier->cur_count++;
    if (mybarrier->cur_count != numproc) {
        pthread_cond_wait(&(mybarrier->barrier_cond), &(mybarrier->barrier_mutex));
    } else {
        mybarrier->cur_count = 0;
        pthread_cond_broadcast(&(mybarrier->barrier_cond));
    }
    pthread_mutex_unlock(&(mybarrier->barrier_mutex));
}

void *cpi(void *arg) {
    parm *p = (parm *) arg;
    int myid = p->id;
    int numprocs = p->noproc;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;
    double startwtime = 0.0, endwtime;

    if (myid == 0)
        startwtime = clock();
    barrier(numprocs, &barrier1);
    if (rootn == 0)
        finals[myid] = 0;
    else {
        h = 1.0 / (double) rootn;
        sum = 0.0;
        /* cyclic distribution of the intervals across the threads */
        for (int i = myid + 1; i <= rootn; i += numprocs) {
            x = h * ((double) i - 0.5);
            sum += f(x);
        }
        mypi = h * sum;
        finals[myid] = mypi; /* moved inside the else branch: mypi is
                                uninitialized when rootn == 0 */
    }
    barrier(numprocs, &barrier1);
    if (myid == 0) {
        pi = 0.0;
        for (int i = 0; i < numprocs; i++)
            pi += finals[i];
        endwtime = clock();
        /* the preview cuts off mid-printf; completed along the lines of
           the classic cpi example */
        printf("pi is approximately %.16f, error is %.16f\n",
               pi, fabs(pi - PI25DT));
        printf("time = %f s\n", (endwtime - startwtime) / CLOCKS_PER_SEC);
    }
    return NULL;
}

int main(int argc, char *argv[]) {
    parm *arg;
    pthread_t *threads;
    pthread_attr_t pthread_custom_attr;

    int n = atoi(argv[1]); /* number of threads, from the command line */
    threads = (pthread_t *) malloc(n * sizeof(*threads));
    pthread_attr_init(&pthread_custom_attr);

    barrier_init(&barrier1); /* set up barrier */
    finals = (double *) malloc(n * sizeof(double)); /* space for final result */
    arg = (parm *) malloc(sizeof(parm) * n);

    for (int i = 0; i < n; i++) { /* spawn threads */
        arg[i].id = i;
        arg[i].noproc = n;
        pthread_create(&threads[i], &pthread_custom_attr, cpi, (void *)(arg + i));
    }
    for (int i = 0; i < n; i++) /* synchronize the completion of each thread */
        pthread_join(threads[i], NULL);

    free(arg);
    return 0;
}
```

