Berkeley COMPSCI C267 - Shared Memory Programming: OpenMP and Threads

Contents (slide titles in the full deck)
• Shared Memory Programming: OpenMP and Threads
• Outline
• Parallel Programming with Threads
• Shared Memory Programming
• Common Notions of Thread Creation
• Overview of POSIX Threads
• Forking Posix Threads
• Simple Threading Example
• Loop Level Parallelism
• Shared Data and Threads
• Setting Attribute Values
• Recall Data Race Example from Last Time
• Basic Types of Synchronization: Barrier
• Creating and Initializing a Barrier
• Basic Types of Synchronization: Mutexes
• Mutexes in POSIX Threads
• Introduction to OpenMP
• Summary of Programming with Threads
• Parallel Programming in OpenMP
• A Programmer's View of OpenMP
• Motivation
• Motivation – OpenMP
• Slide 23
• Programming Model – Concurrent Loops
• Programming Model – Loop Scheduling
• Programming Model – Data Sharing
• Programming Model – Synchronization
• Microbenchmark: Grid Relaxation
• Microbenchmark: Structured Grid
• Microbenchmark: Ocean
• Slide 31
• Microbenchmark: GeneticTSP
• Slide 33
• Slide 34
• Slide 35
• Evaluation
• SpecOMP (2001)
• OpenMP Summary
• More Information
• Shared Memory Hardware and Memory Consistency
• Basic Shared Memory Architecture
• Intuitive Memory Model
• Sequential Consistency Intuition
• Memory Consistency Semantics
• If Caches are Not "Coherent"
• Snoopy Cache-Coherence Protocols
• Limits of Bus-Based Shared Memory
• Sample Machines
• Basic Choices in Memory/Cache Coherence
• SGI Altix 3000
• Cache Coherence and Sequential Consistency
• Programming with Weaker Memory Models than SC
• Sharing: A Performance Problem
• What to Take Away?


Shared Memory Programming: OpenMP and Threads
Kathy Yelick
[email protected]
www.cs.berkeley.edu/~yelick/cs267_sp07
01/26/2006, CS267 Lecture 5


Outline
• Parallel Programming with Threads
• Parallel Programming with OpenMP
  • See http://www.nersc.gov/nusers/help/tutorials/openmp/
  • Slides on OpenMP derived from: U. Wisconsin tutorial, which in turn were from LLNL, NERSC, U. Minn, and OpenMP.org
• Memory consistency: the dark side of shared memory
  • Hardware review and a few more details
  • What this means to shared memory programmers
• Summary


Parallel Programming with Threads


Shared Memory Programming: Several Thread Libraries
• PTHREADS is the POSIX standard
  • Solaris threads are very similar
  • Relatively low level
  • Portable but possibly slow
• OpenMP is the newer standard
  • Support for scientific programming on shared memory
  • http://www.openMP.org
• P4 (Parmacs) is an older portable package
  • Higher level than Pthreads
  • http://www.netlib.org/p4/index.html


Common Notions of Thread Creation
• cobegin/coend
      cobegin
        job1(a1);
        job2(a2);
      coend
  • Statements in the block may run in parallel
  • cobegins may be nested
  • Scoped, so you cannot have a missing coend
• fork/join
      tid1 = fork(job1, a1);
      job2(a2);
      join tid1;
  • Forked procedure runs in parallel
  • Wait at the join point if it is not finished
• future
      v = future(job1(a1));
      … = …v…;
  • Future expression evaluated in parallel
  • Attempt to use the return value will wait
• cobegin is cleaner than fork, but fork is more general
• Futures require some compiler (and likely hardware) support


Overview of POSIX Threads
• POSIX: Portable Operating System Interface for UNIX
  • Interface to operating system utilities
• PThreads: the POSIX threading interface
  • System calls to create and synchronize threads
  • Should be relatively uniform across UNIX-like OS platforms
• PThreads contain support for
  • Creating parallelism
  • Synchronizing
  • No explicit support for communication, because shared memory is implicit; a pointer to shared data is passed to a thread


Forking Posix Threads
Signature:
      int pthread_create(pthread_t *,
                         const pthread_attr_t *,
                         void * (*)(void *),
                         void *);
Example call:
      errcode = pthread_create(&thread_id, &thread_attribute,
                               &thread_fun, &fun_arg);
• thread_id is the thread id or handle (used to halt, etc.)
• thread_attribute holds various attributes
  • Standard default values are obtained by passing a NULL pointer
• thread_fun is the function to be run (takes and returns void*)
• fun_arg is an argument that can be passed to thread_fun when it starts
• errorcode will be set nonzero if the create operation fails


Simple Threading Example
      #include <pthread.h>
      #include <stdio.h>

      void *SayHello(void *foo) {
          printf("Hello, world!\n");
          return NULL;
      }

      int main() {
          pthread_t threads[16];
          int tn;
          for (tn = 0; tn < 16; tn++) {
              pthread_create(&threads[tn], NULL, SayHello, NULL);
          }
          for (tn = 0; tn < 16; tn++) {
              pthread_join(threads[tn], NULL);
          }
          return 0;
      }
Compile using gcc -lpthread
See Millennium/NERSC docs for paths/modules


Loop Level Parallelism
• Many scientific applications have parallelism in loops
• With threads:
      ... my_stuff[n][n];

      for (int i = 0; i < n; i++)
          for (int j = 0; j < n; j++)
              ... pthread_create(update_cell, ..., my_stuff[i][j]);
• But the overhead of thread creation is nontrivial
  • update_cell should have a significant amount of work
  • 1/p-th of the total work, if possible
• Each thread also needs its i and j (one way to pass them is sketched below)
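A minimal sketch of one way to pass i and j, assuming a hypothetical update_cell and a small argument struct (cell_arg and the N = 4 grid size are illustrative names, not from the slides):

      #include <pthread.h>
      #include <stdlib.h>

      #define N 4                                  /* illustrative grid size */

      typedef struct { int i, j; } cell_arg;       /* carries the loop indices */

      /* Hypothetical per-cell work; the signature matches what pthread_create expects. */
      void *update_cell(void *arg) {
          cell_arg *c = (cell_arg *)arg;
          /* ... a significant amount of work on cell (c->i, c->j) goes here ... */
          free(c);                                 /* each thread releases its own argument */
          return NULL;
      }

      int main(void) {
          pthread_t tid[N][N];
          for (int i = 0; i < N; i++)
              for (int j = 0; j < N; j++) {
                  /* Heap-allocate so every thread gets its own copy of i and j;
                     passing &i or &j would race with the loop updating them. */
                  cell_arg *c = malloc(sizeof *c);
                  c->i = i;
                  c->j = j;
                  pthread_create(&tid[i][j], NULL, update_cell, c);
              }
          for (int i = 0; i < N; i++)
              for (int j = 0; j < N; j++)
                  pthread_join(tid[i][j], NULL);
          return 0;
      }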
Shared Data and Threads
• Variables declared outside of main are shared
• Objects allocated on the heap may be shared (if a pointer is passed)
• Variables on the stack are private: passing a pointer to these around to other threads can cause problems
• Often done by creating a large "thread data" struct
  • Passed into all threads as an argument
• Simple example:
      char *message = "Hello World!\n";

      pthread_create(&thread1, NULL, print_fun, (void *) message);
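For completeness, a sketch of a matching print_fun; only the pthread_create call appears in the slides, so the function body here is an assumption:

      #include <stdio.h>

      /* Assumed receiver: unpacks the shared string passed through pthread_create. */
      void *print_fun(void *arg) {
          char *msg = (char *)arg;     /* points at the shared "Hello World!\n" string */
          printf("%s", msg);
          return NULL;
      }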
Setting Attribute Values
• Once an initialized attribute object exists, changes can be made. For example:
  • To change the stack size for a thread to 8192 (before calling pthread_create), do this:
        pthread_attr_setstacksize(&my_attributes, (size_t)8192);
  • To get the stack size, do this:
        size_t my_stack_size;
        pthread_attr_getstacksize(&my_attributes, &my_stack_size);
• Other attributes:
  • Detached state – set if no other thread will use pthread_join to wait for this thread (improves efficiency)
  • Guard size – use to protect against stack overflow
  • Inherit scheduling attributes (from creating thread) – or not
  • Scheduling parameter(s) – in particular, thread priority
  • Scheduling policy – FIFO or Round Robin
  • Contention scope – with what threads does this thread compete for a CPU
  • Stack address – explicitly dictate where the stack is located
  • Lazy stack allocation –
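Tying the attribute calls together, a minimal sketch (error checking omitted; the worker function and the 1 MiB stack size are illustrative, and very small sizes such as 8192 can be rejected on systems whose PTHREAD_STACK_MIN is larger):

      #include <pthread.h>
      #include <stdio.h>

      void *worker(void *arg) { return NULL; }     /* placeholder thread body */

      int main(void) {
          pthread_attr_t my_attributes;
          pthread_attr_init(&my_attributes);       /* initialize before making any changes */

          pthread_attr_setstacksize(&my_attributes, (size_t)(1024 * 1024));
          pthread_attr_setdetachstate(&my_attributes, PTHREAD_CREATE_DETACHED);

          size_t my_stack_size;
          pthread_attr_getstacksize(&my_attributes, &my_stack_size);
          printf("stack size: %zu bytes\n", my_stack_size);

          pthread_t tid;
          pthread_create(&tid, &my_attributes, worker, NULL);   /* attributes take effect at creation */
          pthread_attr_destroy(&my_attributes);    /* attribute object no longer needed afterwards */

          pthread_exit(NULL);   /* exit main without tearing down the detached thread */
      }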

