Unified Parallel C

Rakhi Anand
Department of Computer Science, University of Houston
rakhi@cs.uh.edu
11/2/2010
Saber Feki

References
Slides in this lecture are based upon the following references:
- http://upc.lbl.gov/lang-overview.shtml
- http://upc.gwu.edu/downloads/Manual-1.2.pdf
- http://upc.lbl.gov/docs/user/index.shtml
- http://upc.gwu.edu/tutorials/UPC-SC05.pdf

Introduction
- Unified Parallel C (UPC) is a Partitioned Global Address Space (PGAS) language.
- Similar to the C language: common and familiar syntax.
- Designed for parallel C programs.
- Provides the ability to exploit data locality on different memory architectures.

PGAS Languages
- UPC is based on the Partitioned Global Address Space (PGAS) model.
- Figure: global address space. Thread 0, Thread 1, ..., Thread n each own a partition of the shared memory space and a private space (Private 0, Private 1, ..., Private n).

UPC Execution model
- A number of threads work in SPMD fashion.
- MYTHREAD gives the thread index (0, 1, ..., n-1).
- THREADS gives the number of threads.
- There are two compilation modes:
  - Static threads mode: the number of threads is specified at compile time and THREADS is a constant.
  - Dynamic threads mode: the number of threads is specified at run time.

UPC Hello world example

    #include <upc.h>    /* needed for UPC extensions */
    #include <stdio.h>

    int main(void) {
        printf("Thread %d of %d: hello UPC world\n", MYTHREAD, THREADS);
        return 0;
    }

- Compile: upcc -T 2 -o hello hello.upc
- Run: upcrun -n 2 hello

Shared and Private data
- Normal C variables are allocated in the private memory space of each thread:
    int mine;
- Shared variables are allocated only once, with affinity to thread 0:
    shared int x;
- Shared arrays are distributed across threads, one element per thread at a time:
    shared int x[10];
- Figure: global address space for Thread 0, Thread 1, ..., Thread n. The shared space holds ours (on thread 0) and x[0], x[1], ..., x[n] (one element per thread); each private space holds its own copy of mine.
- Example: vector addition

    shared int a[100], b[100], c[100];
    int i;
    for (i = 0; i < 100; i++)
        if (MYTHREAD == i % THREADS)    /* each thread handles its own elements */
            a[i] = b[i] + c[i];

Data distribution

Efficient data distribution

Blocking of shared data
- The default block size is 1, which distributes data in round-robin fashion.
- Shared arrays can be distributed in blocks across threads:
    shared [3] int A[4][THREADS];
- Assume THREADS = 4.

Blocking of the shared array

    Thread 0: A[0][0] A[0][1] A[0][2]   A[3][0] A[3][1] A[3][2]
    Thread 1: A[0][3] A[1][0] A[1][1]   A[3][3]
    Thread 2: A[1][2] A[1][3] A[2][0]
    Thread 3: A[2][1] A[2][2] A[2][3]

Blocking of Shared Array
- Thread affinity: the ability of a thread to refer to an object by a private pointer.
- Element i of a blocked array has affinity to thread floor(i / block size) mod THREADS.

For all
- Provides an opportunity to distribute iterations across the threads as you wish:
    upc_forall(init; test; loop; affinity)
- The affinity expression decides which thread executes which iteration.
- The affinity can be an integer or a pointer.
- Example 1: explicit affinity

    shared int a[100], b[100], c[100];
    int i;
    upc_forall(i = 0; i < 100; i++; &a[i])
        a[i] = b[i] + c[i];

- Example 2: implicit affinity

    shared int a[100], b[100], c[100];
    int i;
    upc_forall(i = 0; i < 100; i++; i)
        a[i] = b[i] + c[i];

- Example 3: blocked affinity

    shared [100/THREADS] int a[100], b[100], c[100];
    int i;
    upc_forall(i = 0; i < 100; i++; (i * THREADS) / 100)
        a[i] = b[i] + c[i];

UPC Pointers
- Pointer declarations:

    int *p1;                  /* private pointer to private (local) data */
    shared int *p2;           /* private pointer to shared data */
    int *shared p3;           /* shared pointer to private (local) data */
    shared int *shared p4;    /* shared pointer to shared data */

- A shared pointer to local memory (p3) is not recommended.
- Pointer example 1: assume THREADS = 3

    shared int A[10];
    shared int *dp = &A[2], *dp1;
    dp1 = dp + 4;

    Thread 0: A[0]        A[3] (dp + 1)   A[6] (dp + 4 = dp1)   A[9]
    Thread 1: A[1]        A[4] (dp + 2)   A[7]
    Thread 2: A[2] (dp)   A[5] (dp + 3)   A[8]

- Pointer example 2: assume THREADS = 3

    shared [2] int A[10];
    shared [3] int *dp = &A[2], *dp1;
    dp1 = dp + 4;

    Thread 0: A[0] A[1]   A[6] A[7]
    Thread 1: A[2] A[3]   A[8] A[9]
    Thread 2: A[4] A[5]

  Figure: the same layout annotated with the positions of dp, dp + 1, ..., dp + 4 and dp1.

Dynamic memory allocation I
    shared void *upc_global_alloc(size_t nblocks, size_t nbytes);
- nblocks: number of blocks
- nbytes: block size
- Non-collective operation.
- The calling thread allocates memory in the shared address space.

Dynamic memory allocation II
    shared void *upc_all_alloc(size_t nblocks, size_t nbytes);
- nblocks: number of blocks
- nbytes: block size
- Collective operation.
- All threads get the same pointer.

Dynamic memory allocation III
    shared void *upc_alloc(size_t nbytes);
- nbytes: block size
- Non-collective operation.
- The calling thread allocates memory in its own (local) portion of the shared address space.

Dynamic memory de-allocation
    void upc_free(shared void *ptr);
- upc_free frees the dynamically allocated shared memory pointed to by ptr.
- upc_free is not collective.

Consistency Models I
- Consistency models govern how memory accesses to the shared memory space interact.
- Consistency can be strict or relaxed.
- Relaxed consistency:
  - The program executes under a local consistency model.
  - The compiler analyses only the shared memory accesses of the local thread.
  - Shared operations can be reordered by the compiler.
  - Made the default by including upc_relaxed.h.

Consistency Models II
- Strict consistency:
  - The program executes under a sequential consistency model.
  - The compiler must take into account all memory accesses in all threads.
  - Reordering of operations is not allowed.
  - Made the default by including upc_strict.h.

Consistency Models III
- The default consistency model can be altered using
    #pragma upc strict
    #pragma upc relaxed
- Variables can also be declared with the type qualifiers strict or relaxed.

Consistency Models Example (relaxed by default)

    #include <upc_relaxed.h>

    send(val) {
        #pragma upc strict next
        while (flag) ;
        data1 = val1;    /* statements can be reordered */
        data2 = val2;
        #pragma upc strict next
        flag = 1;
    }

    int recv() {
        #pragma upc strict next
        while (!flag) ;
        tmp = data1 + data2;
        #pragma upc strict next
        flag = 0;
    }

Consistency Models Example (strict by default)

    #include <upc_strict.h>

    send(val) {
        while (flag) ;
        #pragma upc relaxed next
        data1 = val1;    /* statements can be reordered */
        data2 = val2;
        flag = 1;
    }

    int recv() {
        while (!flag) ;
        #pragma upc relaxed next
        tmp = data1 + data2;
        flag = 0;
    }

Synchronization
- There is no implicit synchronization among threads.
- Synchronization is provided using:
  - Barrier (blocks until all threads arrive): upc_barrier
  - Split-phase barrier (non-blocking barrier): upc_notify, upc_wait
- UPC uses locks to achieve synchronization for multiple writers:
    upc_lock_t *upc_all_lock_alloc(void);
- To acquire the lock:
    void upc_lock(upc_lock_t *l);
- To release the lock:
    void upc_unlock …
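
The block-cyclic rule behind blocked shared arrays (blocks are dealt out to the threads round-robin) can be checked with a little arithmetic. The following is a minimal sketch in plain C rather than UPC, so it compiles without a UPC toolchain. The helper names owner_thread and local_index are hypothetical and are not part of UPC; they only mirror the layout rule, under the assumption that the array is indexed by its linear (row-major) element index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a UPC function): thread with affinity to
 * linear element i of an array declared as
 *     shared [block_size] T a[...];
 * when the program runs with `threads` threads.
 * Blocks are handed out round-robin, so the owner is
 * floor(i / block_size) mod threads. */
size_t owner_thread(size_t i, size_t block_size, size_t threads)
{
    assert(block_size > 0 && threads > 0);
    return (i / block_size) % threads;
}

/* Hypothetical helper: position of element i within the elements
 * that its owning thread holds. */
size_t local_index(size_t i, size_t block_size, size_t threads)
{
    assert(block_size > 0 && threads > 0);
    /* number of complete rounds over all threads before i's block */
    size_t round = i / (block_size * threads);
    return round * block_size + i % block_size;
}
```

For the array declared as shared [3] int A[4][THREADS] with THREADS = 4, element A[3][0] has linear index 3 * 4 + 0 = 12, and owner_thread(12, 3, 4) returns 0, the first thread, in line with the blocked layout shown for that array. With a block size of 1 the same function reduces to i mod THREADS, the default round-robin distribution.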