Berkeley COMPSCI C267 - Unified Parallel C

3/2/2007 CS267 Lecture: UPC

Unified Parallel C (UPC)
Kathy Yelick
http://upc.lbl.gov
Slides adapted from some by Tarek El-Ghazawi (GWU)

UPC Outline
1. Background
2. UPC Execution Model
3. Basic Memory Model: Shared vs. Private Scalars
4. Synchronization
5. Collectives
6. Data and Pointers
7. Dynamic Memory Management
8. Programming Examples
9. Performance Tuning and Early Results
10. Concluding Remarks

Context
• Most parallel programs are written using either:
  • Message passing with a SPMD model
    • Usually for scientific applications with C++/Fortran
    • Scales easily
  • Shared memory with threads in OpenMP, Threads+C/C++/F, or Java
    • Usually for non-scientific applications
    • Easier to program, but less scalable performance
• Global Address Space (GAS) languages take the best of both:
  • Global address space like threads (programmability)
  • SPMD parallelism like MPI (performance)
  • Local/global distinction, i.e., layout matters (performance)

Partitioned Global Address Space Languages
• Explicitly parallel programming model with SPMD parallelism
  • Fixed at program start-up, typically 1 thread per processor
• Global address space model of memory
  • Allows the programmer to directly represent distributed data structures
• Address space is logically partitioned
  • Local vs. remote memory (two-level hierarchy)
• Programmer control over performance-critical decisions
  • Data layout and communication
• Performance transparency and tunability are goals
  • Initial implementation can use fine-grained shared memory
• Multiple PGAS languages: UPC (C), CAF (Fortran), Titanium (Java)

Global Address Space Eases Programming
• The languages share the global address space abstraction
  • Shared memory is logically partitioned by processors
  • Remote memory may stay remote: no automatic caching implied
  • One-sided communication: reads/writes of shared variables
  • Both individual and bulk memory copies
• Languages differ on details
  • Some models have a separate private memory area
  • Distributed array generality and how they are constructed

[Figure: threads Thread0..Threadn each hold a private pointer ptr into a shared, partitioned global address space containing X[0], X[1], ..., X[P].]

Current Implementations of PGAS Languages
• A successful language/library must run everywhere
• UPC
  • Commercial compilers available on Cray, SGI, HP machines
  • Open source compiler from LBNL/UCB (source-to-source)
  • Open source gcc-based compiler from Intrepid
• CAF
  • Commercial compiler available on Cray machines
  • Open source compiler available from Rice
• Titanium
  • Open source compiler from UCB runs on most machines
• Common tools
  • Open64 open source research compiler infrastructure
  • ARMCI, GASNet for distributed memory implementations
  • Pthreads, System V shared memory

UPC Overview and Design Philosophy
• Unified Parallel C (UPC) is:
  • An explicit parallel extension of ANSI C
  • A partitioned global address space language
  • Sometimes called a GAS language
• Similar to the C language philosophy
  • Programmers are clever and careful, and may need to get close to the hardware
    • to get performance, but
    • can get in trouble
  • Concise and efficient syntax
• Common and familiar syntax and semantics for parallel C, with simple extensions to ANSI C
• Based on ideas in Split-C, AC, and PCP
UPC Execution Model
• A number of threads working independently in a SPMD fashion
  • Number of threads specified at compile time or run time; available as the program variable THREADS
  • MYTHREAD specifies the thread index (0..THREADS-1)
  • upc_barrier is a global synchronization: all wait
  • There is a form of parallel loop that we will see later
• There are two compilation modes
  • Static threads mode:
    • THREADS is specified at compile time by the user
    • The program may use THREADS as a compile-time constant
  • Dynamic threads mode:
    • Compiled code may be run with varying numbers of threads

Hello World in UPC
• Any legal C program is also a legal UPC program
• If you compile and run it as UPC with P threads, it will run P copies of the program
• Using this fact, plus the identifiers from the previous slides, we can write a parallel hello world:

#include <upc.h>    /* needed for UPC extensions */
#include <stdio.h>

int main() {
    printf("Thread %d of %d: hello UPC world\n",
           MYTHREAD, THREADS);
}

Example: Monte Carlo Pi Calculation
• Estimate pi by throwing darts at a unit square
• Calculate the percentage that fall in the unit circle
  • Area of square = r^2 = 1
  • Area of circle quadrant = 1/4 * pi * r^2 = pi/4
• Randomly throw darts at (x, y) positions
• If x^2 + y^2 < 1, then the point is inside the circle
• Compute the ratio: # points inside / # points total
• pi = 4 * ratio

Pi in UPC
• Independent estimates of pi
• Each thread gets its own copy of the local variables, can use the input arguments, seeds the math library's random generator, and calls "hit" separately:

int main(int argc, char **argv) {
    int i, hits = 0, trials = 0;
    double pi;
    if (argc != 2)
        trials = 1000000;
    else
        trials = atoi(argv[1]);
    srand(MYTHREAD*17);
    for (i = 0; i < trials; i++)
        hits += hit();
    pi = 4.0*hits/trials;
    printf("PI estimated to %f.", pi);
}

Helper Code for Pi in UPC
• Required includes:

#include <stdio.h>
#include <math.h>
#include <upc.h>

• Function to throw a dart and calculate where it hits:

int hit() {
    double x = ((double) rand()) / RAND_MAX;
    double y = ((double) rand()) / RAND_MAX;
    if ((x*x + y*y) <= 1.0) {
        return 1;
    } else {
        return 0;
    }
}

Shared vs. Private Variables

Private vs. Shared Variables in UPC
• Normal C variables and objects are allocated in the private memory space of each thread
• Shared variables are allocated only once, with thread 0:

shared int ours;   // use sparingly: performance
int mine;

• Shared variables may not have dynamic lifetime: they may not occur in a function definition, except as static. Why?

[Figure: threads Thread0..Threadn each have a private variable mine; the single shared variable ours lives in the partitioned global address space with affinity to thread 0.]

Pi in UPC: Shared Memory Style
• Parallel computation of pi, but with a bug:

shared int hits;
int main(int argc, char **argv) {
    int i, my_trials = 0;
    int trials = atoi(argv[1]);
    my_trials =

[preview truncated here]

