Berkeley COMPSCI C267 - Unified Parallel C

CS 267: Unified Parallel C (UPC)
Kathy Yelick
http://upc.lbl.gov
Slides adapted from some by Tarek El-Ghazawi (GWU)
2/23/09, CS267 Lecture: UPC

UPC Outline
  1. Background
  2. UPC Execution Model
  3. Basic Memory Model: Shared vs. Private Scalars
  4. Synchronization
  5. Collectives
  6. Data and Pointers
  7. Dynamic Memory Management
  8. Programming Examples
  9. Performance Tuning and Early Results
  10. Concluding Remarks

Context
  Most parallel programs are written using either:
  - Message passing with a SPMD model
      - Usually for scientific applications with C++/Fortran
      - Scales easily
  - Shared memory with threads in OpenMP, threads + C/C++/F, or Java
      - Usually for non-scientific applications
      - Easier to program, but less scalable performance
  Global Address Space (GAS) languages take the best of both:
  - Global address space, like threads (programmability)
  - SPMD parallelism, like most MPI programs (performance)
  - Local/global distinction, i.e., layout matters (performance)

History of UPC
  - Initial Tech. Report from IDA, in collaboration with LLNL and UCB, in May 1999 (led by IDA)
      - UCB: based on Split-C, which was based on a course project and motivated by Active Messages
      - IDA: based on AC; think about "GUPS" or histogram, "just do it" programs
  - UPC consortium of government, academia, and HPC vendors, coordinated by GMU, IDA, and LBNL
  - The participants (past and present) are: ARSC, Compaq, CSC, Cray Inc., Etnus, GMU, HP, IDA CCS, Intrepid Technologies, LBNL, LLNL, MTU, NSA, SGI, Sun Microsystems, UCB, U. Florida, US DOD

PGAS Languages
  - Global address space: a thread may directly read/write remote data
      - Virtualizes or hides the distinction between shared and distributed memory
  - Partitioned: data is designated as local or global
      - Does not hide this: critical for locality and scaling
  [Figure: global address space over processors p0, p1, ..., pn; each holds x and y (values shown: x: 1, x: 5, x: 7, y: 0), a local pointer l, and a global pointer g]
  - UPC, CAF, Titanium: static parallelism (1 thread per processor)
      - Does not virtualize processors; this is the main difference from the HPCS languages, which have many/dynamic threads

What Makes a Language/Library PGAS?
  - Support for distributed data structures
      - Distributed arrays; local and global pointers/references
  - One-sided shared-memory communication
      - Simple assignment statements: x[i] = y[i]; or t = *p;
      - Bulk operations: memory copy or array copy
      - Optional: remote invocation of functions
  - Control over data layout
      - PGAS is not the same as (cache-coherent) "shared memory"
      - Remote data stays remote in the performance model
  - Synchronization
      - Global barriers, locks, memory fences
  - Collective communication, I/O libraries, etc.

UPC Overview and Design Philosophy
  - Unified Parallel C (UPC) is:
      - An explicit parallel extension of ANSI C
      - A partitioned global address space language
      - Sometimes called a GAS language
  - Similar to the C language philosophy
      - Programmers are clever and careful, and may need to get close to the hardware
          - to get performance, but
          - can get in trouble
      - Concise and efficient syntax
  - Common and familiar syntax and semantics for parallel C, with simple extensions to ANSI C
  - Based on ideas in Split-C, AC, and PCP

UPC Execution Model
  - A number of threads working independently in a SPMD fashion
      - The number of threads is specified at compile time or run time, and is available as the program variable THREADS
      - MYTHREAD specifies the thread index (0..THREADS-1)
      - upc_barrier is a global synchronization: all wait
      - There is a form of parallel loop that we will see later
  - There are two compilation modes
      - Static threads mode: THREADS is specified at compile time by the user. The program may use THREADS as a compile-time constant.
      - Dynamic threads mode: compiled code may be run with varying numbers of threads

Hello World in UPC
  - Any legal C program is also a legal UPC program
  - If you compile and run it as UPC with P threads, it will run P copies of the program
  - Using this fact, plus the identifiers from the previous slides, we can write a parallel hello world:

      #include <upc.h>    /* needed for UPC extensions */
      #include <stdio.h>

      int main(void) {
          printf("Thread %d of %d: hello UPC world\n", MYTHREAD, THREADS);
          return 0;
      }

Example: Monte Carlo Pi Calculation
  - Estimate pi by throwing darts at a unit square
  - Calculate the percentage that fall in the unit circle
      - Area of square = r^2 = 1
      - Area of circle quadrant = (1/4) * pi * r^2 = pi/4
  - Randomly throw darts at (x, y) positions
  - If x^2 + y^2 < 1, then the point is inside the circle
  - Compute ratio: (# points inside) / (# points total)
  - pi = 4 * ratio
  [Figure: quarter circle inscribed in the unit square, r = 1]

Pi in UPC
  - Independent estimates of pi:

      int main(int argc, char **argv) {
          /* Each thread gets its own copy of these variables */
          int i, hits = 0, trials = 0;
          double pi;

          /* Each thread can use input arguments */
          if (argc != 2) trials = 1000000;
          else trials = atoi(argv[1]);

          /* Initialize random in math library */
          srand(MYTHREAD * 17);

          /* Each thread calls hit() separately */
          for (i = 0; i < trials; i++) hits += hit();
          pi = 4.0 * hits / trials;
          printf("PI estimated to %f.", pi);
          return 0;
      }

Helper Code for Pi in UPC
  - Required includes:

      #include <stdio.h>
      #include <stdlib.h>   /* rand, srand, atoi */
      #include <math.h>
      #include <upc.h>

  - Function to throw a dart and calculate where it hits:

      int hit(void) {
          int const rand_max = 0xFFFFFF;   /* not used below */
          double x = ((double) rand()) / RAND_MAX;
          double y = ((double) rand()) / RAND_MAX;
          if ((x * x + y * y) <= 1.0) {
              return 1;
          } else {
              return 0;
          }
      }

Shared vs. Private Variables

Private vs. Shared Variables in UPC
  - Normal C variables and objects are allocated in the private memory space of each thread
  - Shared variables are allocated only once, with thread 0

      shared int ours;   /* use sparingly: performance */
      int mine;

  - Shared variables may not have dynamic lifetime: they may not occur in a function definition, except as static. Why?
  [Figure: global address space across Thread0, Thread1, ..., Threadn; the shared region holds one copy of ours (on thread 0); the private region of each thread holds its own mine]

Pi in UPC: Shared Memory Style
  - Parallel computing of pi, but with a bug:

      shared int hits;   /* shared variable to record hits */

      int main(int argc, char **argv) {
          int i, my_trials = 0;
          int trials = atoi(argv[1]);

          /* divide work up evenly */
          my_trials = (trials + THREADS - 1) / THREADS;
          srand(MYTHREAD * 17);
          for (i = 0; i < my_trials; i++)
              hits += hit();   /* accumulate hits */
          upc_barrier;
          if (MYTHREAD == 0) {
              printf("PI estimated to %f.", 4.0 * hits / trials);
          }
          return 0;
      }

  - What is the problem with this program?

Shared Arrays Are Cyclic By Default
  - Shared scalars always live in thread 0
  - Shared arrays are spread over the threads
  - Shared array elements are spread across the threads:

      shared int x[THREADS];      /* 1 element per thread */
      shared int y[3][THREADS];   /* 3 elements per thread */
      shared int z[3][3];         /* 2 or 3 elements per thread */

  - In the pictures below, assume THREADS = 4
      - Red elements have affinity to thread 0
  - Think of the linearized C array, then map it round-robin
  [Figure: layout of x across the threads; the caption beginning "As a 2D" is truncated in the source]
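
The round-robin mapping of shared arrays can be illustrated with a minimal sketch in plain C (not UPC). It assumes the default cyclic layout with block size 1 and takes the thread count as an ordinary parameter instead of the UPC THREADS variable. The helper names cyclic_owner, cyclic_owner_2d, elements_on_thread, and print_layout are hypothetical and exist only for this illustration; a real UPC program would ask the runtime with upc_threadof rather than compute affinity by hand.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a UPC function): thread that owns element
 * `index` of a linearized shared array under the default cyclic layout,
 * i.e. block size 1, elements dealt out round-robin. */
int cyclic_owner(int index, int threads) {
    assert(index >= 0 && threads > 0);
    return index % threads;
}

/* Owner of element [row][col] of a 2D array with `cols` columns:
 * linearize in row-major (C) order first, then map round-robin. */
int cyclic_owner_2d(int row, int col, int cols, int threads) {
    return cyclic_owner(row * cols + col, threads);
}

/* Number of elements, out of `total`, with affinity to `thread`. */
int elements_on_thread(int total, int thread, int threads) {
    int count = 0;
    for (int i = 0; i < total; i++) {
        if (cyclic_owner(i, threads) == thread) {
            count++;
        }
    }
    return count;
}

/* Print the owning thread of every element of a rows x cols array. */
void print_layout(const char *name, int rows, int cols, int threads) {
    printf("%s:\n", name);
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            printf(" %d", cyclic_owner_2d(r, c, cols, threads));
        }
        printf("\n");
    }
}
```

With 4 threads, calling print_layout("y", 3, 4, 4) shows that every column of y[3][THREADS] belongs to a single thread, and elements_on_thread(9, t, 4) gives 3 for thread 0 and 2 for each of the others, which matches the "2 or 3 elements per thread" comment on z[3][3].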

