UCLA COMSCI 239 - The High-Level Parallel Language ZPL


The High-Level Parallel Language ZPL Improves Productivity and Performance

Bradford L. Chamberlain*†, Sung-Eun Choi‡, Steven J. Deitz*, Lawrence Snyder*

*University of Washington, Seattle, WA 98195, {brad,deitz,snyder}@cs.washington.edu
†Cray Inc., Seattle, WA, [email protected]
‡Los Alamos National Laboratory, Los Alamos, NM, [email protected]

Abstract

In this paper, we qualitatively address how high-level parallel languages improve productivity and performance. Using ZPL as a case study, we discuss advantages that stem from a language having both a global (rather than a per-processor) view of the computation and an underlying performance model that statically identifies communication in code. We also candidly discuss several disadvantages of ZPL.

1. Introduction

In the spring of 2003, we encountered a curious bug in one of the NAS parallel benchmarks. To evaluate the scalability of ZPL, we were comparing our ZPL implementation of the NAS CG benchmark against the provided Fortran+MPI implementation on an increasing power-of-two number of processors of a new 1024-node cluster at Los Alamos National Laboratory (LANL). Both implementations ran flawlessly on up to 512 processors but, on our first 1024-processor run, the Fortran+MPI version failed to verify correctly even as the ZPL version worked. A day after we reported the failed verification to NAS, they were able to produce identical erroneous results on an IBM SP.[1] It wasn't a strange interaction between LANL's experimental cluster and ZPL, but rather a bug in the long-standing Fortran+MPI benchmark...

* * *

ZPL is a high-level parallel programming language developed at the University of Washington. Our implementation is based on a compiler that translates ZPL programs to C code with calls to MPI, PVM, or SHMEM, as the user chooses. Since the first release of this compiler in 1997, there have been significant improvements as we have evolved the language. This paper discusses some of the lessons we have learned over this time.

[1] Personal communication, Rob F. Van der Wijngaart, April 9, 2003.

Like Co-array Fortran, High Performance Fortran, Titanium, Unified Parallel C, and other parallel languages, ZPL offers scientists who are frustrated by MPI a much-improved parallel programming experience. The anecdote above, which we will come back to later, illustrates this point and is the sort of issue we will discuss in this paper. The point of this anecdote is not that the provided Fortran+MPI benchmark was poorly written. Indeed, the NAS benchmarks are well-known for being well-written and highly optimized. The point, as we will see later, is that the high-level nature of ZPL virtually eliminates a wide class of parallel programming bugs, thus making parallel programming easier.

Focusing on ZPL, this paper addresses how high-level parallel languages improve both productivity and performance. Throughout this paper, we will present anecdotes, code segments, and qualitative arguments as evidence of this improvement. The purpose of this paper is not to advertise ZPL but rather to encourage researchers to explore the space of language abstractions which ZPL champions.

This paper is organized as follows. In the next section, we characterize the design space of ZPL. No introduction to the language is offered; the interested reader is instead referred to the literature [4, 21]. In Section 3, we examine aspects of ZPL that increase productivity and performance. In Section 4, we discuss limitations of ZPL and, in Section 5, we conclude.

2. Characterizing ZPL

Figure 1 shows C+MPI and ZPL implementations of a trivial benchmark. The idea behind the benchmark is to iteratively replace each element in a 1D array with the average of its two neighboring elements until the change between the values in the array on successive iterations is small. Though admittedly contrived, the codes effectively illustrate two important characteristics of ZPL.

First, ZPL is a global-view parallel language.
The programmer writes code that largely disregards the processors that will execute it. Thus array A is declared based on the …

Figure 1. C+MPI (top) and ZPL (bottom) implementations of the benchmark. In the original figure, lines prefixed with "|" mark the MPI-specific code added to the sequential C program; those markers are omitted here. (The C listing below also adds the math.h include that fabs requires.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include "mpi.h"

    int n;
    double *A, *Tmp;
    const double epsilon = 0.000001;

    int main(int argc, char *argv[]) {
      int i, iters;
      double delta;
      int numprocs, rank, mysize;
      double sum;
      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (argc != 2) {
        printf("usage: line n\n");
        exit(1);
      }
      n = atoi(argv[1]);
      mysize = n * (rank + 1) / numprocs -
               n * rank / numprocs;
      A = malloc((mysize + 2) * sizeof(double));
      for (i = 0; i <= mysize; i++)
        A[i] = 0.0;
      if (rank == numprocs - 1)
        A[mysize + 1] = n + 1.0;
      Tmp = malloc((mysize + 2) * sizeof(double));
      iters = 0;
      do {
        iters++;
        if (rank < numprocs - 1)
          MPI_Send(&(A[mysize]), 1, MPI_DOUBLE, rank + 1,
                   1, MPI_COMM_WORLD);
        if (rank > 0)
          MPI_Recv(&(A[0]), 1, MPI_DOUBLE, rank - 1,
                   1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (rank > 0)
          MPI_Send(&(A[1]), 1, MPI_DOUBLE, rank - 1,
                   1, MPI_COMM_WORLD);
        if (rank < numprocs - 1)
          MPI_Recv(&(A[mysize + 1]), 1, MPI_DOUBLE, rank + 1,
                   1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (i = 1; i <= mysize; i++)
          Tmp[i] = (A[i - 1] + A[i + 1]) / 2.0;
        delta = 0.0;
        for (i = 1; i <= mysize; i++)
          delta += fabs(A[i] - Tmp[i]);
        MPI_Allreduce(&delta, &sum, 1, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD);
        delta = sum;
        for (i = 1; i <= mysize; i++)
          A[i] = Tmp[i];
      } while (delta > epsilon);
      if (rank == 0)
        printf("Iterations: %d\n", iters);
      MPI_Finalize();
    }

    program line;

    config var
      n : integer = 6;

    region
      R    = [1..n];
      BigR = [0..n+1];

    direction
      east = [ 1];
      west = [-1];

    var
      A, Tmp : [BigR] double;

    constant
      epsilon : double = 0.000001;

    procedure line();
    var
      iters : integer;
      delta : double;
    begin
      [BigR] A := 0;
      [n …

