UCLA COMSCI 239 - The High-Level Parallel Language ZPL


The High-Level Parallel Language ZPL Improves Productivity and Performance

Bradford L. Chamberlain*†, Sung-Eun Choi‡, Steven J. Deitz*, Lawrence Snyder*

*University of Washington, Seattle, WA 98195, {brad,deitz,snyder}@cs.washington.edu
†Cray Inc., Seattle, WA, [email protected]
‡Los Alamos National Laboratory, Los Alamos, NM, [email protected]

Abstract

In this paper, we qualitatively address how high-level parallel languages improve productivity and performance. Using ZPL as a case study, we discuss advantages that stem from a language having both a global (rather than a per-processor) view of the computation and an underlying performance model that statically identifies communication in code. We also candidly discuss several disadvantages of ZPL.

1. Introduction

In the spring of 2003, we encountered a curious bug in one of the NAS parallel benchmarks. To evaluate the scalability of ZPL, we were comparing our ZPL implementation of the NAS CG benchmark against the provided Fortran+MPI implementation on an increasing power-of-two number of processors of a new 1024-node cluster at Los Alamos National Laboratory (LANL). Both implementations ran flawlessly on up to 512 processors but, on our first 1024-processor run, the Fortran+MPI version failed to verify correctly even as the ZPL version worked. A day after we reported the failed verification to NAS, they were able to produce identical erroneous results on an IBM SP.[1] It wasn't a strange interaction between LANL's experimental cluster and ZPL, but rather a bug in the long-standing Fortran+MPI benchmark...

* * *

ZPL is a high-level parallel programming language developed at the University of Washington. Our implementation is based on a compiler that translates ZPL programs to C code with calls to MPI, PVM, or SHMEM, as the user chooses. Since the first release of this compiler in 1997, there have been significant improvements as we have evolved the language. This paper discusses some of the lessons we have learned over this time.

[1] Personal communication, Rob F. Van der Wijngaart, April 9, 2003.

Like Co-array Fortran, High Performance Fortran, Titanium, Unified Parallel C, and other parallel languages, ZPL offers scientists who are frustrated by MPI a much-improved parallel programming experience. The anecdote above, which we will come back to later, illustrates this point and is the sort of issue we will discuss in this paper. The point of this anecdote is not that the provided Fortran+MPI benchmark was poorly written. Indeed, the NAS benchmarks are well-known for being well-written and highly optimized. The point, as we will see later, is that the high-level nature of ZPL virtually eliminates a wide class of parallel programming bugs, thus making parallel programming easier.

Focusing on ZPL, this paper addresses how high-level parallel languages improve both productivity and performance. Throughout this paper, we will present anecdotes, code segments, and qualitative arguments as evidence of this improvement. The purpose of this paper is not to advertise ZPL but rather to encourage researchers to explore the space of language abstractions which ZPL champions.

This paper is organized as follows. In the next section, we characterize the design space of ZPL. No introduction to the language is offered; the interested reader is instead referred to the literature [4, 21]. In Section 3, we examine aspects of ZPL that increase productivity and performance. In Section 4, we discuss limitations of ZPL and, in Section 5, we conclude.

2. Characterizing ZPL

Figure 1 shows C+MPI and ZPL implementations of a trivial benchmark. The idea behind the benchmark is to iteratively replace each element in a 1D array with the average of its two neighboring elements until the change between the values in the array on successive iterations is small. Though admittedly contrived, the codes effectively illustrate two important characteristics of ZPL.

First, ZPL is a global-view parallel language.
The programmer writes code that largely disregards the processors that will execute it. Thus array A is declared based on the …

Figure 1. C+MPI (top) and ZPL (bottom) implementations of the benchmark. In the original figure, lines prefixed with "|" mark the MPI-specific code added to the sequential C program; those markers are omitted here. (The C listing below also adds the math.h include that fabs requires.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include "mpi.h"

    int n;
    double *A, *Tmp;
    const double epsilon = 0.000001;

    int main(int argc, char *argv[]) {
      int i, iters;
      double delta;
      int numprocs, rank, mysize;
      double sum;
      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (argc != 2) {
        printf("usage: line n\n");
        exit(1);
      }
      n = atoi(argv[1]);
      mysize = n * (rank + 1) / numprocs -
               n * rank / numprocs;
      A = malloc((mysize + 2) * sizeof(double));
      for (i = 0; i <= mysize; i++)
        A[i] = 0.0;
      if (rank == numprocs - 1)
        A[mysize + 1] = n + 1.0;
      Tmp = malloc((mysize + 2) * sizeof(double));
      iters = 0;
      do {
        iters++;
        if (rank < numprocs - 1)
          MPI_Send(&(A[mysize]), 1, MPI_DOUBLE, rank + 1,
                   1, MPI_COMM_WORLD);
        if (rank > 0)
          MPI_Recv(&(A[0]), 1, MPI_DOUBLE, rank - 1,
                   1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (rank > 0)
          MPI_Send(&(A[1]), 1, MPI_DOUBLE, rank - 1,
                   1, MPI_COMM_WORLD);
        if (rank < numprocs - 1)
          MPI_Recv(&(A[mysize + 1]), 1, MPI_DOUBLE, rank + 1,
                   1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (i = 1; i <= mysize; i++)
          Tmp[i] = (A[i - 1] + A[i + 1]) / 2.0;
        delta = 0.0;
        for (i = 1; i <= mysize; i++)
          delta += fabs(A[i] - Tmp[i]);
        MPI_Allreduce(&delta, &sum, 1, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD);
        delta = sum;
        for (i = 1; i <= mysize; i++)
          A[i] = Tmp[i];
      } while (delta > epsilon);
      if (rank == 0)
        printf("Iterations: %d\n", iters);
      MPI_Finalize();
    }

    program line;

    config var
      n : integer = 6;

    region
      R    = [1..n];
      BigR = [0..n+1];

    direction
      east = [ 1];
      west = [-1];

    var
      A, Tmp : [BigR] double;

    constant
      epsilon : double = 0.000001;

    procedure line();
    var
      iters : integer;
      delta : double;
    begin
      [BigR] A := 0;
      [n …

