Co-Array Fortran
UCLA COMSCI 239
Phil Russell, Alex Liber, Micah Wendell

What It Is
- Formally called F--.
- A small set of semantic extensions to Fortran 95.
- A simple syntactic extension to Fortran 95.
- Single Program Multiple Data (SPMD) parallel processing.

What It Is
- A robust, efficient parallel language.
- Requires learning only a few new rules.
- The rules handle two fundamental issues:
  - Work distribution
  - Data distribution

Work Distribution
- A single program is replicated a fixed number of times.
- Each replication has its own set of data objects.
- Each replication of the program is called an image.
- Each image executes asynchronously.

Work Distribution
- The normal rules of Fortran apply.
- The execution path may differ from image to image.
- The programmer determines the actual path for the image with:
  - A unique image index
  - Normal Fortran control constructs
  - Explicit synchronizations

Work Distribution
- For code between synchronizations, the compiler is free to use all its normal optimization techniques, as if only one image is present.

Data Distribution
- Specify the data relationships.
- One new object, the co-array, is added to the language.
- An example:

Data Distribution

    REAL, DIMENSION(N)[*] :: X, Y
    X(:) = Y(:)[Q]

  The declaration says that each image has two real arrays of size N. If Q has the same value on each image, the effect of the assignment statement is that each image copies the array Y from image Q and makes a local copy in array X.

Data Distribution
- (index) follows the normal Fortran rules within one memory image.
- [index] provides access to objects across images and follows similar rules.
- [bounds] in co-array declarations follow the rules of assumed-size arrays, since co-arrays are always spread over all the images.

Data Distribution
- The programmer uses co-array syntax only where it is needed.
- A co-array reference with no square brackets is a reference to the object in local memory.
- Co-array syntax should appear only in isolated parts of the code.
- If it does not, there is probably too much communication among images, which:
  - Flags the compiler to avoid latency.
  - Flags the programmer to rethink.
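To make the declaration and assignment above concrete, here is a minimal sketch of a complete program; it is not from the slides. It assumes image 1 plays the role of Q, and it uses the image-inquiry intrinsics (THIS_IMAGE, NUM_IMAGES) and the SYNC_ALL barrier introduced later in these notes; a standard Fortran 2008 compiler would spell the barrier SYNC ALL instead of CALL SYNC_ALL().

    PROGRAM coarray_copy_sketch
      ! Minimal sketch: each image fills its own Y, then every image
      ! copies Y from image Q into its local X, as in the slide above.
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 8   ! array size (arbitrary choice)
      INTEGER, PARAMETER :: Q = 1   ! source image (assumed to be image 1)
      REAL, DIMENSION(N)[*] :: X, Y

      Y(:) = REAL(THIS_IMAGE())     ! each image writes its own index into Y

      CALL SYNC_ALL()               ! ensure every Y is written before any image reads one

      X(:) = Y(:)[Q]                ! remote get: copy Y from image Q into local X

      PRINT *, 'image', THIS_IMAGE(), 'of', NUM_IMAGES(), 'copied X(1) =', X(1)
    END PROGRAM coarray_copy_sketch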
Extended Fortran 90 Array Syntax
- A way of expressing remote memory operations.
- Here are some simple examples:

    X = Y[PE]             ! get from Y[PE]
    Y[PE] = X             ! put into Y[PE]
    Y[:] = X              ! broadcast X
    Y[LIST] = X           ! broadcast X over the subset of PEs in array LIST
    Z(:) = Y[:]           ! collect all Y
    S = MINVAL(Y[:])      ! min (reduce) all Y
    B(1:M)[1:N] = S       ! S scalar, promoted to an array of shape (1:M,1:N)

Input/Output
- Input/output is a problem for SPMD programming models.
- Fortran I/O assumes dedicated single-process access to an open file.
  - This is often violated when it is assumed that I/O from each image is completely independent.

Input/Output
- Co-Array Fortran includes only minor extensions to Fortran 95 I/O.
- All the inconsistencies of earlier programming models have been avoided.
- There is explicit support for parallel I/O.
- I/O is compatible with both process-based and thread-based implementations.

Other Fortran 95 Additions: Several Intrinsics
- NUM_IMAGES() returns the number of images.
- THIS_IMAGE() returns this image's index, between 1 and NUM_IMAGES().
- SYNC_ALL() is a global barrier.
- To wait only for the relevant images to arrive: SYNC_ALL(WAIT=LIST).

More Intrinsics
- SYNC_TEAM(TEAM=TEAM)
- SYNC_TEAM(TEAM=TEAM, WAIT=LIST)
- START_CRITICAL and END_CRITICAL

Adding Synch Functionality
- SYNC_MEMORY(): this routine forces the local image to both complete any outstanding co-array writes into "global" memory and refresh from global memory any local copies of co-array data it might be holding (in registers, for example).
- Image synchronization implies co-array synchronization.
- A call to SYNC_MEMORY() is rarely required:
  - It is implicitly called before and after virtually all procedure calls, including Co-Array Fortran's built-in image synchronization intrinsics.

Image and Co-Array Synchronization
- Example: exchanging an array with your north and south neighbors:

    COMMON/XCTILB4/ B(N,4)[*]
    SAVE /XCTILB4/

    CALL SYNC_ALL( WAIT=(/IMG_S,IMG_N/) )
    B(:,3) = B(:,1)[IMG_S]
    B(:,4) = B(:,2)[IMG_N]
    CALL SYNC_ALL( WAIT=(/IMG_S,IMG_N/) )

Array Exchange Synchronization Explained
- The first SYNC_ALL waits until the remote B(:,1:2) is ready to be copied.
- The second waits until it is safe to overwrite the local B(:,1:2).
- Only nearest neighbors are involved in the sync.
- It is always safe to replace SYNC_ALL(WAIT=LIST) calls with global SYNC_ALL() calls, but this is often significantly slower.
- Either the preceding or succeeding synchronization may be avoidable.
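As a rough illustration of how this exchange might be packaged as a reusable routine, here is a sketch that is not from the slides. It passes the co-array as a dummy argument instead of using the slide's COMMON block, and it assumes IMG_S and IMG_N come from a periodic 1-D ordering of the images; the slides do not say how those neighbor indices are obtained.

    MODULE halo_exchange_mod
      IMPLICIT NONE
    CONTAINS
      SUBROUTINE exchange_halo(B, N)
        ! Sketch of the north/south exchange above, with the neighbor
        ! indices computed under an assumed periodic 1-D decomposition.
        INTEGER, INTENT(IN)    :: N
        REAL,    INTENT(INOUT) :: B(N,4)[*]
        INTEGER :: ME, NP, IMG_S, IMG_N

        ME = THIS_IMAGE()
        NP = NUM_IMAGES()
        IMG_S = MERGE(NP, ME-1, ME == 1)    ! south neighbor, wrapping around
        IMG_N = MERGE(1,  ME+1, ME == NP)   ! north neighbor, wrapping around

        ! Wait until the neighbors' B(:,1:2) are ready to be read.
        CALL SYNC_ALL( WAIT=(/IMG_S, IMG_N/) )
        B(:,3) = B(:,1)[IMG_S]
        B(:,4) = B(:,2)[IMG_N]
        ! Wait until the neighbors are done reading our B(:,1:2).
        CALL SYNC_ALL( WAIT=(/IMG_S, IMG_N/) )
      END SUBROUTINE exchange_halo
    END MODULE halo_exchange_mod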
Synch Optimization
- Most of the optimization of remote co-array access lies in minimizing the synchronization:
  - The frequency of synchronization.
  - Covering the minimum number of images.
- On machines without global memory hardware, array syntax (rather than DO loops) should always be used for remote memory operations.
- Copying co-arrays into local temporary buffers before they are required might be appropriate.

Data Parallel Cumulative Sum
- In data parallel programs, each image is either performing the same operation or is idle.
- For example, here is a data parallel, fixed-order cumulative sum:

    REAL SUM[*]
    CALL SYNC_ALL( WAIT=1 )
    DO IMG = 2, NUM_IMAGES()
      IF (IMG == THIS_IMAGE()) THEN
        SUM = SUM + SUM[IMG-1]
      ENDIF
      CALL SYNC_ALL( WAIT=IMG )
    ENDDO

Data Parallel Performance Critique
- Having SYNC_ALL wait on just the active image improves performance.
- There are still NUM_IMAGES() global syncs.

An Alternative to Data Parallel
- A better alternative may be to minimize synchronization by avoiding the data parallel overhead entirely (a complete version of this fragment is sketched at the end of these notes):

    REAL SUM[*]
    ME = THIS_IMAGE()
    IF (ME.GT.1) THEN
      CALL SYNC_TEAM( TEAM=(/ME-1,ME/) )
      SUM = SUM + SUM[ME-1]
    ENDIF
    IF (ME.LT.NUM_IMAGES()) THEN
      CALL SYNC_TEAM( TEAM=(/ME,ME+1/) )
    ENDIF

Alternative Performance Analysis
- Now each image is involved in at most two syncs: the images just before and just after it in image order.
- The first SYNC_TEAM call on one image is matched by the second SYNC_TEAM call on the previous image.

Benefits (or: In Summary)
- The Co-Array Fortran synchronization intrinsics can:
  - Improve the performance of data parallel algorithms.
  - Provide implicit program execution control as an alternative to the data parallel approach.
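To close, here is a minimal sketch, not from the slides, that wraps the pipelined alternative above in a complete program. The initial value of SUM and the final PRINT are assumptions added for illustration.

    PROGRAM pipelined_prefix_sum
      ! Sketch of the "alternative to data parallel" cumulative sum:
      ! image K ends up holding the sum of the contributions of images 1..K.
      IMPLICIT NONE
      REAL    :: SUM[*]
      INTEGER :: ME

      ME  = THIS_IMAGE()
      SUM = REAL(ME)                        ! each image contributes its own index (assumed input)

      IF (ME > 1) THEN
        ! Pairwise sync: wait until image ME-1 has finished its partial sum, then read it.
        CALL SYNC_TEAM( TEAM=(/ME-1, ME/) )
        SUM = SUM + SUM[ME-1]
      ENDIF
      IF (ME < NUM_IMAGES()) THEN
        ! Pairwise sync: tell image ME+1 that our partial sum is ready.
        CALL SYNC_TEAM( TEAM=(/ME, ME+1/) )
      ENDIF

      PRINT *, 'image', ME, 'cumulative sum =', SUM
    END PROGRAM pipelined_prefix_sum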

