
GASNet: A Portable High-Performance Communication Layer for Global Address-Space Languages

Dan Bonachea, Jaein Jeong
In conjunction with the joint UCB and NERSC/LBL UPC compiler development project
http://upc.nersc.gov

Contents
– Introduction
– The Case for Portability
– NERSC/UPC Runtime System Organization
– GASNet Communication System – Goals
– GASNet Communication System – Architecture
– Progress to Date
– Extended API – Remote memory operations
– Core API – Active Messages
– Core API – Atomicity Support for Active Messages
– Why interrupt-based handlers cause problems
– Handler-Safe Locks
– No-Interrupt Sections
– Jaein's part
– Experiments
– Latency (IBM SP, network depth = 8)
– Bandwidth (IBM SP, network depth = 8)
– Inv. Throughput (IBM SP, network depth = 8)
– Results
– Conclusions
– Future Work
– Extra Slides
– Portable UPC Implementation

Introduction
• Two major paradigms for parallel programming
  – Shared Memory
    • single logical memory space, loads and stores for communication
    • ease of programming
  – Message Passing
    • disjoint memory spaces, explicit communication
    • often more scalable and higher-performance
• Another possibility: Global-Address Space (GAS) languages
  – Provide a global shared-memory abstraction to the user, regardless of the hardware implementation
  – Make the distinction between local & remote memory explicit
  – Get the ease of shared-memory programming and the performance of message passing
  – Examples: UPC, Titanium, Co-array Fortran, …

The Case for Portability
• Most current UPC compiler implementations generate code directly for the target system
  – Requires compilers to be rewritten from scratch for each platform and network
• We want a more portable, but still high-performance, solution
  – Re-use our investment in compiler technology across different platforms, networks and machine generations
  – Compare the effects of experimental parallel compiler optimizations across platforms
  – The existence of a fully portable compiler helps the acceptability of UPC as a whole for application writers

NERSC/UPC Runtime System Organization
• Layered stack (top to bottom):
  UPC Code → Compiler → Compiler-generated C code → UPC Runtime system → GASNet Communication System → Network Hardware
• Layer boundaries are platform-independent, network-independent, language-independent, and compiler-independent

GASNet Communication System – Goals
• Language-independence: compatibility with several global-address-space languages and compilers
  – UPC, Titanium, Co-array Fortran, possibly others
  – Hide UPC- or compiler-specific details such as shared-pointer representation
• Hardware-independence: variety of parallel architectures & OS's
  – SMPs: Origin 2000, Linux/Solaris multiprocessors, etc.
  – Clusters of uniprocessors: Linux clusters (Myrinet, InfiniBand, VIA, etc.)
  – Clusters of SMPs: IBM SP-2 (LAPI), Linux CLUMPs, etc.
• Ease of implementation on new hardware
  – Allow quick implementations
  – Allow implementations to leverage performance characteristics of hardware
• Want both portability & performance

GASNet Communication System – Architecture
• 2-level architecture to ease implementation:
• Core API
  – Most basic required primitives, as narrow and general as possible
  – Implemented directly on each platform
  – Based heavily on the active-messages paradigm
• Extended API
  – Wider interface that includes more complicated operations
  – We provide a reference implementation of the extended API in terms of the core API
  – Implementors can choose to directly implement any subset for performance – leverage hardware support for higher-level operations
• Layered stack (top to bottom):
  Compiler-generated code → Compiler-specific runtime system → GASNet Extended API → GASNet Core API → Network Hardware

Progress to Date
• Wrote the GASNet specification
  – Included inventing a mechanism for safely providing atomicity in Active Message handlers
• Reference implementation of the extended API
  – Written solely in terms of the core API
• Implemented a prototype core API for one platform (a portable MPI-based core)
• Evaluated the performance using microbenchmarks to measure bandwidth and latency
  – Focus on the additional overhead of using GASNet

Extended API – Remote memory operations
• Orthogonal, expressive, high-performance interface
  – Gets & puts for scalars and bulk contiguous data
  – Blocking and non-blocking (returns a handle)
  – Also a non-blocking form where the handle is implicit
• Non-blocking synchronization
  – Sync on a particular operation (using a handle)
  – Sync on a list of handles (some or all)
  – Sync on all pending reads, writes or both (for implicit handles)
  – Sync on operations initiated in a given interval
  – Allow polling (trysync) or blocking (waitsync)
• Useful for experimenting with a variety of parallel compiler optimization techniques

Extended API – Remote memory operations (cont.)
• API for remote gets/puts:
  void   get    (void *dest, int node, void *src, int numbytes)
  handle get_nb (void *dest, int node, void *src, int numbytes)
  void   get_nbi(void *dest, int node, void *src, int numbytes)
  void   put    (int node, void *dest, void *src, int numbytes)
  handle put_nb (int node, void *dest, void *src, int numbytes)
  void   put_nbi(int node, void *dest, void *src, int numbytes)
• "nb" = non-blocking with explicit handle
• "nbi" = non-blocking with implicit handle
• Also have "value" forms that are register-memory
• Recognize and optimize common sizes with macros
• Extensibility of the core API allows easily adding other more complicated access patterns (scatter/gather, strided, etc.)
• Names will all be prefixed by "gasnet_" to prevent naming conflicts

Extended API – Remote memory operations (cont.)
• API for get/put synchronization:
• Non-blocking ops with explicit handles:
  int  try_syncnb      (handle)
  void wait_syncnb     (handle)
  int  try_syncnb_some (handle *, int numhandles)
  void wait_syncnb_some(handle *, int numhandles)
  int  try_syncnb_all  (handle *, int numhandles)
  void wait_syncnb_all (handle *, int numhandles)
• Non-blocking ops with implicit handles:
  int  try_syncnbi_gets()
  void wait_syncnbi_gets()
  int  try_syncnbi_puts()
  void wait_syncnbi_puts()
  int  try_syncnbi_all()   // gets & puts
  void wait_syncnbi_all()

Core API – Active Messages
• Super-lightweight RPC
  – Unordered, reliable delivery
  – Matched request/reply serviced by "user"-provided lightweight handlers