UMD CMSC 714 - An Industry-Standard API for Shared-Memory Programming


FEATURE ARTICLE
1070-9924/98/$10.00 © 1998 IEEE, IEEE COMPUTATIONAL SCIENCE & ENGINEERING

Application developers have long recognized that scalable hardware and software are necessary for parallel scalability in application performance. Both have existed for some time in their lowest common denominator form, and scalable hardware, built as physically distributed memories connected through a scalable interconnection network (such as a multistage interconnect, k-ary n-cube, or fat tree), has been commercially available since the 1980s. When developers build such systems without any provision for cache coherence, the systems are essentially "zeroth order" scalable architectures: they provide only a scalable interconnection network, and the burden of scalability falls on the software. As a result, scalable software for such systems exists, at some level, only in a message-passing model. Message passing is the native model for these architectures, and developers can only build higher-level models on top of it.

Unfortunately, many in the high-performance computing world implicitly assume that the only way to achieve scalability in parallel software is with a message-passing programming model. This is not necessarily true. A class of multiprocessor architectures is now emerging that offers scalable hardware support for cache coherence. These are generally called scalable shared-memory multiprocessor (SSMP) architectures.[1] For SSMP systems, the native programming model is shared memory, and message passing is built on top of the shared-memory model.
On such systems, software scalability is straightforward to achieve with a shared-memory programming model. In a shared-memory system, every processor has direct access to the memory of every other processor, meaning it can directly load or store any shared address. The programmer also can declare certain pieces of memory as private to the processor, which provides a simple yet powerful model for expressing and managing parallelism in an application.

Despite its simplicity and scalability, many parallel applications developers have resisted adopting a shared-memory programming model for one reason: portability. Shared-memory system vendors have created their own proprietary extensions to Fortran or C for parallel-software development. However, the absence of portability has forced many developers to adopt a portable message-passing model such as the Message Passing Interface (MPI) or Parallel Virtual Machine (PVM). This article presents a portable alternative to message passing: OpenMP.

OpenMP: An Industry-Standard API for Shared-Memory Programming
LEONARDO DAGUM AND RAMESH MENON, SILICON GRAPHICS INC.

OpenMP, the portable alternative to message passing, offers a powerful new way to achieve scalability in software. This article compares OpenMP to existing parallel-programming models.

OpenMP was designed to exploit certain characteristics of shared-memory architectures. The ability to directly access memory throughout the system (with minimum latency and no explicit address mapping), combined with fast shared-memory locks, makes shared-memory architectures best suited for supporting OpenMP.

Why a new standard?

The closest approximation to a standard shared-memory programming model is the now-dormant ANSI X3H5 standards effort.[2] X3H5 was never formally adopted as a standard, largely because interest waned as distributed-memory message-passing systems (MPPs) came into vogue.
However, even though hardware vendors support it to varying degrees, X3H5 has limitations that make it unsuitable for anything other than loop-level parallelism. Consequently, applications adopting this model are often limited in their parallel scalability.

MPI has effectively standardized the message-passing programming model. It is a portable, widely available, and accepted standard for writing message-passing programs. Unfortunately, message passing is generally a difficult way to program. It requires that the program's data structures be explicitly partitioned, so the entire application must be parallelized to work with the partitioned data structures. There is no incremental path to parallelize an application. Furthermore, modern multiprocessor architectures increasingly provide hardware support for cache coherence; therefore, message passing is becoming unnecessary and overly restrictive for these systems.

Pthreads is an accepted standard for shared memory in low-end systems. However, it is not targeted at the technical, HPC space. There is little Fortran support for pthreads, and it is not a scalable approach. Even for C applications, the pthreads model is awkward, because it is lower-level than necessary for most scientific applications and is targeted more at providing task parallelism, not data parallelism. Also, portability to unsupported platforms requires a stub library or equivalent workaround.

Researchers have defined many new languages for parallel computing, but these have not found mainstream acceptance.
High-Performance Fortran (HPF) is the most popular multiprocessing derivative of Fortran, but it is mostly geared toward distributed-memory systems.

Independent software developers of scientific applications, as well as government laboratories, have a large volume of Fortran 77 code that needs to be parallelized in a portable fashion. The rapid and widespread acceptance of shared-memory multiprocessor architectures, from the desktop to "glass houses," has created a pressing demand for a portable way to program these systems. Developers need to parallelize existing code without completely rewriting it, but this is not possible with most existing parallel-language standards. Only OpenMP and X3H5 allow incremental parallelization of existing code, and of the two only OpenMP is scalable (see Table 1). OpenMP is targeted at developers who need to quickly parallelize existing scientific code, but it remains flexible enough to support a much broader application set. OpenMP provides an incremental path for parallel conversion of any existing software. It also provides scalability and performance for a complete rewrite or entirely new development.

What is OpenMP?

At its most elemental level, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran (and separately, C and C++) to express shared-memory parallelism. It leaves the base language unspecified, and vendors can implement OpenMP in any Fortran compiler. Naturally, to support pointers and allocatables,

