UW-Madison CS 739 - Towards Transparent and Efficient Software Distributed Shared Memory


Towards Transparent and Efficient Software Distributed Shared Memory

Daniel J. Scales and Kourosh Gharachorloo
Western Research Laboratory
Digital Equipment Corporation
{scales,kourosh}@pa.dec.com

Abstract

Despite a large research effort, software distributed shared memory systems have not been widely used to run parallel applications across clusters of computers. The higher performance of hardware multiprocessors makes them the preferred platform for developing and executing applications. In addition, most applications are distributed only in binary format for a handful of popular hardware systems. Due to their limited functionality, software systems cannot directly execute the applications developed for hardware platforms. We have developed a system called Shasta that attempts to address the issues of efficiency and transparency that have hindered wider acceptance of software systems. Shasta is a distributed shared memory system that supports coherence at a fine granularity in software and can efficiently exploit small-scale SMP nodes by allowing processes on the same node to share data at hardware speeds. This paper focuses on our goal of tapping into large classes of commercially available applications by transparently executing the same binaries that run on hardware platforms. We discuss the issues involved in achieving transparent execution of binaries, which include supporting the full instruction set architecture, implementing an appropriate memory consistency model, and extending OS services across separate nodes. We also describe the techniques used in Shasta to solve the above problems. The Shasta system is fully functional on a prototype cluster of Alpha multiprocessors connected through Digital’s Memory Channel network and can transparently run parallel applications on the cluster that were compiled to run on a single shared-memory multiprocessor.
As an example of Shasta’s flexibility, it can execute Oracle 7.3, a commercial database engine, across the cluster, including workloads modeled after the TPC-B and TPC-D database benchmarks. To characterize the performance of the system and the cost of providing complete transparency, we present performance results for microbenchmarks and applications running on the cluster, including preliminary results for Oracle runs.

1 Introduction

There has been much research on supporting a shared address space in software across a cluster of workstations or servers. A variety of such distributed shared memory (DSM) systems have been developed, using various techniques to minimize the software overhead for supporting the shared address space. The most common approach uses the virtual memory hardware to detect access to data that is not available locally [2,7,9,19]. These systems communicate data and maintain coherence at a fixed granularity equal to the size of a virtual page.

Permission to make digital/hard copy of part or all this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. SOSP-16 10/97 Saint-Malo, France © 1997 ACM 0-89791-916-5/97/0010...$3.50

Despite all the research on software DSM systems, software platforms have yet to make an impact on mainstream computing. At the same time, hardware shared memory systems have gained wide acceptance. The higher performance of hardware systems makes them the preferred platform for application development. Software vendors typically distribute applications in binary format only for a few popular hardware platforms.
Due to their limited functionality, software systems cannot directly execute these applications, and thus fail to capitalize on the increasing number of applications available for hardware systems. For example, software systems typically require the use of special constructs for synchronization and task creation and severely limit the use of system calls across the cluster.

We have attempted to address some of the above issues of efficiency and transparency in the Shasta system [14]. Shasta is a software DSM system that supports sharing of data at a fine granularity by inserting code in an application executable that checks if data being accessed by a load or store is available locally in the appropriate state. This paper focuses on the issues in transparently executing hardware binaries in the context of the Shasta system.

Transparent execution of binaries encompasses several challenging problems which fall into two broad categories, correctly supporting the complete instruction set architecture and extending OS services across separate nodes. As an example in the instruction set category, software systems have to directly support atomic read-modify-write instructions as opposed to depending on special high-level synchronization constructs (as is done in virtually all current software DSM systems). Software systems must also correctly support the memory consistency model specified by a given instruction set architecture. Much of the recent research on software DSM systems involves protocol innovations related to exploiting or further relaxing the memory consistency model to solve false sharing problems that arise from page-level coherence. However, many important commercial architectures, including the Intel x86 architecture, support rather strict memory consistency models that disallow virtually all the critical performance optimizations that are used in such page-based DSM systems.
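
The inline access check described above (code inserted before each load or store to test whether the data is available locally in the appropriate state) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Shasta's implementation, which instruments Alpha executables: the 64-byte block size, the three state names, the `Node` class, and the `home` dictionary standing in for remote memory are all hypothetical and chosen for illustration.

```python
from enum import Enum

BLOCK_SIZE = 64  # bytes per coherence block; illustrative value, not from the paper


class State(Enum):
    # Hypothetical per-block states used only in this sketch.
    INVALID = 0    # no valid local copy
    SHARED = 1     # local copy may be read
    EXCLUSIVE = 2  # local copy may be read and written


class Node:
    """One simulated node: a local copy of memory plus a per-block state table."""

    def __init__(self, home):
        self.home = home   # dict standing in for the remote copy of shared memory
        self.local = {}    # address -> value held locally
        self.state = {}    # block number -> State
        self.misses = 0    # number of times the slow path ran

    def _block(self, addr):
        return addr // BLOCK_SIZE

    def _miss(self, addr, wanted):
        # Slow path: stand-in for the protocol that obtains the block remotely.
        self.misses += 1
        block = self._block(addr)
        base = block * BLOCK_SIZE
        for a in range(base, base + BLOCK_SIZE):
            if a in self.home:
                self.local[a] = self.home[a]
        self.state[block] = wanted

    def load(self, addr):
        # Inline check before a load: is the block readable locally?
        if self.state.get(self._block(addr), State.INVALID) is State.INVALID:
            self._miss(addr, State.SHARED)
        return self.local.get(addr, 0)

    def store(self, addr, value):
        # Inline check before a store: is the block writable locally?
        if self.state.get(self._block(addr), State.INVALID) is not State.EXCLUSIVE:
            self._miss(addr, State.EXCLUSIVE)
        self.local[addr] = value
```

In this sketch, two loads to addresses in the same 64-byte block trigger the slow path only once, while a store to a different block triggers it again. Coherence is tracked per block rather than per virtual page, which is the granularity contrast the text draws with page-based systems. Invalidation of other nodes' copies and write-back are deliberately left out.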
Even architectures that support aggressive relaxed models (i.e., Alpha, PowerPC, and Sparc) fail to provide sufficient information (in the executable) to allow many of the optimizations based on release consistency that are exploited by several of the software systems [1, 7]. Furthermore, the above issues related to hardware memory consistency models are unlikely to change in the foreseeable future.

Transparently executing applications that use OS

