Cashmere-2L: Software Coherent Shared Memory on a Clustered Remote-Write Network*

Robert Stets, Sandhya Dwarkadas, Nikolaos Hardavellas, Galen Hunt, Leonidas Kontothanassis†, Srinivasan Parthasarathy, and Michael Scott

Department of Computer Science
University of Rochester
Rochester, NY 14627-0226
[email protected]

†DEC Cambridge Research Lab
One Kendall Sq., Bldg. 700
Cambridge, MA 02139

Abstract

Low-latency remote-write networks, such as DEC's Memory Channel, provide the possibility of transparent, inexpensive, large-scale shared-memory parallel computing on clusters of shared memory multiprocessors (SMPs). The challenge is to take advantage of hardware shared memory for sharing within an SMP, and to ensure that software overhead is incurred only when actively sharing data across SMPs in the cluster. In this paper, we describe a "two-level" software coherent shared memory system, Cashmere-2L, that meets this challenge. Cashmere-2L uses hardware to share memory within a node, while exploiting the Memory Channel's remote-write capabilities to implement "moderately lazy" release consistency with multiple concurrent writers, directories, home nodes, and page-size coherence blocks across nodes. Cashmere-2L employs a novel coherence protocol that allows a high level of asynchrony by eliminating global directory locks and the need for TLB shootdown. Remote interrupts are minimized by exploiting the remote-write capabilities of the Memory Channel network.

Cashmere-2L currently runs on an 8-node, 32-processor DEC AlphaServer system. Speedups range from 8 to 31 on 32 processors for our benchmark suite, depending on the application's characteristics. We quantify the importance of our protocol optimizations by comparing performance to that of several alternative protocols that do not share memory in hardware within an SMP, and require more synchronization. In comparison to a one-level protocol that does not share memory in hardware within an SMP, Cashmere-2L improves performance by up to 46%.

*This work was supported in part by NSF grants CDA-9401142, CCR-9319445, CCR-9409120, CCR-9702466, CCR-9705594, and CCR-9510173; ARPA contract F19628-94-C-0057; an external research grant from Digital Equipment Corporation; and a graduate fellowship from Microsoft Research (Galen Hunt).

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. SOSP-16 10/97 Saint-Malo, France © 1997 ACM 0-89791-916-5/97/0010...$3.50

1 Introduction

The shared-memory programming model provides ease of use for parallel applications. Unfortunately, while small-scale hardware cache-coherent "symmetric" multiprocessors (SMPs) are now widely available in the market, larger hardware-coherent machines are typically very expensive. Software techniques based on virtual memory have been used to support a shared memory programming model on a network of commodity workstations [3, 6, 12, 14, 17]. In general, however, the high latencies of traditional networks have resulted in poor performance relative to hardware shared memory for applications requiring frequent communication.
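To make the VM-based approach concrete, the following C fragment is a minimal sketch of the usual mechanism in the style of the systems cited above; it is not code from any of them. Shared pages start out read-only, so the first write traps into a handler that can run a coherence action before enabling the write. The fetch_page() helper is a hypothetical placeholder for that action, and the 8 KB page size matches the Alpha processors used in this paper.

/* Minimal sketch of VM-based software shared memory; fetch_page()
 * is a hypothetical placeholder for the coherence action. */
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>

#define PAGE_SIZE 8192                 /* Alpha page size */

extern void fetch_page(void *page);    /* hypothetical: obtain current copy */

static void fault_handler(int sig, siginfo_t *si, void *ctx)
{
    /* Round the faulting address down to its page boundary. */
    void *page = (void *)((uintptr_t)si->si_addr &
                          ~(uintptr_t)(PAGE_SIZE - 1));
    fetch_page(page);                  /* bring the page up to date */
    mprotect(page, PAGE_SIZE, PROT_READ | PROT_WRITE);  /* allow the write */
}

void install_dsm_handler(void)
{
    struct sigaction sa;
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
}

The cost of each such fault, and of the messages the handler must send on a traditional network, is what makes communication latency the dominant concern in these systems.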
Recent technological advances are changing the equation. Low-latency remote-write networks, such as DEC's Memory Channel [11], provide the possibility of transparent and inexpensive shared memory. These networks allow processors in one node to modify the memory of another node safely from user space, with very low (microsecond) latency. Given economies of scale, a "clustered" system of small-scale SMPs on a low-latency network is becoming a highly attractive platform for large, shared-memory parallel programs, particularly in organizations that already own the hardware. SMP nodes reduce the fraction of coherence operations that must be handled in software. A low-latency network reduces the time that the program must wait for those operations to complete.

While software shared memory has been an active area of research for many years, it is only recently that protocols for clustered systems have begun to be developed [7, 10, 13, 22]. The challenge for such a system is to take advantage of hardware shared memory for sharing within an SMP, and to ensure that software overhead is incurred only when actively sharing data across SMPs in the cluster. This challenge is non-trivial: the straightforward "two-level" approach (arrange for each SMP node of a clustered system to play the role of a single processor in a non-clustered system) suffers from a serious problem: it requires the processors within a node to synchronize very frequently, e.g., every time one of them exchanges coherence information with another node.

Our Cashmere-2L system is designed to capitalize on both intra-node cache coherence and low-latency inter-node messages. All processors on a node share the same physical frame for a shared data page. We employ a "moderately lazy" VM-based implementation of release consistency, with multiple concurrent writers, directories, home nodes, and page-size coherence blocks. Updates by multiple writers are propagated to the home node using diffs [6]. Cashmere-2L exploits the capabilities of a low-latency remote-write network to apply these outgoing diffs without remote assistance, and to implement low-cost directories, notification queues, and application locks and barriers.

Cashmere-2L solves the problem of excess synchronization due to protocol operations within a node with a novel technique called two-way diffing: it uses twins (pristine page copies) and diffs (comparisons of pristine and dirty copies) not only to identify local changes that must be propagated to the home node (outgoing diffs), but also to identify remote changes that must be applied to local memory (incoming diffs).
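The following C fragment is a minimal sketch of this twin/diff idea at word granularity. It is not the Cashmere-2L implementation: there, outgoing diffs are written to the home node through the Memory Channel rather than to a directly addressable home copy, as assumed here. The sketch also relies on data-race-free programs, so that local and remote writes never touch the same word in the same interval.

/* Sketch of two-way diffing, under the simplifying assumptions
 * stated above; word-granularity, directly addressable home copy. */
#include <stdint.h>
#include <string.h>

#define PAGE_WORDS (8192 / sizeof(uint64_t))

/* Make a pristine copy (twin) of a page before it is first written. */
void make_twin(uint64_t *twin, const uint64_t *page)
{
    memcpy(twin, page, PAGE_WORDS * sizeof(uint64_t));
}

/* One pass identifies both directions: words where the local copy
 * differs from the twin are local writes (outgoing diff); words where
 * the home copy differs from the twin are remote writes (incoming
 * diff) to be applied locally. */
void two_way_diff(uint64_t *local, uint64_t *home, const uint64_t *twin)
{
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        if (local[i] != twin[i])
            home[i] = local[i];      /* outgoing: propagate local change */
        else if (home[i] != twin[i])
            local[i] = home[i];      /* incoming: adopt remote change */
    }
}

Because the comparison against the twin distinguishes local from remote changes, a processor can update its node's copy of a page without first forcing every other processor on the node to stop writing it, which is the source of the excess synchronization described above.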
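For readers unfamiliar with remote-write networks, the sketch below illustrates the style of communication assumed throughout this section: a plain user-space store into a locally mapped transmit page is reflected by the adapter into a receive page on another node, with no system call or remote interrupt, which is what makes low-cost notification queues, locks, and barriers possible. All names here are hypothetical; the actual Memory Channel interface differs.

/* Conceptual sketch of remote-write communication; names are
 * hypothetical, not the Memory Channel API. */
#include <stdint.h>

/* Assume these were mapped at startup: tx_flag on the sender points
 * into a transmit region, rx_flag on the receiver into the matching
 * receive region. */
volatile uint64_t *tx_flag;
volatile uint64_t *rx_flag;

void post_notification(uint64_t value)
{
    *tx_flag = value;              /* one store crosses the network */
}

int notification_arrived(uint64_t expected)
{
    return *rx_flag == expected;   /* remote store lands here; just poll */
}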

