DOC PREVIEW
Berkeley COMPSCI 258 - An Adaptive Cache Coherence Protocol

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

An Adaptive Cache Coherence ProtocolOptimized for Migratory SharingPer Stenstrom, Mats Brorsson, and Lars SandbergDepartment of Computer Engineering, Lund UniversityP.O. Box 118, S-221 00 LUND, SwedenAbstractParallel programs that use critical sections and areexecuted on a shared-memory multiprocessor with a write-invalidate protocol result in invalidation actions thatcould be eliminated. For this type of sharing, calledm“gratory sharing, each processor typically causes acache miss followed by an invalidation request whichcould be merged with the preceding cache-muss request.In this paper we propose an adaptive protocol thatinvokes this optimization dynamically for n’gratoryblocks. For other blocks, the protocol works as an ordi-nary write-invalidate protocol. We show that the protocolis a simple extension to a write-invalidate protocol.Based on a program-driven simulation model of anarchitecture sim”lar to the Stanford DASH, and a set offour benchmarks, we evaluate the potential performanceimprovements of the protocol. We jind that it effectivelyeliminates most single invalidations which improves theperformance by reducing the shared access penalty andthe network trajic.1 IntroductionIn order for shared-memory multiprocessors to achieve ahigh performance, memory system tatency and contentionmust be kept as low as possible. A viable solution to thisproblem has been to attach private caches to each process-ing node and maintain cache coherence using a directory-based write-invalidate protocol. Notable examples of realimplementations of large-scale multiprocessors thatexploit this technique are the Stanford DASH [12], theMIT Alewife [1], the SICS Data Diffusion Machine (theDDM) [9], and the Kendall Square Research’s KSR1 [2],Write-invalidate protocols maintain cache coherence byinvalidating copies of a memory block when the block ismodified by a processor. The advantage of this is that atmost the first write, in a sequence of writes to the sameblock with no intervening read operations from other proc-essors, causes global interaction. Consequently, write-invalidate protocols perform fairly well for a broad rangeof sharing patterns. However, there exist common sharingpatterns for which atl invalidation~ could have beenentirely avoided. A notable example is the invalidationoverhead associated with data structures that are accessedwithin critical sections. Typically, processors read andmodify such &ta structures one at a time. Processors thataccess data this way cause a cache miss followed by aninvalidation request being sent to the cache attached to theprocessor that most recently exited the critical section. Ifthe cache coherence protocol were aware of this sharingpattern, it would be possible to merge the invalidationrequest with the preceding read-miss request and thuseliminate all explicit invalidation actions. This sharingbehavior, denoted migratory sharing, has been previouslyshown to be the major source of single invalidations byGupta and Weber in [8].Eliminating invalidation requests can help performancein many important ways. First, if access requests cannotoverlap invalidation requests due to memory consistencymodel or implementation constraints [6], the access pen-alty is redueed by reducing the number of global invalida-tion requests. Second, the network traffic is reducedwhich, as a secondary effect, may reduce the read andwrite penalty due to less network contention. Conse-quently, eliminating the number of invalidation requestsmay improve the performance significantly.In this paper, we propose an implementation of anadaptive write-invalidate protocol that effectively elimi-nates most invalidation requests associated with migratorysharing. The protocol dynamically detects whether a mem-ory block exhibits migratory sharing or not. For blocksdeemed migratory, the invalidation request is merged withthe preeeding read-miss request and for other blocks, itmaintains coherence according to the default write-invali-date policy. In addition, the protocol can dynamically, on aper block basis, switch between these operating modes,would the block change sharing behavior. As a case-study,we show that our protocol is a simple extension of a wnte-invalidate protocol by presenting the modifications neededfor a state-of-the-art write-invalidate protocol, in essencethe directory-based protocol of the Stanford DASH [11].To validate the correctness of the protocol and evaluateits performance, we have implemented and evaluated itusing a detailed program-driven simulation model of aDASH-like architecture and a set of four benchmarks, ofwhich three are taken from the SPLASH suite [14]. Wehave found that by eliminating the invalidation requests to0884-7495/93 $3.0001993 IEEE109migratory blocks, performance can be improved due toless access penalty and nehvork trafftc. We show that thesefactors can improve the performance significantly forhigh-performance multiprocessors.The organization of the rest of the paper is as follows.Since the adaptive protocol can be applied to most write-invalidate protocols, we provide in Section 2 a high-leveldescription of the coherence optimization it provides andhow it detects migratory sharing. As a base for our imple-mentation and performance study, we use the StanfordDASH protocol. In Section 3, we describe how the DASHprotocol can be extended to adaptively detect and optimizemigratory sharing. Sections 4 and 5 present the experi-mental results starting with the architectural assumptionsand the benchmarks in Section 4. We finally generalize ourfindings in Section 6 and conclude the paper in Section 7.2 The Adaptive Protocol: A High-Level ViewIn this section, we first identify the type of migratory shar-ing that incurs invalidation overhead in write-invalidateprotocols in Section 2.1. Then in Section 2.2, we present ahigh-level description of an a&ptive protocol that detectssuch sharing and eliminates its overhead.2.1 Migratory SharingGupta and Weber classify data structures based on theinvalidation pattern they exhibit [8]. According to theirdefinition, data structures manipulated by only a singleprocessor at any given time are called m“gratory objects.Typically, such sharing occurs when a data structure ismodified within a critical section. Protecting modificationsof shared objects by locks is important to eliminate dataraces. Therefore, many parallel languages, such as e.g.Modula-2 and Ada, use critical sections (or monitors) asthe recommended


View Full Document

Berkeley COMPSCI 258 - An Adaptive Cache Coherence Protocol

Documents in this Course
Load more
Download An Adaptive Cache Coherence Protocol
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view An Adaptive Cache Coherence Protocol and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view An Adaptive Cache Coherence Protocol 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?