Berkeley COMPSCI 258 - An Adaptive Cache Coherence Protocol - D1917692

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI 258> An Adaptive Cache Coherence Protocol

DOC PREVIEW

Berkeley COMPSCI 258 - An Adaptive Cache Coherence Protocol

School name University of California, Berkeley

Course Compsci 258- Parallel Processors

Pages 10

This preview shows page 1-2-3 out of 10 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 10 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 10 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 10 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 10 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

An Adaptive Cache Coherence ProtocolOptimized for Migratory SharingPer Stenstrom, Mats Brorsson, and Lars SandbergDepartment of Computer Engineering, Lund UniversityP.O. Box 118, S-221 00 LUND, SwedenAbstractParallel programs that use critical sections and areexecuted on a shared-memory multiprocessor with a write-invalidate protocol result in invalidation actions thatcould be eliminated. For this type of sharing, calledm“gratory sharing, each processor typically causes acache miss followed by an invalidation request whichcould be merged with the preceding cache-muss request.In this paper we propose an adaptive protocol thatinvokes this optimization dynamically for n’gratoryblocks. For other blocks, the protocol works as an ordi-nary write-invalidate protocol. We show that the protocolis a simple extension to a write-invalidate protocol.Based on a program-driven simulation model of anarchitecture sim”lar to the Stanford DASH, and a set offour benchmarks, we evaluate the potential performanceimprovements of the protocol. We jind that it effectivelyeliminates most single invalidations which improves theperformance by reducing the shared access penalty andthe network trajic.1 IntroductionIn order for shared-memory multiprocessors to achieve ahigh performance, memory system tatency and contentionmust be kept as low as possible. A viable solution to thisproblem has been to attach private caches to each process-ing node and maintain cache coherence using a directory-based write-invalidate protocol. Notable examples of realimplementations of large-scale multiprocessors thatexploit this technique are the Stanford DASH [12], theMIT Alewife [1], the SICS Data Diffusion Machine (theDDM) [9], and the Kendall Square Research’s KSR1 [2],Write-invalidate protocols maintain cache coherence byinvalidating copies of a memory block when the block ismodified by a processor. The advantage of this is that atmost the first write, in a sequence of writes to the sameblock with no intervening read operations from other proc-essors, causes global interaction. Consequently, write-invalidate protocols perform fairly well for a broad rangeof sharing patterns. However, there exist common sharingpatterns for which atl invalidation~ could have beenentirely avoided. A notable example is the invalidationoverhead associated with data structures that are accessedwithin critical sections. Typically, processors read andmodify such &ta structures one at a time. Processors thataccess data this way cause a cache miss followed by aninvalidation request being sent to the cache attached to theprocessor that most recently exited the critical section. Ifthe cache coherence protocol were aware of this sharingpattern, it would be possible to merge the invalidationrequest with the preceding read-miss request and thuseliminate all explicit invalidation actions. This sharingbehavior, denoted migratory sharing, has been previouslyshown to be the major source of single invalidations byGupta and Weber in [8].Eliminating invalidation requests can help performancein many important ways. First, if access requests cannotoverlap invalidation requests due to memory consistencymodel or implementation constraints [6], the access pen-alty is redueed by reducing the number of global invalida-tion requests. Second, the network traffic is reducedwhich, as a secondary effect, may reduce the read andwrite penalty due to less network contention. Conse-quently, eliminating the number of invalidation requestsmay improve the performance significantly.In this paper, we propose an implementation of anadaptive write-invalidate protocol that effectively elimi-nates most invalidation requests associated with migratorysharing. The protocol dynamically detects whether a mem-ory block exhibits migratory sharing or not. For blocksdeemed migratory, the invalidation request is merged withthe preeeding read-miss request and for other blocks, itmaintains coherence according to the default write-invali-date policy. In addition, the protocol can dynamically, on aper block basis, switch between these operating modes,would the block change sharing behavior. As a case-study,we show that our protocol is a simple extension of a wnte-invalidate protocol by presenting the modifications neededfor a state-of-the-art write-invalidate protocol, in essencethe directory-based protocol of the Stanford DASH [11].To validate the correctness of the protocol and evaluateits performance, we have implemented and evaluated itusing a detailed program-driven simulation model of aDASH-like architecture and a set of four benchmarks, ofwhich three are taken from the SPLASH suite [14]. Wehave found that by eliminating the invalidation requests to0884-7495/93 $3.0001993 IEEE109migratory blocks, performance can be improved due toless access penalty and nehvork trafftc. We show that thesefactors can improve the performance significantly forhigh-performance multiprocessors.The organization of the rest of the paper is as follows.Since the adaptive protocol can be applied to most write-invalidate protocols, we provide in Section 2 a high-leveldescription of the coherence optimization it provides andhow it detects migratory sharing. As a base for our imple-mentation and performance study, we use the StanfordDASH protocol. In Section 3, we describe how the DASHprotocol can be extended to adaptively detect and optimizemigratory sharing. Sections 4 and 5 present the experi-mental results starting with the architectural assumptionsand the benchmarks in Section 4. We finally generalize ourfindings in Section 6 and conclude the paper in Section 7.2 The Adaptive Protocol: A High-Level ViewIn this section, we first identify the type of migratory shar-ing that incurs invalidation overhead in write-invalidateprotocols in Section 2.1. Then in Section 2.2, we present ahigh-level description of an a&ptive protocol that detectssuch sharing and eliminates its overhead.2.1 Migratory SharingGupta and Weber classify data structures based on theinvalidation pattern they exhibit [8]. According to theirdefinition, data structures manipulated by only a singleprocessor at any given time are called m“gratory objects.Typically, such sharing occurs when a data structure ismodified within a critical section. Protecting modificationsof shared objects by locks is important to eliminate dataraces. Therefore, many parallel languages, such as e.g.Modula-2 and Ada, use critical sections (or monitors) asthe recommended

View Full Document