Duke CPS 212 - Distributed Memory and Cache Consistency


Distributed Memory and Cache Consistency
(some slides courtesy of Alvin Lebeck)

Software DSM 101

Software-based distributed shared memory (DSM) provides an illusion of shared memory on a cluster.
• remote-fork the same program on each node
• data resides in a common virtual address space
  the library/kernel collude to make the shared VAS appear consistent
• The Great War: shared memory vs. message passing
  for the full story, take CPS 221

Page-Based DSM (Shared Virtual Memory)

The virtual address space is shared: each node caches pages of the common virtual address space in its local physical memory (DRAM), and nodes communicate over a switched interconnect.

Inside Page-Based DSM (SVM)

The page-based approach uses a write-ownership token protocol on virtual memory pages.
• Kai Li [Ivy, 1986], Paul Leach [Apollo, 1982]
• The system maintains a per-node, per-page access mode:
  {shared, exclusive, no-access}
  The mode determines which local accesses are allowed; modes are enforced with VM page protection.

  mode       load  store
  shared     yes   no
  exclusive  yes   yes
  no-access  no    no

Write-Ownership Protocol

A write-ownership protocol guarantees that nodes observe sequential consistency of memory accesses:
• Any node with any access has the latest copy of the page.
  On any transition from no-access, fetch the current copy of the page.
• A node with exclusive access holds the only copy.
  At most one node may hold a page in exclusive mode.
  On transition into exclusive, invalidate all remote copies and set their mode to no-access.
• Multiple nodes may hold a page in shared mode.
  This permits concurrent reads: every holder has the same data.
  On transition into shared mode, downgrade the exclusive remote copy (if any) to shared as well.

Paged DSM/SVM Example

P1 reads virtual address x:
• page fault
• allocate a physical frame for page(x)
• request page(x) from home(x)
• set page(x) readable
• resume

P1 writes virtual address x:
• protection fault
• request exclusive ownership of page(x)
• set page(x) writeable
• resume

The Sequential Consistency Memory Model

Sequential processors (P1, P2, P3, ...) issue memory operations in program order; conceptually, a switch set randomly after each memory operation selects which processor's next operation reaches memory, ensuring some serial order among all operations. This model is easily implemented with a shared bus.

Motivation for Weaker Orderings

1. Sequential consistency is sufficient (but not necessary) for shared-memory parallel computations to execute correctly.
2. Sequential consistency is slow for paged DSM systems.
   Processors cannot observe memory bus traffic in other nodes.
   Even if they could, there is no shared bus to serialize accesses.
   Protection granularity (pages) is too coarse.
3. Basic problem: the need for exclusive access to cache lines (pages) leads to false sharing.
   This causes a "ping-pong effect" if there are multiple writers to the same page.
4. Solution: allow multiple writers to a page if their writes are "nonconflicting".

Weak Ordering

Careful access ordering only matters when data is shared, and shared data should be synchronized.
• Classify memory operations as data or synchronization.
• Data operations may be reordered between synchronization operations.
• This forces a consistent view at all synchronization points.
• At a visible synchronization operation, the system can flush the write buffer and obtain ACKs for all previous memory operations.
• A synchronization operation cannot complete until all previous operations complete (e.g., ACK all invalidations).

Weak Ordering Example

Execution on each node is a sequence of Read/Write runs separated by Synch operations. With x = y = 0 initially:

Without synchronization:
  A: if (y > x) panic("ouch");
  B: loop { x = x + 1; y = y + 1; }

With synchronization operations:
  A: acquire(); if (y > x) panic("ouch"); release();
  B: loop() { acquire(); x = x + 1; y = y + 1; release(); }

Multiple Writer Protocol

Suppose x and y lie on the same page, P1 writes x, and P2 writes y.
• We don't want the delays associated with the constraint of exclusive access.
• Allow each processor to modify its local copy of a page between synchronization points.
• Make things consistent at the synchronization point.

Treadmarks 101

Goal: implement the "laziest" software DSM system.
• Eliminate false sharing with a multiple-writer protocol.
  Capture page updates at a fine grain by "diffing".
  Propagate just the modified bytes (deltas).
  This allows merging of concurrent nonconflicting updates.
• Propagate updates only when needed, i.e., when the program uses shared locks to force consistency.
  Assume the program is fully synchronized.
• Lazy release consistency (LRC): A need not be aware of B's updates except when needed to preserve potential causality... with respect to shared synchronization accesses.

Lazy Release Consistency

Piggyback write notices with acquire operations.
• Guarantee updates are visible on acquire.
  This is lazier than Munin, which propagates updates on release.
• The implementation propagates invalidations rather than updates.

[timeline: x and y are on the same page; P0 writes x and releases the lock; the lock grants to P1 and then P2 carry invalidations; each node fetches the updates to x only when it next references x.]

Ordering of Events in Treadmarks

LRC is not linearizable: there is no fixed global ordering of events. There is, however, a serializable partial order on synchronization events and the intervals they define.

Vector Timestamps in Treadmarks

To maintain the partial order on intervals, each node maintains a current vector timestamp (CVT).
• Intervals on each node are numbered 0, 1, 2, ...
• The CVT is a vector of length N, the number of nodes.
• CVT[i] is the number of the last preceding interval on node i.
Vector timestamps are updated on lock acquire:
• the requester's CVT is passed with the lock acquire request...
• compared with the holder's CVT...
• and the pairwise-maximum CVT is returned with the lock.

LRC Protocol

[timeline: B acquires a lock from A; the lock token carries write notices for CVTA - CVTB; on a later reference to a page with a pending write notice, the faulting node sends a delta request and receives a delta list.]

What does A (the holder) send to B when passing the lock token?
• a list of intervals known to A but not to B (CVTA - CVTB)
• for each interval in the list:
  - the origin node i and the interval number n
  - i's vector clock CVTi during interval n on node i
  - a list of pages dirtied by i during interval n
• These dirty-page notifications are called write notices.

Write Notices

LRC requires that each node be aware of any updates to a shared page made during a preceding interval.
• Updates are ...
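The per-page access-mode table and the write-ownership transitions above can be sketched as a small simulation. This is an illustrative sketch, not code from Ivy or any real SVM system; the `DSMPage` class and `fault` method are invented names, and page fetches are reduced to comments:

```python
# Sketch of the write-ownership token protocol on virtual memory pages.
# Allowed accesses follow the mode table in the notes:
#   shared: load yes, store no; exclusive: load and store; no-access: neither.

SHARED, EXCLUSIVE, NO_ACCESS = "shared", "exclusive", "no-access"
ALLOWED = {SHARED: {"load"}, EXCLUSIVE: {"load", "store"}, NO_ACCESS: set()}

class DSMPage:
    def __init__(self, num_nodes):
        self.mode = [NO_ACCESS] * num_nodes   # per-node, per-page access mode

    def access(self, node, op):
        """Perform a load or store, faulting to change the mode if needed."""
        if op not in ALLOWED[self.mode[node]]:
            self.fault(node, op)

    def fault(self, node, op):
        if op == "load":
            # Transition into shared: downgrade any exclusive holder to shared
            # (and fetch the current copy of the page from it).
            for i, m in enumerate(self.mode):
                if m == EXCLUSIVE:
                    self.mode[i] = SHARED
            self.mode[node] = SHARED
        else:
            # Transition into exclusive: invalidate all remote copies.
            for i in range(len(self.mode)):
                self.mode[i] = NO_ACCESS
            self.mode[node] = EXCLUSIVE

page = DSMPage(num_nodes=3)
page.access(0, "load")    # P0: page fault, fetch page, mode -> shared
page.access(1, "load")    # P1: concurrent reader, mode -> shared
page.access(0, "store")   # P0: protection fault, invalidate remote copies
assert page.mode == [EXCLUSIVE, NO_ACCESS, NO_ACCESS]
```

Note the invariant the protocol maintains: at most one node ever holds the page in exclusive mode, and any node with any access holds the latest copy.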
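The weak-ordering example above can be made runnable. As an assumption for illustration, `threading.Lock` stands in for the DSM's acquire/release operations; B maintains the invariant y <= x inside its critical section, and A checks it only under the same lock:

```python
import threading

# x = y = 0 initially; B updates x and y together inside a critical section.
x = y = 0
lock = threading.Lock()
violations = []           # records any observation where y > x (the "panic")

def a():
    for _ in range(10_000):
        with lock:                    # acquire()
            if y > x:
                violations.append((x, y))
                                      # release() on block exit

def b():
    global x, y
    for _ in range(10_000):
        with lock:                    # acquire()
            x = x + 1
            y = y + 1                 # x and y move together
                                      # release() on block exit

ta, tb = threading.Thread(target=a), threading.Thread(target=b)
ta.start(); tb.start()
ta.join(); tb.join()
assert not violations     # with acquire/release, A never sees y > x
```

Because A can only observe memory at synchronization points, the transient state inside B's critical section (x incremented, y not yet) is never visible, which is exactly what weak ordering permits the system to exploit.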
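Treadmarks-style diffing can be sketched in a few lines: each writer keeps a pristine "twin" of the page, computes a byte-level delta at the synchronization point, and nonconflicting deltas from concurrent writers merge cleanly. The function names here are illustrative, not the Treadmarks API:

```python
def make_diff(twin, page):
    """Byte-level delta: (offset, new_byte) pairs where page differs from its twin."""
    return [(i, page[i]) for i in range(len(page)) if page[i] != twin[i]]

def apply_diff(page, diff):
    """Merge a delta into a page copy."""
    for offset, byte in diff:
        page[offset] = byte

PAGE_SIZE = 16
home = bytearray(PAGE_SIZE)          # home node's copy of the page

# P1 and P2 each start from a twin (pristine copy) of the same page.
p1, p2 = bytearray(home), bytearray(home)
p1[0] = 7                            # P1 writes x (offset 0)
p2[8] = 9                            # P2 writes y (offset 8), same page

# At the synchronization point, each node propagates only its modified bytes.
d1 = make_diff(home, p1)             # [(0, 7)]
d2 = make_diff(home, p2)             # [(8, 9)]

# Nonconflicting updates (disjoint offsets) merge in either order.
apply_diff(home, d1)
apply_diff(home, d2)
assert home[0] == 7 and home[8] == 9
```

This is why the multiple-writer protocol eliminates the ping-pong effect: neither P1 nor P2 needs exclusive ownership of the page, yet both updates survive the merge.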
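The vector-timestamp bookkeeping on lock acquire can also be sketched directly: the pairwise maximum gives the CVT returned with the lock, and "intervals known to A but not to B" (CVTA - CVTB) determines which write notices travel with the lock token. This is a simplified sketch; real Treadmarks interval records carry more state:

```python
def merge_cvt(cvt_holder, cvt_requester):
    """Pairwise maximum: the CVT returned with the lock grant."""
    return [max(a, b) for a, b in zip(cvt_holder, cvt_requester)]

def missing_intervals(cvt_holder, cvt_requester):
    """Intervals known to the holder but not the requester (CVTA - CVTB):
    for each node i, the interval numbers the requester has not yet seen."""
    return {i: list(range(b + 1, a + 1))
            for i, (a, b) in enumerate(zip(cvt_holder, cvt_requester))
            if a > b}

# Holder A has seen interval 4 on node 0 and interval 2 on node 1;
# requester B lags on node 1 but leads on node 2.
cvt_a = [4, 2, 0]
cvt_b = [4, 0, 3]

# B's CVT travels with the acquire request; A replies with write notices
# covering the intervals B is missing, plus the merged CVT.
print(missing_intervals(cvt_a, cvt_b))   # {1: [1, 2]}
print(merge_cvt(cvt_a, cvt_b))           # [4, 2, 3]
```

A sends write notices only for the intervals in `missing_intervals`, which is what makes the protocol lazy: B learns about exactly the updates needed to preserve potential causality, and no more.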

