DOC PREVIEW
CMU CS 15740 - Memory Consistency Models for Shared-Memory Multiprocessors

This preview shows page 1-2-3-23-24-25-26-47-48-49 out of 49 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 49 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Memory Consistency Modelsfor Shared-Memory MultiprocessorsMotivation• Build large-scale shared-memory multiprocessors• High memory latency is a fundamental issue- Stanford DASH: 100-150 cycles- Kendall Square Research KSR1: 200-600 cycles• Caches reduce latency, but inherent communication remainsProcessorCacheMemoryProcessorCacheMemoryInterconnection NetworkHiding Memory Latency• Overlap memory accesses with other accesses and computation• Simple in uniprocessors• Can affect correctness in multiprocessorswrite Aread Bwrite Aread BOutline• Memory Consistency Models• Framework for Programming Simplicity• Performance Evaluation• ConclusionsUniprocessor Memory Model• Memory model specifies ordering constraints among accesses• Uniprocessor model: memory accesses atomic and in program order• Not necessary to maintain sequential order for correctness- hardware: buffering, pipelining- compiler: register allocation, code motion• Simple for programmers• Allows for high performancewrite Awrite Bread Aread BShared-Memory Multiprocessors• Order between accesses to different locations becomes importantA = 1;Flag = 1;wait (Flag == 1);... = A;P1P2How Unsafe Reordering Can Happen• Distribution of memory resources- accesses issued in order may be observed out of orderInterconnection NetworkProcessorMemoryA = 1;Flag = 1;Memorywait (Flag == 1);... = A;Flag: 0 1A: 0MemoryA = 1Flag = 1Processor Processor12Caches Complicate Things More• Multiple copies of the same locationA = 1;P1B = 1;P2 P3Interconnection NetworkProcessorMemoryMemoryA = 1ProcessorCacheMemoryProcessorCacheCacheA = 1B = 1A: 011B: 0A: 0wait (A==1);wait (B==1);... = A;Need for a Multiprocessor Memory Model• Provide reasonable ordering constraints on memory accesses- affects programmers- affects system designersMemory BehaviorWhat should the semantics be for memory operations to the shared memory?• ease-of-use: keep it similar to serial semantics for uniprocessor• operating system community used concurrent programming:• multiple processes interleaved on a single processor• Lamport (1979) formalized Sequential Consistency (SC):• “... the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.”Sequential Consistency• Formalized by Lamport- accesses of each processor in program order- all accesses appear in sequential order• Any order implicitly assumed by programmer is maintainedMemoryP1 P2 PnExample with SCSimple Synchronization:P1 P2A = 1 (a)Flag =1 (b) x = Flag (c)y = A (d)• all locations are initialized to 0• possible outcomes for (x,y): (0,0), (0,1), (1,1)• (x,y) = (1,0) is not a possible outcome:• we know a->b and c->d by program order• b->c implies that a->d• y==0 implies d->a which leads to a contradictionAnother Example with SCFrom Dekker’s Algorithm:P1 P2A = 1 (a) B = 1 (c)x = B (b) y = A (d)• possible outcomes for (x,y): (0,1), (1,0), (1,1)• outcome (x,y) = (0,0) not possible:• a->b and c->d implied by program order• x = 0 implies b->c which implies a->d• a->d says y = 1 which leads to a contradiction• similarly, y = 0 implies x =1 which is also a contradictionHow to Guarantee SCSufficient Conditions for SC (Dubois et al., 1987):• assumes general cache coherence (if we have caches):• writes to the same location are observed in same order by all P’s• for each processor, delay issue of access until previous completes• a read completes when the return value is back• a write completes when new value is visible to all processors• for simplicity, we assume writes are atomicImportant to note that these are not necessary conditions for maintaining SCSimple Bus-Based Multiprocessor• assume write-back caches• general cache coherence maintained by serialization at bus• writes to same location serialized and observed in the same order by all• writes are atomic because all processors observe the write at the same time• accesses from a single process complete in program order:• cache is busy while servicing a miss, effectively delaying later access• SC is guaranteed without any extra mechanism above coherenceP1 PnCacheCacheMemExample of Complication in Bus-Based Machines• 1st level cache write-thru, 2nd level write-back (e.g.,SGI cluster in DASH)• write-buffer with no forwarding (reads to 2nd level delayed until buffer empty)• never hit in the 1st level cache: SC is maintained (same as previous slide)• read hits in the first level cache cause complication (e.g., Dekker’s algorithm)• to maintain SC, we need to delay access to 1st level until there are no writes pending in write buffer (full write latency observed by processor)• multiprocessors may not maintain SC to achieve higher performanceMemL1 CacheL2 CacheL1 CacheL2 Cachewrite-buffer write-bufferP1PnScalable Shared-Memory Multiprocessor• no more bus to serialize accesses• only order maintained by network is point-to-point• general cache coherence:• serialize at memory location; point-to-point order required• accesses issued in order do not necessarily complete in order:• due to distribution of memory and varied-length paths in network• writes are inherently non-atomic:• new value is visible to some while others can still see old value• no one point in the system where a write is completedP1CacheMemPnCacheMemInterconnection NetworkScalable Architectures (Cont’d)• need to know when a write completes:• for providing atomicity• for delaying an access until previous one completes• requires acknowledgement messages:• write is complete when all invalidations are acknowledged• use a counter to count the number of acknowledgements• ensuring atomicity for writes:• delay access to new value until all acknowledgments are back• can be done for invalidation-based schemes; unnatural for updates• ensuring order of accesses from a processor:• delay each access until the previous one completes• latencies are large (10’s to 100’s of cycles) and all latency seen by processorP1CacheMemPnCacheMemInterconnection NetworkSummary for Sequential Consistency• Maintain order between shared accesses in each process• Severely restricts common hardware and compiler optimizationsREADREADREADWRITEWRITEREADWRITEWRITEAlternatives to


View Full Document

CMU CS 15740 - Memory Consistency Models for Shared-Memory Multiprocessors

Documents in this Course
leecture

leecture

17 pages

Lecture

Lecture

9 pages

Lecture

Lecture

36 pages

Lecture

Lecture

9 pages

Lecture

Lecture

13 pages

lecture

lecture

25 pages

lect17

lect17

7 pages

Lecture

Lecture

65 pages

Lecture

Lecture

28 pages

lect07

lect07

24 pages

lect07

lect07

12 pages

lect03

lect03

3 pages

lecture

lecture

11 pages

lecture

lecture

20 pages

lecture

lecture

11 pages

Lecture

Lecture

9 pages

Lecture

Lecture

10 pages

Lecture

Lecture

22 pages

Lecture

Lecture

28 pages

Lecture

Lecture

18 pages

lecture

lecture

63 pages

lecture

lecture

13 pages

Lecture

Lecture

36 pages

Lecture

Lecture

18 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

lecture

lecture

34 pages

lecture

lecture

47 pages

lecture

lecture

7 pages

Lecture

Lecture

18 pages

Lecture

Lecture

7 pages

Lecture

Lecture

21 pages

Lecture

Lecture

10 pages

Lecture

Lecture

39 pages

Lecture

Lecture

11 pages

lect04

lect04

40 pages

Load more
Download Memory Consistency Models for Shared-Memory Multiprocessors
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Memory Consistency Models for Shared-Memory Multiprocessors and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Memory Consistency Models for Shared-Memory Multiprocessors 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?