Memory Consistency Models
Adam Wierman, Daniel Neill
Carnegie Mellon School of Computer Science, Architecture

References:
  Adve, Pai, and Ranganathan. Recent advances in memory consistency models for hardware shared-memory systems, 1999.
  Gniady, Falsafi, and Vijaykumar. Is SC + ILP = RC?, 1999.
  Hill. Multiprocessors should support simple memory consistency models, 1998.


Memory consistency models

The memory consistency model of a shared-memory system determines the order in which memory operations will appear to execute to the programmer.
  - Processor 1 writes to some memory location.
  - Processor 2 reads from that location.
  - Do I get the result I expect?
Different models make different guarantees; the processor can reorder/overlap memory operations as long as the guarantees are upheld.
Tradeoff between programmability and performance!


Code example 1

initially Data1 = Data2 = Flag = 0

  P1                    P2
  Data1 = 64            while (Flag != 1) {;}
  Data2 = 55            register1 = Data1
  Flag = 1              register2 = Data2

What should happen?
What could go wrong?


Three models of memory consistency

Sequential Consistency (SC):
  - Memory operations appear to execute one at a time, in some sequential order.
  - The operations of each individual processor appear to execute in program order.
Processor Consistency (PC):
  - Allows reads following a write to execute out of program order (if they're not reading/writing the same address).
  - Writes may not be immediately visible to other processors, but become visible in program order.
Release Consistency (RC):
  - All reads and writes (to different addresses) are allowed to operate out of program order.


Code example 1 (continued)

Does it work under:
  - SC (no relaxation)?
  - PC (Write -> Read relaxation)?
  - RC (all relaxations)?


Code example 2

initially Flag1 = Flag2 = 0

  P1                        P2
  Flag1 = 1                 Flag2 = 1
  register1 = Flag2         register2 = Flag1
  if (register1 == 0)       if (register2 == 0)
    critical section          critical section

What should happen?
What could go wrong?

Does it work under:
  - SC (no relaxation)?
  - PC (Write -> Read relaxation)?
  - RC (all relaxations)?


The performance/programmability tradeoff

[Figure: axes labeled "Increasing performance" and "Increasing programmability"]


Programming difficulty

PC/RC include special synchronization operations to allow specific instructions to execute atomically and in program order.
The programmer must identify conflicting memory operations, and ensure that they are properly synchronized.
  - Missing or incorrect synchronization: program gives unexpected/incorrect results.
  - Too many unnecessary synchronizations: performance reduced (no better than SC?).
Idea: normally ensure sequential consistency; allow programmer to specify when relaxation possible?


Code example 1, revisited

initially Data1 = Data2 = Flag = 0

  P1                    P2
  Data1 = 64            while (Flag != 1) {;}
  Data2 = 55            MEMBAR (LD-LD)
  MEMBAR (ST-ST)        register1 = Data1
  Flag = 1              register2 = Data2

Programmer adds synchronization commands, and now it works as expected!


Performance of memory consistency models

Relaxed memory models (PC/RC) hide much of memory operations' long latencies by reordering and overlapping some or all memory operations.
  - PC/RC can use write buffering.
  - RC can be aggressively out of order.
This is particularly important:
  - When cache performance poor, resulting in many memory operations.
  - In distributed shared memory systems, when remote memory accesses may take much longer than local memory accesses.
Performance results for straightforward implementations: as compared to SC, PC and RC reduce execution time by 23% and 46%, respectively (Adve et al.).


The big question

How can SC approach the performance of RC?

2 techniques:
  - Hardware optimizations
  - Compiler optimizations


What can SC do? (hardware optimizations)

  - Can SC have per-processor caches? YES
  - Can SC have non-binding prefetching? YES
  - Can SC have multithreading? YES
  - Can SC use a write buffer? NO
SC cannot reorder memory operations because it might cause inconsistency.


Speculation with SC (hardware optimizations)

SC only needs to appear to do memory operations in order.
  1. Speculatively perform all memory operations.
  2. Roll back to sequentially consistent state if constraints are violated.
This emulates RC as long as rollbacks are infrequent.

Requirements for speculation:
  - Must allow both loads and stores to bypass each other.
  - Needs a very large speculative state.
  - Don't introduce overhead to the pipeline.

Requirements for rollback:
  - Must detect violations quickly.
  - Must be able to roll back quickly.
  - Rollbacks can't happen often.


Results (hardware optimizations)

SC only needs to appear to do memory operations in order.
These changes were implemented in SC, and results showed a narrowing gap as compared to PC and RC.
The gap is negligible (Unlimited SHiQ, BLT), but SC used significantly more hardware.
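
The "does it work under SC / PC / RC?" questions for the two code examples can be explored with a small state-space search. The sketch below is only an illustration under simplifying assumptions, not a model of any real processor: the `explore` function, the operation tuples, and the model names "SC", "FIFO", and "ANY" are all made up for this example. "FIFO" approximates PC with a per-processor write buffer that drains in program order; "ANY" approximates only the write reordering part of RC by letting buffered stores to different addresses drain in any order. Loads are never reordered here, so the MEMBAR (LD-LD) on P2 is not modelled, and the hypothetical `fence` operation stands in for MEMBAR (ST-ST) by waiting until the processor's own buffer is empty.

```python
def explore(programs, model):
    """Return every reachable final register state (as sorted tuples).

    Operations (hypothetical encoding): ("st", addr, value), ("ld", addr, reg),
    ("wait", addr, value) to spin until addr holds value, ("fence", None, None).
    model: "SC" (no buffering), "FIFO" (PC-like), "ANY" (RC-like stores).
    """
    n = len(programs)
    start = ((0,) * n, (), ((),) * n, ())  # (pcs, memory, write buffers, registers)
    seen, stack, finals = {start}, [start], set()
    while stack:
        pcs, mem, bufs, regs = stack.pop()
        if all(pcs[p] == len(programs[p]) for p in range(n)):
            finals.add(regs)  # registers cannot change once all programs finish
            continue
        memory = dict(mem)
        successors = []
        for p in range(n):
            buf = bufs[p]
            # Drain step: one buffered store of processor p becomes visible.
            for i, (addr, val) in enumerate(buf):
                older_same_addr = any(a == addr for a, _ in buf[:i])
                if (model == "FIFO" and i > 0) or older_same_addr:
                    continue
                new_mem = tuple(sorted({**memory, addr: val}.items()))
                new_bufs = bufs[:p] + (buf[:i] + buf[i + 1:],) + bufs[p + 1:]
                successors.append((pcs, new_mem, new_bufs, regs))
            if pcs[p] == len(programs[p]):
                continue
            op, addr, arg = programs[p][pcs[p]]
            next_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
            if op == "st" and model == "SC":
                new_mem = tuple(sorted({**memory, addr: arg}.items()))
                successors.append((next_pcs, new_mem, bufs, regs))
            elif op == "st":
                new_bufs = bufs[:p] + (buf + ((addr, arg),),) + bufs[p + 1:]
                successors.append((next_pcs, mem, new_bufs, regs))
            elif op == "fence":
                if not buf:  # proceed only after all earlier stores are visible
                    successors.append((next_pcs, mem, bufs, regs))
            else:
                # Loads and waits read memory (initially 0), unless the
                # processor's own buffer holds a newer store to that address.
                value = memory.get(addr, 0)
                for a, v in buf:
                    if a == addr:
                        value = v
                if op == "ld":
                    new_regs = tuple(sorted({**dict(regs), arg: value}.items()))
                    successors.append((next_pcs, mem, bufs, new_regs))
                elif value == arg:  # "wait" succeeds; otherwise p is blocked
                    successors.append((next_pcs, mem, bufs, regs))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return finals


def outcomes(programs, model):
    """Final (register1, register2) pairs reachable under the given model."""
    return {tuple(v for _, v in regs) for regs in explore(programs, model)}


# Code example 1: data, then flag.
FLAG = [
    [("st", "Data1", 64), ("st", "Data2", 55), ("st", "Flag", 1)],
    [("wait", "Flag", 1), ("ld", "Data1", "register1"), ("ld", "Data2", "register2")],
]
# Code example 1 revisited: a fence before the flag store.
FLAG_FENCED = [
    FLAG[0][:2] + [("fence", None, None)] + FLAG[0][2:],
    FLAG[1],
]
# Code example 2: each processor raises its flag, then reads the other's.
DEKKER = [
    [("st", "Flag1", 1), ("ld", "Flag2", "register1")],
    [("st", "Flag2", 1), ("ld", "Flag1", "register2")],
]

if __name__ == "__main__":
    for name, prog in [("example 1", FLAG), ("example 1 fenced", FLAG_FENCED),
                       ("example 2", DEKKER)]:
        for model in ("SC", "FIFO", "ANY"):
            print(name, model, sorted(outcomes(prog, model)))
```

Under these assumptions, code example 2 can end with both registers equal to 0 (both processors entering the critical section) as soon as stores are buffered, but never under "SC". Code example 1 always reads (64, 55) under "SC" and "FIFO", can read stale zeros under "ANY", and is repaired by the fence before the store to Flag.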



CMU CS 15740 - Memory Consistency Models
