UT CS 395T - Memory Consistency Models - D2974836

Home> Schools> University of Texas at Austin> Computer Science (CS) > CS 395T> Memory Consistency Models

DOC PREVIEW

UT CS 395T - Memory Consistency Models

School name University of Texas at Austin

Course Cs 395t- Multicore Operating Systems Implementation

Pages 7

This preview shows page 1-2 out of 7 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

1Memory Consistency ModelsOutline• Review of multi-threaded program execution on uniprocessor• Need for memory consistency models• Sequential consistency model• Relaxed memory models – weak consistency model– release consistency model• ConclusionsMulti-threaded programs on uniprocessorMEMORYP• Processor executes all threads of program– unspecified scheduling policy• Operations in each thread are executed in order• Atomic operations: lock/unlock etc. for synchronization between threads• Result is as if instructions from different threads were interleaved in some order• Non-determinacy: program may produce different outputs depending on scheduling of threads (eg)Thread 1 Thread 2….. ……x := 1; print(x);x := 2;Multi-threaded programs on multiprocessorMEMORYP• Each processor executes one thread– let’s keep it simple• Operations in each thread are executed in order• One processor at a time can access global memory to perform load/store/atomic operations– no caching of global data• You can show that running multi-threaded program on multiple processors does not change possible output(s) of program from uniprocessorcasePP2More realistic architecture• Two key assumptions so far:1. processors do not cache global data• improving execution efficiency:– allow processors to cache global data» leads to cache coherence problem, which can be solved using coherent caches as explained before2. instructions within each thread are executed in order• improving execution efficiency:– allow processors to execute instructions out of order subject todata/control dependences» surprisingly, this can change the semantics of the program» preventing this requires attention to memory consistency model of processorRecall: uniprocessor execution• Processors reorder operations to improve performance• Constraint on reordering: must respect dependences– data dependences must be respected: in particular, loads/stores to a given memory address must be executed in program order– control dependences must be respected• Reorderings can be performed either by compiler or processorPermitted memory-op reorderings• Stores to different memory locations can be performed out of program orderstore v1, data store b1, flagstore b1, flag ÅÆ store v1, data• Loads from different memory locations can be performed out of program orderload flag, r1 load data,r2load data, r2 ÅÆ load flag, r1• Load and store to different memory locations can be performed out of program orderExample of hardware reorderingMemory systemProcessorStore bufferLoad bypassing• Store buffer holds store operations that need to be sent to memory• Loads are higher priority operations than stores since their results areneeded to keep processor busy, so they bypass the store buffer• Load address is checked against addresses in store buffer, so storebuffer satisfies load if there is an address match• Result: load can bypass stores to other addresses3Problem in multiprocessor context• Canonical model– operations from given processor are executed in program order – memory operations from different processors appear to be interleaved in some order at the memory• Question:– If a processor is allowed to reorder independent operations in its own instruction stream, will the execution always produce the same results as the canonical model?– Answer: no. Let us look at some examples. Example (I) Code:Initially A = Flag = 0P1 P2A = 23; while (Flag != 1) {;} Flag = 1; ... = A; Idea: – P1 writes data into A and sets Flag to tell P2 that data value can be read from A. – P2 waits till Flag is set and then reads data from A.Execution Sequence for (I)Code:Initially A = Flag = 0P1 P2A = 23; while (Flag != 1) {;} Flag = 1; ... = A; Possible execution sequence on each processor:P1 P2 Write A 23 Read Flag //get 0 Write Flag 1 ……Read Flag //get 1 Read A //what do you get?Problem: If the two writes on processor P1 can be reordered, it is possible for processor P2 to read 0 from variable A. Can happen on most modern processors.Example IICode: (like Dekker’s algorithm)Initially Flag1 = Flag2 = 0P1 P2Flag1 = 1; Flag2 = 1;If (Flag2 == 0) If (Flag1 == 0) critical section critical sectionPossible execution sequence on each processor:P1 P2 Write Flag1, 1 Write Flag2, 1 Read Flag2 //get 0 Read Flag1 //what do you get?4Execution sequence for (II)Code: (like Dekker’s algorithm)Initially Flag1 = Flag2 = 0P1 P2Flag1 = 1; Flag2 = 1;If (Flag2 == 0) If (Flag1 == 0) critical section critical sectionPossible execution sequence on each processor:P1 P2 Write Flag1, 1 Write Flag2, 1 Read Flag2 //get 0 Read Flag1, ?? Most people would say that P2 will read 1 as the value of Flag1.Since P1 reads 0 as the value of Flag2, P1’s read of Flag2 must happen before P2 writes to Flag2. Intuitively, we would expect P1’s write of Flag to happen before P2’s read of Flag1.However, this is true only if reads and writes on the same processor to different locations are not reordered by the compiler or the hardware.Unfortunately, this is very common on most processors (store-buffers with load-bypassing).Lessons• Uniprocessors can reorder instructions subject only to control and data dependence constraints• These constraints are not sufficient in shared-memory context– simple parallel programs may produce counter-intuitive results• Question: what constraints must we put on uniprocessorinstruction reordering so that– shared-memory programming is intuitive– but we do not lose uniprocessor performance?• Many answers to this question– answer is called memory consistency modelsupported by the processorConsistency models- Consistency models are not about memory operations from different processors.- Consistency models are not about dependent memory operations in a single processor’s instruction stream (these are respected even by processors that reorder instructions).- Consistency models are all about ordering constraints on independent memory operations in a single processor’sinstruction stream that have some high-level dependence (such as flags guarding data) that should be respected to obtain intuitively reasonable results.Simplest Memory Consistency Model• Sequential consistency (SC)

View Full Document

UT CS 395T - Memory Consistency Models

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 7 pages.

UT CS 395T - Memory Consistency Models

Sign up for free to view:

Please select your school