UH COSC 6385 - Multi-Processors (II) Synchronization

Unformatted text preview:

1Edgar GabrielCOSC 6385 Computer Architecture - Multi-Processors (II)SynchronizationEdgar GabrielFall 2009COSC 6385 – Computer ArchitectureEdgar GabrielSynchronization between processes• Required on all levels of multi-threaded programming– Lock/unlock– Mutual exclusion– Barrier synchronization• Key hardware capability:– Uninterruptable instruction capable of automatically retrieving or changing a value2COSC 6385 – Computer ArchitectureEdgar GabrielSynchronization • Lock/unlock operations– Lock returning 0 if lock is free/available– Lock returning 1 if lock is unavailable• Implementation using atomic exchange– Process sets the value of a register/memory location to the required operation– Setting the value must not be interrupted in order to avoid race conditions– Access by multiple processes/threads will be resolved by write serializationCOSC 6385 – Computer ArchitectureEdgar GabrielSynchronization (II)• Alternative implementations:– Test-and-set– Fetch-and-increment• Problems with all three algorithms:– Require a read and write operation in a single, uninterruptable sequence– Hardware can not allow any operations between the read and the write operation– Complicates cache coherence– Must not deadlock3COSC 6385 – Computer ArchitectureEdgar GabrielLoad linked/store conditional• Alternative implementations (II):– Pair of instructions where the second instruction returns a value indicating, whether the pair of instructions was executed as if the instructions were atomic– Special pair of load and store operations• Load linked (LL)• Store conditional (SC): returns 1 if successful, 0 otherwise– Store conditional returns an error if• Contents of memory location specified by LL changed before calling SC• Processor executes a context switchCOSC 6385 – Computer ArchitectureEdgar GabrielLoad linked/store conditional (II)• Assembler code sequence to atomically exchange the contents of register R4 and the memory location specified by R1try: MOV R3, R4LL R2, 0(R1)SC R3, 0(R1)BEQZ R3, tryMOV R4, R24COSC 6385 – Computer ArchitectureEdgar GabrielLoad linked/store conditional (III)• Implementing fetch-and-increment using load linked and conditional storetry: LL R2, 0(R1)DADDUI R3, R2, #1SC R3, 0(R1)BEQZ R3, try• Implementation of LL/SC by using a special Link Register, which contains the address of the operationCOSC 6385 – Computer ArchitectureEdgar GabrielSpin locks• A lock that a processor continuously tries to acquire, spinning around in a loop until it succeeds.• Trivial implementationDADDUI R2, R0, #1lockit: EXCH R2, 0(R1) !atomic exchangeBNEZ R2, lockit• Since the EXCH operation includes a read and a modify operation– Value will be loaded into the cache• Good if only one processor tries to access the lock• Bad if multiple processors in an SMP try to get the lock (cache coherence)– EXCH includes a write attempt, which will lead to a write-miss for SMPs5COSC 6385 – Computer ArchitectureEdgar GabrielSpin locks (II)• For cache coherent SMPs, slight modification of the loop requiredlockit: LD R2, 0(R1) !load the lockBNEZ R2, lockit !lock available?DADDUI R2, R0, #1 !load locked valueEXCH R2, 0(R1) !atomic exchangeBNEZ R2, lockit !EXCH successful?COSC 6385 – Computer ArchitectureEdgar GabrielSpin locks (III)• …or using LL/SClockit: LL R2, 0(R1) !load the lockBNEZ R2, lockit !lock available?DADDUI R2, R0, #1 !load locked valueSC R2, 0(R1) !atomic exchangeBNEZ R2, lockit !SC successful?6COSC 6385 – Computer ArchitectureEdgar GabrielMemory consistency• Cache coherence ensures that multiple processors see a same value for given memory location• Question remains regarding when an update operation is visible at another process• What happens in the scenario above if the code sequences run on different processors, but each processor has a cached copy of A and BP1: A = 0;…A = 1;L1: if ( B==0 )…P2: B = 0;…B = 1;L2: if ( A==0 )…COSC 6385 – Computer ArchitectureEdgar GabrielMemory consistency• Sequential consistency: the result of any execution identical as if – memory access executed by each processors were in order – memory accesses by different processors were arbitrarily interleaved• Implementation: delay completion of any memory access until all invalidations caused by that access are complete– Not possible to place data item into a write buffer and continue execution• Simple, well understood programming paradigm• Inefficient for large numbers of processors7COSC 6385 – Computer ArchitectureEdgar GabrielSynchronized memory access• All access to shared data ordered by synchronization operations– E.g. using lock/unlock operations• Positive: an update is only enforced on shared variables• Negative: handling hierarchies of lock/unlock operations requires the same order of locks/unlocks to avoid deadlockCOSC 6385 – Computer ArchitectureEdgar GabrielRelaxed consistency models• Allow read/writes to complete out-of-order• Use synchronization to enforce ordering • Specification of relaxed consistency models depends upon what read/write combinations they relax:– Relaxing W->R : total store ordering• Remains order among writes– Relaxing W->W : partial store order– Relaxing R->W and R->R: weak ordering, release consistency8COSC 6385 – Computer ArchitectureEdgar GabrielCompiler optimizations• Defining memory consistency required to specify range of legal compiler optimizations– E.g. for the previous example: exchanging the assignments of A/B and the following if-statement legal from the single-processor perspective, might effect the semantics of the program for a multi-processor environment.• Speculative execution: – Let the memory references execute out of order– If processor receives an invalidate from a different processor before committing a data item -> undo speculatively executed


View Full Document

UH COSC 6385 - Multi-Processors (II) Synchronization

Download Multi-Processors (II) Synchronization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Multi-Processors (II) Synchronization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Multi-Processors (II) Synchronization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?