Unformatted text preview:

Types of Synchronization Synchronization CS 740 November 14 2003 Mutual Exclusion Locks Event Synchronization Global or group based barriers Point to point Based on TCM C S Topics Locks Barriers Hardware primitives 2 Busy Waiting vs Blocking Busy waiting is preferable when scheduling overhead is larger than expected wait time processor resources are not needed for other tasks schedule based blocking is inappropriate e g in OS kernel 3 CS 740 F 03 CS 740 F 03 A Simple Lock lock unlock 4 ld cmp bnz st ret st ret register location register 0 lock location 1 location 0 CS 740 F 03 Need Atomic Primitive Test Set Swap Fetch Op Test Set based lock lock unlock Fetch Incr Fetch Decr t s bnz ret st ret register location lock location 0 Compare Swap 5 CS 740 F 03 6 T S Lock Performance Code lock delay c unlock Same total no of lock calls as p increases measure time per transfer 16 Test set c 0 Test set exponential backof f c 3 64 Test set exponential backof f c 0 Ideal 14 Time s Test and Test and Set A while lock free if test set lock free critical section else goto A 20 18 CS 740 F 03 12 10 spinning happens in cache can still generate a lot of traffic when many processors go to do test set 8 6 4 2 0 3 5 7 9 11 13 15 Number of processors 7 CS 740 F 03 8 CS 740 F 03 Test and Set with Backoff Upon failure delay for a while before retrying either constant delay or exponential backoff Tradeoffs much less network traffic exponential backoff can cause starvation for high contention locks new requestors back off for shorter times But exponential found to work best in practice 9 CS 740 F 03 Test and Set with Update Test and Set sends updates to processors that cache the lock Tradeoffs good for bus based machines still lots of traffic on distributed networks Main problem with test set based schemes is that a lock release causes all waiters to try to get the lock using a test set to try to get it 10 Ticket Lock fetch incr based Two counters next ticket number of requestors now serving number of releases that have happened Algorithm First do a fetch incr on next ticket not test set When release happens poll the value of now serving if my ticket then I win CS 740 F 03 Ticket Lock Tradeoffs guaranteed FIFO order no starvation possible latency can be low if fetch incr is cacheable traffic can be quite low but traffic is not guaranteed to be O 1 per lock acquire Use delay but how much 11 CS 740 F 03 12 CS 740 F 03 Lock Performance on SGI Challenge Loop lock delay c unlock delay d 7 7 6 6 6 5 5 5 4 3 4 3 2 2 1 1 0 0 1 3 5 7 9 11 13 15 Number of processors a Null c 0 d 0 Time s 7 Time s Time s Array based LL SC LL SC exponential Ticket Ticket proportional 4 3 2 1 0 1 3 5 7 9 11 13 15 Number of processors b Critical section c 3 64 s d 0 1 3 5 7 9 11 13 15 Number of processors c Delay c 3 64 s d 1 29 s Simple LL SC lock does best at small p due to unfairness Not so with delay between unlock and next lock Need to be careful with backoff Ticket lock with proportional backoff scales well as does array lock Methodologically challenging and need to look at real workloads 13 CS 740 F 03 Array Based Queueing Locks Every process spins on a unique location rather than on a single now serving counter fetch incr gives a process the address on which to spin Tradeoffs guarantees FIFO order like ticket lock O 1 traffic with coherence caches unlike ticket lock requires space per lock proportional to P 14 Implementing Fetch Op List Base Queueing Locks MCS All other good things O 1 traffic even without coherent caches spin locally Uses compare swap to build linked lists in software Locally allocated flag per list node to spin on Can work with fetch store but loses FIFO guarantee Tradeoffs less storage than array based locks O 1 traffic even without coherent caches compare swap not easy to implement 15 CS 740 F 03 Load Linked Store Conditional reg1 location LL location to reg1 bnz reg1 lock check if location locked sc location reg2 SC reg2 into location lock ll beqz reg2 lock if failed start again ret unlock st location 0 write 0 to location ret CS 740 F 03 16 CS 740 F 03 Barriers A Working Centralized Barrier Consecutively entering the same barrier doesn t work We will discuss five barriers Must prevent process from entering until all have left previous instance Could use another counter but increases latency and contention centralized software combining tree dissemination barrier tournament barrier MCS tree based barrier 17 Sense reversal wait for flag to take different value consecutive times CS 740 F 03 Centralized Barrier BARRIER bar name p local sense local sense toggle private sense var LOCK bar name lock mycount bar name counter mycount is private if bar name counter p UNLOCK bar name lock bar name counter 0 reset for next bar name flag local sense release waiters else UNLOCK bar name lock while bar name flag local sense 18 CS 740 F 03 Software Combining Tree Barrier Basic idea Contention Little contention notify a single shared counter when you arrive poll that shared location until all have arrived Simple implementation require polling spinning twice Flat first to ensure that all procs have left previous barrier second to ensure that all procs have arrived at current barrier Solution to get one spin sense reversal 19 CS 740 F 03 Tree structured Writes into one tree for barrier arrival Reads from another tree to allow procs to continue Sense reversal to distinguish consecutive barriers 20 CS 740 F 03 Barrier Performance Tournament Barrier Binary combining tree Representative processor at a node is statically chosen 30 Time s 25 no fetch op needed In round k proc i 2k sets a flag for proc j i 2k i then drops out of tournament and j proceeds in next round i waits for global flag signalling completion of barrier to be set could use combining wakeup tree 21 CS 740 F 03 20 15 10 5 0 12345678 Number of processors Centralized does quite well Helpful hardware support piggybacking of reads misses on bus Also for spinning on highly contended locks 22 MCS Software Barrier Modifies tournament barrier to allow static allocation in wakeup tree and to use sense reversal Every processor is a node in two P node trees Centralized Combining tree Tournament Dissemination 35 CS 740 F 03 Barrier Recommendations Criteria length of critical path number of network transactions space requirements atomic operation requirements has pointers to its parent building a fanin 4 …


View Full Document

CMU CS 15740 - Synchronization

Documents in this Course
leecture

leecture

17 pages

Lecture

Lecture

9 pages

Lecture

Lecture

36 pages

Lecture

Lecture

9 pages

Lecture

Lecture

13 pages

lecture

lecture

25 pages

lect17

lect17

7 pages

Lecture

Lecture

65 pages

Lecture

Lecture

28 pages

lect07

lect07

24 pages

lect07

lect07

12 pages

lect03

lect03

3 pages

lecture

lecture

11 pages

lecture

lecture

20 pages

lecture

lecture

11 pages

Lecture

Lecture

9 pages

Lecture

Lecture

10 pages

Lecture

Lecture

22 pages

Lecture

Lecture

28 pages

Lecture

Lecture

18 pages

lecture

lecture

63 pages

lecture

lecture

13 pages

Lecture

Lecture

36 pages

Lecture

Lecture

18 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

lecture

lecture

34 pages

lecture

lecture

47 pages

lecture

lecture

7 pages

Lecture

Lecture

18 pages

Lecture

Lecture

7 pages

Lecture

Lecture

21 pages

Lecture

Lecture

10 pages

Lecture

Lecture

39 pages

Lecture

Lecture

11 pages

lect04

lect04

40 pages

Load more
Loading Unlocking...
Login

Join to view Synchronization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Synchronization and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?