Synchronization Todd C Mowry CS 740 November 1 2000 Topics Locks Barriers Hardware primitives Types of Synchronization Mutual Exclusion Locks Event Synchronization Global or group based barriers Point to point 2 CS 740 F 00 Busy Waiting vs Blocking Busy waiting is preferable when scheduling overhead is larger than expected wait time processor resources are not needed for other tasks schedule based blocking is inappropriate e g in OS kernel 3 CS 740 F 00 lock unlock 4 ld cmp bnz st ret st ret A Simple Lock register location register 0 lock location 1 location 0 CS 740 F 00 Need Atomic Primitive Test Set Swap Fetch Op Fetch Incr Fetch Decr Compare Swap 5 CS 740 F 00 lock unlock 6 Test Set based lock t s bnz ret st ret register location lock location 0 CS 740 F 00 T S Lock Performance lock delay c unlock Code Same total no of lock calls as p increases measure time per transfer 20 18 16 14 Time s Test set c 0 Test set exponential backof f c 3 64 Test set exponential backof f c 0 Ideal 12 10 8 6 4 2 0 3 5 7 9 11 13 15 Number of processors 7 CS 740 F 00 Test and Test and Set A while lock free if test set lock free critical section else goto A spinning happens in cache can still generate a lot of traffic when many processors go to do test set 8 CS 740 F 00 Test and Set with Backof Upon failure delay for a while before retrying either constant delay or exponential backof Tradeofs much less network traffic exponential backof can cause starvation for high contention locks new requestors back off for shorter times But exponential found to work best in practice 9 CS 740 F 00 Test and Set with Update Test and Set sends updates to processors that cache the lock Tradeofs good for bus based machines still lots of traffic on distributed networks Main problem with test set based schemes is that a lock release causes all waiters to try to get the lock using a test set to try to get it 10 CS 740 F 00 Ticket Lock fetch incr based Two counters next ticket number of requestors now serving number of releases that have happened Algorithm First do a fetch incr on next ticket not test set When release happens poll the value of now serving if my ticket then I win Use delay but how much 11 CS 740 F 00 Ticket Lock Tradeofs guaranteed FIFO order no starvation possible latency can be low if fetch incr is cacheable traffic can be quite low but traffic is not guaranteed to be O 1 per lock acquire 12 CS 740 F 00 Array Based Queueing Locks Every process spins on a unique location rather than on a single now serving counter fetch incr gives a process the address on which to spin Tradeofs guarantees FIFO order like ticket lock O 1 traffic with coherence caches unlike ticket lock requires space per lock proportional to P 13 CS 740 F 00 List Base Queueing Locks MCS All other good things O 1 traffic even without coherent caches spin locally Uses compare swap to build linked lists in software Locally allocated flag per list node to spin on Can work with fetch store but loses FIFO guarantee Tradeofs less storage than array based locks O 1 traffic even without coherent caches compare swap not easy to implement 14 CS 740 F 00 Implementing Fetch Op Load Linked Store Conditional lock ll reg1 location LL location to reg1 bnz locked reg1 lock check if location location reg2 SC reg2 into location sc beqz reg2 lock if failed start again ret unlock st location 0 write 0 to location ret 15 CS 740 F 00 Barrier s We will discuss five barriers centralized software combining tree dissemination barrier tournament barrier MCS tree based barrier 16 CS 740 F 00 Basic idea Centralized Barrier notify a single shared counter when you arrive poll that shared location until all have arrived Simple implementation require polling spinning twice first to ensure that all procs have left previous barrier second to ensure that all procs have arrived at current barrier Solution to get one spin sense reversal 17 CS 740 F 00 Software Combining Tree Barrier Contention Flat Little contention Tree structured Writes into one tree for barrier arrival Reads from another tree to allow procs to continue Sense reversal to distinguish consecutive barriers 18 CS 740 F 00 Dissemination Barrier log P rounds of synchronization In round k proc i synchronizes with proc i 2k mod P Advantage Can statically allocate flags to avoid remote spinning 19 CS 740 F 00 Tournament Barrier Binary combining tree Representative processor at a node is statically chosen no fetch op needed In round k proc i 2k sets a flag for proc j i 2k i then drops out of tournament and j proceeds in next round i waits for global flag signalling completion of barrier to be set could use combining wakeup tree 20 CS 740 F 00 MCS Software Barrier Modifies tournament barrier to allow static allocation in wakeup tree and to use sense reversal Every processor is a node in two Pnode trees has pointers to its parent building a fanin4 arrival tree has pointers to its children to build a fanout 2 wakeup tree 21 CS 740 F 00 Barrier Recommendations Criteria length of critical path number of network transactions space requirements atomic operation requirements 22 CS 740 F 00 Space Requirements Centralized constant MCS combining tree O P Dissemination Tournament O PlogP 23 CS 740 F 00 Network Transactions Centralized combining tree O P if broadcast and coherent caches unbounded otherwise Dissemination O PlogP Tournament MCS O P 24 CS 740 F 00 Critical Path Length If independent parallel network paths available all are O logP except centralized which is O P Otherwise e g shared bus linear factors dominate 25 CS 740 F 00 Primitives Needed Centralized and combining tree atomic increment atomic decrement Others atomic read atomic write 26 CS 740 F 00 Barrier Recommendations Without broadcast on distributed memory Dissemination MCS is good only critical path length is about 1 5X longer MCS has somewhat better network load and space requirements Cache coherence with broadcast e g a bus MCS with flag wakeup centralized is best for modest numbers of processors Big advantage of centralized barrier adapts to changing number of processors across barrier calls 27 CS 740 F 00
View Full Document
Unlocking...