Synchronization Todd C Mowry CS 740 November 24 1998 Topics Locks Barriers Types of Synchronization Mutual Exclusion Locks Event Synchronization Global or group based barriers Point to point tightly coupled with data full empty bits futures separate from data flags 2 CS 740 F 98 Blocking vs Busy Waiting Busy waiting is preferable when scheduling overhead is larger than expected wait time processor resources are not needed for other tasks schedule based blocking is inappropriate e g in OS kernel But typical busy waiting produces lots of network traffic hot spots 3 CS 740 F 98 Reducing Busy Waiting Traffic Trend was toward increasing hardware support sophisticated atomic operations combining networks multistage networks with special sync variables in switches special purpose cache hardware to maintain queues of waiters But Mellor Crummey and Scott Appropriate software synchronization algorithms can eliminate most busy waiting traffic 4 CS 740 F 98 Software Synchronization Hardware support required simple fetch and op support ability to cache shared variables Implications efficient software synch can be designed for large machines special purpose hardware has only small additional benefits small constant factor for locks best case factor of log P for barriers 5 CS 740 F 98 Mutual Exclusion Discuss four sets of primitives test and set lock ticket lock array based queueing lock list based queueing lock 6 CS 740 F 98 Test and Set Lock Cacheable vs non cacheable Cacheable good if locality on lock access reduces both latency and lock duration particularly if lock acquired in exclusive state not good for high contention locks Non cacheable read modify write at main memory easier to implement high latency better for high contention locks 7 CS 740 F 98 Test and Test and Set A while lock free if test set lock free critical section else goto A spinning happens in cache can still generate a lot of traffic when many processors go to do test set 8 CS 740 F 98 Test and Set with Backoff Upon failure delay for a while before retrying either constant delay or exponential backoff Tradeoffs much less network traffic exponential backoff can cause starvation for high contention locks new requestors back off for shorter times But exponential found to work best in practice 9 CS 740 F 98 Test and Set with Update Test and Set sends updates to processors that cache the lock Tradeoffs good for bus based machines still lots of traffic on distributed networks Main problem with test set based schemes is that a lock release causes all waiters to try to get the lock using a test set to try to get it 10 CS 740 F 98 Ticket Lock fetch incr based Ticket lock has just one proc do a test set when a lock is released Two counters next ticket number of requestors now serving number of releases that have happened Algorithm First do a fetch incr on next ticket not test set When release happens poll the value of now serving if my ticket then I win Use delay but how much exponential is bad tough to find a delay that makes network traffic constant for a single lock 11 CS 740 F 98 Ticket Lock Tradeoffs guaranteed FIFO order no starvation possible latency can be low if fetch incr is cacheable traffic can be quite low but traffic is not guaranteed to be O 1 per lock acquire 12 CS 740 F 98 Array Based Queueing Locks Every process spins on a unique location rather than on a single no serving counter fetch incr gives a process the address on which to spin Tradeoffs guarantees FIFO order like ticket lock O 1 traffic with coherence caches unlike ticket lock requires space per lock proportional to P 13 CS 740 F 98 List Base Queueing Locks MCS All other good things O 1 traffic even without coherent caches spin locally Uses compare swap to build linked lists in software Locally allocated flag per list node to spin on Can work with fetch store but loses FIFO guarantee Tradeoffs less storage than array based locks O 1 traffic even without coherent caches compare swap not easy to implement 14 CS 740 F 98 Lock Performance See attached sheets 15 CS 740 F 98 Implementing Fetch Op Primitives One possibility for implementation in caches LL SC Load Linked LL loads the lock and sets a bit When atomic operation is finished Store Conditional SC succeeds only if bit was not reset in interim Fits within the load store aka RISC architecture paradigm I e no instruction performs more than one memory operation good for pipelining and clock rates Good for bus based machines SC result known on bus More complex for directory based machines wait for SC to go to directory and get ownership long latency have LL load in exclusive mode so SC succeeds immediately if still in exclusive mode 16 CS 740 F 98 Bottom Line for Locks Lots of options Using simple hardware primitives we can build very successful algorithms in software LL SC works well if there is locality of synch accesses Otherwise in memory fetch ops are good for high contention 17 CS 740 F 98 Lock Recommendations Criteria scalability one processor latency space requirements fairness atomic operation requirements 18 CS 740 F 98 Scalability MCS locks most scalable in general array based queueing locks also good with coherent caches Test set and Ticket locks scale well with backoff but add network traffic 19 CS 740 F 98 Single Processor Latency Test set and Ticket locks are best MCS and Array based queueing locks also good when good fetch op primitives are available array based bad in terms of space if too many locks needed in OS kernel for example when more processes than processors 20 CS 740 F 98 Fairness Ticket lock array based lock and MCS all guarantee FIFO MCS needs compare swap Fairness can waste CPU resources in case of preemption can be fixed by coscheduling 21 CS 740 F 98 Primitives Required Test Set test set Ticket fetch incr MCS fetch store compare swap Queue based fetch store 22 CS 740 F 98 Lock Recommendations For scalability and minimizing network traffic MCS one proc latency good if efficient fetch op provided If one proc latency is critical or fetch op not available Ticket with proportional backoff If preemption possible while spinning or if neither fetch store nor fetch incr provided Test Set with exponential backoff 23 CS 740 F 98 Barriers We will discuss five barriers centralized software combining tree dissemination barrier tournament barrier MCS tree based barrier 24 CS 740 F 98 Centralized Barrier Basic idea notify a single shared counter when you arrive poll that shared location
View Full Document
Unlocking...