6.852 Lecture 15●Pragmatic issues for shared-memory multiprocessors●Practical mutual exclusion algorithms–test-and-set locks–queue locks●Generalized exclusion/resource allocation problems●Reading:–Mellor-Crummey and Scott paper (Dijkstra prize winner)–Magnussen, Landin, Hagersten paper–Chapter 11Next time●Consensus●Reading: Chapter 12Mutual exclusion with RMW●Quick review–shared-memory multiprocessors provide “atomic operations”●test&set, fetch&increment, swap, compare&swap (CAS), LL/SC–in practice, all mutual exclusion algorithms use these operations●one-variable test&set algorithm●queue lock: one queue with enqueue, dequeue and head–multiprocessors do not support queues in hardware●ticket lock algorithm: two fetch&inc variablesA note on terminology●Different usage in “systems” and “theory” communities–blocking: yields processor–atomic operation: some kind of read-modify-write operation–implement: provide specified functionality??–simulation: experiment, or running on a (hardware) simulator–process vs thread–locks vs mutual exclusion●Different emphasis and concerns–mechanism vs. abstraction: processors, locks, blocking–performance issues: caching, contention, etc.Mutual exclusion in practice●What to do when lock is taken–“block”: deschedule process (yield processor)●OS reschedules it in future, often when some condition is satisfied–busy-wait/spin●don't yield process: repeatedly test for some condition●should be used only if waiting is expected to be very shortThe choice of blocking vs spinning applies to other synchronization constructs, such as producer-consumer and barriers.Mutual exclusion in practice●Spin locks are very important–used in OS kernels●Assume critical sections are very short–typically not nested (hold only one lock at a time)●Performance is critical–must consider caching and contention effects–adaptive requirements/performanceShared-memory multiprocessorsShared memoryP1P2P3P4P5Shared-memory multiprocessorsShared memoryP1P2P3P4P5$ $ $ $ $Network (bus)Mem Mem Mem MemShared-memory multiprocessorsP1P2P3P4P5$ $ $ $ $Mem Mem Mem Mem MemNetwork (bus)Shared-memory multiprocessorsP1P2P3P4P5$ $ $ $ $Mem Mem Mem Mem MemNetwork (bus)C11Network (bus)C12C13C14Shared-memory multiprocessors●Memory access does not have uniform cost–next-level cache access is ~10x more expensive–remote-memory access produces network traffic●network bandwidth can be bottleneck–writes invalidate caches●every processor that wants to read must request again●can typically share read access–all memory is multiwriter, but most is reserved for a processMutual exclusion in practice●Critical sections are very short–typically hold only one lock at a time–critical processes are not swapped out●assume no multiprogramming for now (one thread per processor) ●Caching and contention are importantPractical spin locks●Test&set locks●Ticket lock●Queue locks–Anderson–Graunke/Thakkar–Mellor-Crummey/Scott (MCS)–Craig-Landin-Hagersten (CLH)●Adding other features–timeout–hierarchical locks–reader-writer locksSimple test&set lock●Simple●Low space cost●But lots of network traffic if highly contendedtryi waitfor(test&set(lock) = 0)critiexiti lock := 0remilock: {0,1}; initially 0many processes waiting for lock to become freeSimple test&set lockP1P2P3P4P51 - - - -Network (bus)Mem Mem Mem Memt&sSimple test&set lockP1P2P3P4P51 - - - -Network (bus)Mem Mem Mem MemreqXSimple test&set lockP1P2P3P4P5- 1 - - -Network (bus)Mem Mem Mem MemSimple test&set lockP1P2P3P4P5- 1 - - -Network (bus)Mem Mem Mem Memt&s t&s t&sSimple test&set lockP1P2P3P4P5- 1 - - -Network (bus)Mem Mem Mem MemreqX reqX reqXSimple test&set lockP1P2P3P4P5- - 1 - -Network (bus)Mem Mem Mem MemreqX reqXSimple test&set lockP1P2P3P4P5- - - 1 -Network (bus)Mem Mem Mem MemreqXSimple test&set lockP1P2P3P4P5- - - 1 -Network (bus)Mem Mem Mem MemreqXt&s t&s t&sSimple test&set lockP1P2P3P4P5- - - - 1Network (bus)Mem Mem Mem Memt&s t&sSimple test&set lockP1P2P3P4P5- 1 - - -Network (bus)Mem Mem Mem MemSimple test&set lockP1P2P3P4P5- 1 - - -Network (bus)Mem Mem Mem Memw(0)Simple test&set lockP1P2P3P4P5- 1 - - -Network (bus)Mem Mem Mem MemreqXSimple test&set lockP1P2P3P4P50 - - - -Network (bus)Mem Mem Mem MemSimple test&set lockP1P2P3P4P5$ $ $ $ $Mem Mem Mem Mem MemNetwork (bus)Test-and-test&set lock●dealing with high contention–test-and-test&set●read before attempting test&set●reduces network traffic (but it's still high!)Test-and-test&set lockP1P2P3P4P51 1 1 1 1Network (bus)Mem Mem Mem MemTest-and-test&set lockP1P2P3P4P51 1 1 1 1Network (bus)Mem Mem Mem Memw(0)Test-and-test&set lockP1P2P3P4P50 - - - -Network (bus)Mem Mem Mem MemTest-and-test&set lockP1P2P3P4P50 - - - -Network (bus)Mem Mem Mem Memreadreadread readTest-and-test&set lockP1P2P3P4P50 0 0 0 0Network (bus)Mem Mem Mem Memt&s t&s t&s t&sSimple test&set lock with backoff●dealing with high contention–test-and-test&set●read before attempting test&set●reduces network traffic (but it's still high!)–test&set with backoff●if test&set “fails” (returns 1), wait before trying again–reduces network traffic (both read and write)●exponential backoff seems to work best●obviates need for test-and-test&setTicket locktryi ticket := f&i(next) waitfor(granted = ticket)critiexiti f&i(granted)reminext: integer; initially 0granted: integer; initially 0●simple, low space cost, no bypass●network traffic similar to test-and-test&set (why?)–not quite as bad though●can use backoff: but delay potentially more costly–proportional backoff seems best●delay depends on difference between ticket and grantedArray-based queue locks●Each process spins on a different location–reduces invalidation traffic●each entry in array must be in separate cache line–high space cost: one location (cache line) per lock per process●not adaptiveAnderson locktryi my_slot := f&i(next_slot) waitfor(slots[my_slot] = front)critiexiti slots[my_slot] := not_front slots[my_slot+1] := frontremislots: array[0..N-1] of { front, not_front }; initially (front, not_front, not_front,..., not_front)next_slot: integer; initially 0●entries either “front” or “not-front” (of queue)–exactly one “front” (except for short interval in exit region)●tail of queue indicated by next_slot–queue is empty if next_slot contains frontAnderson locktryi my_slot := f&i(next_slot) if
View Full Document