Unformatted text preview:

6.852 Lecture 15●Pragmatic issues for shared-memory multiprocessors●Practical mutual exclusion algorithms–test-and-set locks–queue locks●Generalized exclusion/resource allocation problems●Reading:–Mellor-Crummey and Scott paper (Dijkstra prize winner)–Magnussen, Landin, Hagersten paper–Chapter 11Next time●Consensus●Reading: Chapter 12Mutual exclusion with RMW●Quick review–shared-memory multiprocessors provide “atomic operations”●test&set, fetch&increment, swap, compare&swap (CAS), LL/SC–in practice, all mutual exclusion algorithms use these operations●one-variable test&set algorithm●queue lock: one queue with enqueue, dequeue and head–multiprocessors do not support queues in hardware●ticket lock algorithm: two fetch&inc variablesA note on terminology●Different usage in “systems” and “theory” communities–blocking: yields processor–atomic operation: some kind of read-modify-write operation–implement: provide specified functionality??–simulation: experiment, or running on a (hardware) simulator–process vs thread–locks vs mutual exclusion●Different emphasis and concerns–mechanism vs. abstraction: processors, locks, blocking–performance issues: caching, contention, etc.Mutual exclusion in practice●What to do when lock is taken–“block”: deschedule process (yield processor)●OS reschedules it in future, often when some condition is satisfied–busy-wait/spin●don't yield process: repeatedly test for some condition●should be used only if waiting is expected to be very shortThe choice of blocking vs spinning applies to other synchronization constructs, such as producer-consumer and barriers.Mutual exclusion in practice●Spin locks are very important–used in OS kernels●Assume critical sections are very short–typically not nested (hold only one lock at a time)●Performance is critical–must consider caching and contention effects–adaptive requirements/performanceShared-memory multiprocessorsShared memoryP1P2P3P4P5Shared-memory multiprocessorsShared memoryP1P2P3P4P5$ $ $ $ $Network (bus)Mem Mem Mem MemShared-memory multiprocessorsP1P2P3P4P5$ $ $ $ $Mem Mem Mem Mem MemNetwork (bus)Shared-memory multiprocessorsP1P2P3P4P5$ $ $ $ $Mem Mem Mem Mem MemNetwork (bus)C11Network (bus)C12C13C14Shared-memory multiprocessors●Memory access does not have uniform cost–next-level cache access is ~10x more expensive–remote-memory access produces network traffic●network bandwidth can be bottleneck–writes invalidate caches●every processor that wants to read must request again●can typically share read access–all memory is multiwriter, but most is reserved for a processMutual exclusion in practice●Critical sections are very short–typically hold only one lock at a time–critical processes are not swapped out●assume no multiprogramming for now (one thread per processor) ●Caching and contention are importantPractical spin locks●Test&set locks●Ticket lock●Queue locks–Anderson–Graunke/Thakkar–Mellor-Crummey/Scott (MCS)–Craig-Landin-Hagersten (CLH)●Adding other features–timeout–hierarchical locks–reader-writer locksSimple test&set lock●Simple●Low space cost●But lots of network traffic if highly contendedtryi waitfor(test&set(lock) = 0)critiexiti lock := 0remilock: {0,1}; initially 0many processes waiting for lock to become freeSimple test&set lockP1P2P3P4P51 - - - -Network (bus)Mem Mem Mem Memt&sSimple test&set lockP1P2P3P4P51 - - - -Network (bus)Mem Mem Mem MemreqXSimple test&set lockP1P2P3P4P5- 1 - - -Network (bus)Mem Mem Mem MemSimple test&set lockP1P2P3P4P5- 1 - - -Network (bus)Mem Mem Mem Memt&s t&s t&sSimple test&set lockP1P2P3P4P5- 1 - - -Network (bus)Mem Mem Mem MemreqX reqX reqXSimple test&set lockP1P2P3P4P5- - 1 - -Network (bus)Mem Mem Mem MemreqX reqXSimple test&set lockP1P2P3P4P5- - - 1 -Network (bus)Mem Mem Mem MemreqXSimple test&set lockP1P2P3P4P5- - - 1 -Network (bus)Mem Mem Mem MemreqXt&s t&s t&sSimple test&set lockP1P2P3P4P5- - - - 1Network (bus)Mem Mem Mem Memt&s t&sSimple test&set lockP1P2P3P4P5- 1 - - -Network (bus)Mem Mem Mem MemSimple test&set lockP1P2P3P4P5- 1 - - -Network (bus)Mem Mem Mem Memw(0)Simple test&set lockP1P2P3P4P5- 1 - - -Network (bus)Mem Mem Mem MemreqXSimple test&set lockP1P2P3P4P50 - - - -Network (bus)Mem Mem Mem MemSimple test&set lockP1P2P3P4P5$ $ $ $ $Mem Mem Mem Mem MemNetwork (bus)Test-and-test&set lock●dealing with high contention–test-and-test&set●read before attempting test&set●reduces network traffic (but it's still high!)Test-and-test&set lockP1P2P3P4P51 1 1 1 1Network (bus)Mem Mem Mem MemTest-and-test&set lockP1P2P3P4P51 1 1 1 1Network (bus)Mem Mem Mem Memw(0)Test-and-test&set lockP1P2P3P4P50 - - - -Network (bus)Mem Mem Mem MemTest-and-test&set lockP1P2P3P4P50 - - - -Network (bus)Mem Mem Mem Memreadreadread readTest-and-test&set lockP1P2P3P4P50 0 0 0 0Network (bus)Mem Mem Mem Memt&s t&s t&s t&sSimple test&set lock with backoff●dealing with high contention–test-and-test&set●read before attempting test&set●reduces network traffic (but it's still high!)–test&set with backoff●if test&set “fails” (returns 1), wait before trying again–reduces network traffic (both read and write)●exponential backoff seems to work best●obviates need for test-and-test&setTicket locktryi ticket := f&i(next) waitfor(granted = ticket)critiexiti f&i(granted)reminext: integer; initially 0granted: integer; initially 0●simple, low space cost, no bypass●network traffic similar to test-and-test&set (why?)–not quite as bad though●can use backoff: but delay potentially more costly–proportional backoff seems best●delay depends on difference between ticket and grantedArray-based queue locks●Each process spins on a different location–reduces invalidation traffic●each entry in array must be in separate cache line–high space cost: one location (cache line) per lock per process●not adaptiveAnderson locktryi my_slot := f&i(next_slot) waitfor(slots[my_slot] = front)critiexiti slots[my_slot] := not_front slots[my_slot+1] := frontremislots: array[0..N-1] of { front, not_front }; initially (front, not_front, not_front,..., not_front)next_slot: integer; initially 0●entries either “front” or “not-front” (of queue)–exactly one “front” (except for short interval in exit region)●tail of queue indicated by next_slot–queue is empty if next_slot contains frontAnderson locktryi my_slot := f&i(next_slot) if


View Full Document

MIT 6 852 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?