CMU CS 15740 - Lecture

Synchronization
Todd C. Mowry
CS 740 (Fall 1998)
November 24, 1998

Topics:
• Locks
• Barriers

Types of Synchronization

Mutual Exclusion
• Locks

Event Synchronization
• Global or group-based (barriers)
• Point-to-point
– tightly coupled with data (full-empty bits, futures)
– separate from data (flags)

Blocking vs. Busy-Waiting

Busy-waiting is preferable when:
• scheduling overhead is larger than the expected wait time
• processor resources are not needed for other tasks
• schedule-based blocking is inappropriate (e.g., in an OS kernel)

But typical busy-waiting produces lots of network traffic:
• hot-spots

Reducing Busy-Waiting Traffic

The trend was toward increasing hardware support:
• sophisticated atomic operations
• combining networks
• multistage networks with special sync variables in switches
• special-purpose cache hardware to maintain queues of waiters

But (Mellor-Crummey and Scott): appropriate software synchronization algorithms can eliminate most busy-waiting traffic.

Software Synchronization

Hardware support required:
• simple fetch-and-op support
• ability to "cache" shared variables

Implications:
• efficient software synchronization can be designed for large machines
• special-purpose hardware has only small additional benefits
– a small constant factor for locks
– at best a factor of log P for barriers

Mutual Exclusion

Discuss four sets of primitives:
• test-and-set lock
• ticket lock
• array-based queueing lock
• list-based queueing lock

Test and Set Lock
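As a baseline for the variants that follow, a plain test&set spin lock might look like the sketch below. This is an illustration, not the lecture's code: C11 `atomic_exchange` stands in for the hardware test&set, and all names are mine.

```c
#include <stdatomic.h>

/* 1 = held, 0 = free; atomic_exchange plays the role of a hardware test&set */
typedef atomic_int ts_lock;

static void ts_acquire(ts_lock *l) {
    /* spin until the exchange observes 0 (lock was free); note that every
       retry is a read-modify-write, which is the traffic problem discussed
       below */
    while (atomic_exchange(l, 1) != 0)
        ;
}

static void ts_release(ts_lock *l) {
    atomic_store(l, 0);
}
```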
Cacheable vs. non-cacheable lock variables:

Cacheable:
• good if there is locality on lock access
– reduces both latency and lock duration
– particularly if the lock is acquired in exclusive state
• not good for high-contention locks

Non-cacheable:
• read-modify-write at main memory
• easier to implement
• high latency
• better for high-contention locks

Test and Test and Set

A: while (lock != free)
     ;                          /* spin in cache */
   if (test&set(lock) == free) {
     critical section;
   }
   else goto A;

(+) spinning happens in the cache
(-) can still generate a lot of traffic when many processors go to do the test&set

Test and Set with Backoff

Upon failure, delay for a while before retrying:
• either constant delay or exponential backoff

Tradeoffs:
(+) much less network traffic
(-) exponential backoff can cause starvation for high-contention locks
– new requestors back off for shorter times

But exponential backoff is found to work best in practice.

Test and Set with Update

Test and set sends updates to the processors that cache the lock.

Tradeoffs:
(+) good for bus-based machines
(-) still lots of traffic on distributed networks

The main problem with test&set-based schemes is that a lock release causes all waiters to try to get the lock, each using a test&set.

Ticket Lock (fetch&incr based)

The ticket lock has just one processor acquire the lock when it is released, instead of all waiters racing with test&set.

Two counters:
• next_ticket (number of requestors)
• now_serving (number of releases that have happened)

Algorithm:
• first do a fetch&incr on next_ticket (not a test&set)
• when a release happens, poll the value of now_serving
– if it equals my_ticket, then I win

Use a delay while polling; but how much?
• exponential is bad
• tough to find a delay that makes network traffic constant for a single lock

Ticket Lock Tradeoffs

(+) guaranteed FIFO order; no starvation possible
(+) latency can be low if fetch&incr is cacheable
(+) traffic can be quite low
(-) but traffic is not guaranteed to be O(1) per lock acquire
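The two-counter scheme above can be sketched in a few lines. This is an illustration rather than the lecture's code: C11 `atomic_fetch_add` stands in for fetch&incr, and the field names mirror the slide's next_ticket/now_serving.

```c
#include <stdatomic.h>

/* Ticket lock: fetch&incr hands out tickets; release advances now_serving */
typedef struct {
    atomic_uint next_ticket;   /* number of acquire requests so far */
    atomic_uint now_serving;   /* number of releases so far */
} ticket_lock;

static void ticket_acquire(ticket_lock *l) {
    /* one fetch&incr per acquire -- no test&set stampede on release */
    unsigned my_ticket = atomic_fetch_add(&l->next_ticket, 1);
    while (atomic_load(&l->now_serving) != my_ticket)
        ;  /* poll; a constant or proportional delay would go here */
}

static void ticket_release(ticket_lock *l) {
    atomic_fetch_add(&l->now_serving, 1);
}
```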
Array-Based Queueing Locks

Every process spins on a unique location, rather than on a single now_serving counter.
fetch&incr gives a process the address on which to spin.

Tradeoffs:
(+) guarantees FIFO order (like the ticket lock)
(+) O(1) traffic with coherent caches (unlike the ticket lock)
(-) requires space per lock proportional to P

List-Based Queueing Locks (MCS)

All the other good things, plus O(1) traffic even without coherent caches (spin locally).
Uses compare&swap to build linked lists in software.
A locally-allocated flag per list node to spin on.
Can work with fetch&store alone, but then loses the FIFO guarantee.

Tradeoffs:
(+) less storage than array-based locks
(+) O(1) traffic even without coherent caches
(-) compare&swap is not easy to implement

Lock Performance

(See attached sheets.)

Implementing Fetch&Op Primitives

One possibility for implementation in caches: LL/SC.
• "Load Linked" (LL) loads the lock and sets a bit
• when the "atomic" operation is finished, "Store Conditional" (SC) succeeds only if the bit was not reset in the interim
• fits within the "load/store" (aka RISC) architecture paradigm
– i.e., no instruction performs more than one memory operation
– good for pipelining and clock rates

Good for bus-based machines: the SC result is known on the bus.

More complex for directory-based machines:
• wait for the SC to go to the directory and get ownership (long latency)
• have the LL load in exclusive mode, so the SC succeeds immediately if still in exclusive mode

Bottom Line for Locks

Lots of options. Using simple hardware primitives, we can build very successful algorithms in software:
• LL/SC works well if there is locality of synch accesses
• otherwise, in-memory fetch&ops are good for high contention

Lock Recommendations

Criteria:
• scalability
• one-processor latency
• space requirements
• fairness
• atomic operation requirements

Scalability

MCS locks are the most scalable in general.
• array-based queueing locks are also good with coherent caches

Test&set and ticket locks scale well with backoff:
• but they add network traffic

Single-Processor Latency

Test&set and ticket locks are best.
MCS and array-based queueing locks are also good when good fetch&op primitives are available.
• array-based is bad in terms of space if too many locks are needed
– in the OS kernel, for example
– when there are more processes than processors

Fairness

The ticket lock, array-based lock, and MCS all guarantee FIFO.
• MCS needs compare&swap

Fairness can waste CPU resources in case of preemption:
• can be fixed by coscheduling

Primitives Required

Test&set:
• test&set

Ticket:
• fetch&incr

MCS:
• fetch&store
• compare&swap

Queue-based:
• fetch&store

Lock Recommendations

For scalability and minimizing network traffic:
• MCS
– one-proc latency is good if an efficient fetch&op is provided

If one-proc
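Since the recommendation above centers on MCS, here is a sketch of the list-based queueing lock. It follows the structure of Mellor-Crummey and Scott's published algorithm, but as an illustration only: C11 `atomic_exchange` stands in for fetch&store, `atomic_compare_exchange_strong` for compare&swap, and the names are mine.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* MCS list-based queueing lock: each waiter spins on its own node's flag */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;                /* true while this waiter must spin */
} mcs_node;

typedef _Atomic(mcs_node *) mcs_lock;  /* tail of waiter queue; NULL = free */

static void mcs_acquire(mcs_lock *lock, mcs_node *me) {
    atomic_store(&me->next, NULL);
    /* fetch&store: atomically append ourselves and learn the old tail */
    mcs_node *pred = atomic_exchange(lock, me);
    if (pred != NULL) {                /* queue was non-empty: must wait */
        atomic_store(&me->locked, true);
        atomic_store(&pred->next, me);
        while (atomic_load(&me->locked))
            ;                          /* spin on our own, local flag */
    }
}

static void mcs_release(mcs_lock *lock, mcs_node *me) {
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* no known successor: try to swing the tail from us back to NULL */
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong(lock, &expected, NULL))
            return;                    /* compare&swap succeeded: lock free */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;                          /* a successor is mid-enqueue; wait */
    }
    atomic_store(&succ->locked, false); /* hand the lock to the successor */
}
```

Each thread passes its own (locally allocated) node, which is why the storage cost is per waiter rather than P per lock.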

