CMU CS 15740 - Synchronization

Synchronization
Todd C. Mowry
CS 740
November 1, 2000

Slide titles:
• Types of Synchronization
• Busy Waiting vs. Blocking
• A Simple Lock
• Need Atomic Primitive!
• Test&Set based lock
• T&S Lock Performance
• Test and Test and Set
• Test and Set with Backoff
• Test and Set with Update
• Ticket Lock (fetch&incr based)
• Ticket Lock Tradeoffs
• Array-Based Queueing Locks
• List-Based Queueing Locks (MCS)
• Implementing Fetch&Op
• Barriers
• Centralized Barrier
• Software Combining Tree Barrier
• Dissemination Barrier
• Tournament Barrier
• MCS Software Barrier
• Barrier Recommendations
• Space Requirements
• Network Transactions
• Critical Path Length
• Primitives Needed

Topics
• Locks
• Barriers
• Hardware primitives

CS 740 F'00

Types of Synchronization
Mutual Exclusion
• Locks
Event Synchronization
• Global or group-based (barriers)
• Point-to-point

Busy Waiting vs. Blocking
Busy-waiting is preferable when:
• scheduling overhead is larger than expected wait time
• processor resources are not needed for other tasks
• schedule-based blocking is inappropriate (e.g., in OS kernel)

A Simple Lock
    lock:   ld  register, location
            cmp register, #0
            bnz lock
            st  location, #1
            ret
    unlock: st  location, #0
            ret

Need Atomic Primitive!
Test&Set
Swap
Fetch&Op
• Fetch&Incr, Fetch&Decr
Compare&Swap

Test&Set based lock
    lock:   t&s register, location
            bnz lock
            ret
    unlock: st  location, #0
            ret

T&S Lock Performance
Code: lock; delay(c); unlock;
Same total number of lock calls as p increases; measure time per transfer
[Figure: time (s) vs. number of processors (1 to 15). Curves: Test&set, c = 0; Test&set, exponential backoff, c = 3.64; Test&set, exponential backoff, c = 0; Ideal]

Test and Test and Set
    A: while (lock != free);
       if (test&set(lock) == free) {
           critical section;
       }
       else goto A;
(+) spinning happens in cache
(-) can still generate a lot of traffic when many processors go to do test&set

Test and Set with Backoff
Upon failure, delay for a while before retrying
• either constant delay or exponential backoff
Tradeoffs:
(+) much less network traffic
(-) exponential backoff can cause starvation for high-contention locks
  – new requestors back off for shorter times
But exponential backoff was found to work best in practice

Test and Set with Update
Test and Set sends updates to processors that cache the lock
Tradeoffs:
(+) good for bus-based machines
(-) still lots of traffic on distributed networks
Main problem with test&set-based schemes: a lock release causes all waiters to try to get the lock, each using a test&set.

Ticket Lock (fetch&incr based)
Two counters:
• next_ticket (number of requestors)
• now_serving (number of releases that have happened)
Algorithm:
• First do a fetch&incr on next_ticket (not test&set) to obtain my_ticket
• When release happens, poll the value of now_serving
  – if it equals my_ticket, then I win
Use delay; but how much?

Ticket Lock Tradeoffs
(+) guaranteed FIFO order; no starvation possible
(+) latency can be low if fetch&incr is cacheable
(+) traffic can be quite low
(-) but traffic is not guaranteed to be O(1) per lock acquire

Array-Based Queueing Locks
Every process spins on a unique location, rather than on a single now_serving counter
fetch&incr gives a process the address on which to spin
Tradeoffs:
(+) guarantees FIFO order (like ticket lock)
(+) O(1) traffic with coherent caches (unlike ticket lock)
(-) requires space per lock proportional to P

List-Based Queueing Locks (MCS)
All other good things + O(1) traffic even without coherent caches (spin locally)
Uses compare&swap to build linked lists in software
Locally-allocated flag per list node to spin on
Can work with fetch&store, but loses FIFO guarantee
Tradeoffs:
(+) less storage than array-based locks
(+) O(1) traffic even without coherent caches
(-) compare&swap not easy to implement

Implementing Fetch&Op
Load Linked/Store Conditional
    lock:   ll   reg1, location   /* LL location to reg1 */
            bnz  reg1, lock       /* check if location locked */
            sc   location, reg2   /* SC reg2 into location */
            beqz reg2, lock       /* if failed, start again */
            ret
    unlock: st   location, #0     /* write 0 to location */
            ret

Barriers
We will discuss five barriers:
• centralized
• software combining tree
• dissemination barrier
• tournament barrier
• MCS tree-based barrier

Centralized Barrier
Basic idea:
• notify a single shared counter when you arrive
• poll that shared location until all have arrived
Simple implementation requires polling/spinning twice:
• first to ensure that all procs have left previous barrier
• second to ensure that all procs have arrived at current barrier
Solution to get one spin: sense reversal

Software Combining Tree Barrier
Writes into one tree for barrier arrival
Reads from another tree to allow procs to continue
Sense reversal to distinguish consecutive barriers
[Figure: flat structure (contention) vs. tree structure (little contention)]

Dissemination Barrier
log P rounds of synchronization
In round k, proc i synchronizes with proc (i+2^k) mod P
Advantage:
• Can statically allocate flags to avoid remote spinning

Tournament Barrier
Binary combining tree
Representative processor at a node is statically chosen
• no fetch&op needed
In round k, proc i=2^k sets a flag for proc j=i-2^k
• i then drops out of tournament and j proceeds in next round
• i waits for global flag signalling completion of barrier to be set
  – could use combining wakeup tree

MCS Software Barrier
Modifies tournament barrier to allow static allocation in wakeup tree, and to use sense reversal
Every processor is a node in two P-node trees:
• has a pointer to its parent, building a fanin-4 arrival tree
• has pointers to its children, building a fanout-2 wakeup tree
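
The Ticket Lock (fetch&incr based) slide can be sketched in C11 roughly as follows. This is a minimal sketch under assumptions, not the lecture's code: the type and function names (ticket_lock_t, ticket_lock_acquire, ticket_lock_release, ticket_lock_demo), the use of atomic_fetch_add from <stdatomic.h> as the fetch&incr primitive, and sched_yield() as the polling delay are all illustrative choices. Only next_ticket, now_serving, and my_ticket come from the slide.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Ticket lock sketch; field names follow the slide. */
typedef struct {
    atomic_uint next_ticket;  /* number of requestors so far */
    atomic_uint now_serving;  /* number of releases that have happened */
} ticket_lock_t;

static void ticket_lock_init(ticket_lock_t *l) {
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

static void ticket_lock_acquire(ticket_lock_t *l) {
    /* fetch&incr hands each requestor a unique ticket (no test&set). */
    unsigned my_ticket =
        atomic_fetch_add_explicit(&l->next_ticket, 1, memory_order_relaxed);
    /* Poll now_serving until it equals my_ticket. */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire)
           != my_ticket) {
        sched_yield();  /* assumption: yielding stands in for the "delay" */
    }
}

static void ticket_lock_release(ticket_lock_t *l) {
    /* Only the current holder writes now_serving, so load + store suffices. */
    unsigned next =
        atomic_load_explicit(&l->now_serving, memory_order_relaxed) + 1;
    atomic_store_explicit(&l->now_serving, next, memory_order_release);
}

/* Hypothetical demo harness: threads increment a plain counter under the lock. */
#define MAX_THREADS 16

typedef struct {
    ticket_lock_t *lock;
    long *counter;
    int iters;
} worker_arg_t;

static void *worker(void *p) {
    worker_arg_t *a = p;
    for (int i = 0; i < a->iters; i++) {
        ticket_lock_acquire(a->lock);
        (*a->counter)++;  /* critical section: non-atomic increment */
        ticket_lock_release(a->lock);
    }
    return NULL;
}

/* Runs nthreads workers for iters iterations each; returns the final count. */
long ticket_lock_demo(int nthreads, int iters) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    ticket_lock_t lock;
    long counter = 0;
    pthread_t tids[MAX_THREADS];
    worker_arg_t arg = { &lock, &counter, iters };

    ticket_lock_init(&lock);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &arg);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Calling ticket_lock_demo(4, 2000) from a main function should return 8000 if mutual exclusion holds, since every increment of the plain counter happens inside the critical section. The slide leaves the delay open ("but how much?"); one option is to wait in proportion to the gap between my_ticket and now_serving, which this sketch does not attempt.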

