Outline
- Administrivia
- Readers-Writers Problem
- Implementing shared locks
- Shared locks (continued)
- Review: Test-and-set spinlock
- Relaxed consistency model
- Cache coherence
- cc-NUMA
- NUMA and spinlocks
- Recall producer/consumer (lecture 3)
- Eliminating locks
- Lock-free producer/consumer
- Non-blocking synchronization
- Example: stack
- Benign races
- Read-copy update [McKenney] (http://www.rdrop.com/users/paulmck/RCU/rclockjrnl_tpds_mathtype.pdf)
- Garbage collection
- MCS lock
- MCS Acquire
- MCS Release w. CAS
- MCS Release w/o CAS
- MCS Release w/o C&S code
- Kernel support for synchronization
- Race condition
- The deadlock problem
- More deadlocks
- Deadlocks w/o computers
- Deadlock conditions
- Prevent by eliminating one condition
- Resource-allocation graph
- Example resource allocation graph
- Graph with deadlock
- Is this deadlock?
- Cycles and deadlock
- Prevention
- Claim edges
- Example: unsafe state
- Detecting deadlock
- Fixing & debugging deadlocks
- Transactions
- Detecting data races

Administrivia
• Project 1 due right now
  - Free extension if you are here
  - Or for SCPD students who watch lecture before midnight
• To get extension
  - Put “” at top of your design document
• Project 2 section tomorrow, 3:15pm, here
• I will be out of town Monday
  - Will hold office hours Wednesday, 2:30pm instead

Readers-Writers Problem
• Multiple threads may access data
  - Readers – will only observe, not modify data
  - Writers – will change the data
• Goal: allow multiple readers or one single writer
  - Thus, lock can be shared amongst concurrent readers
• Can implement with other primitives
  - Keep integer i – # of readers, or -1 if held by writer
  - Protect i with mutex
  - Sleep on condition variable when can’t get lock

Implementing shared locks
  struct sharedlk {
    int i;            /* # of readers, or -1 if held by writer */
    mutex_t m;
    cond_t c;
  };
  void AcquireExclusive (sharedlk *sl) {
    lock (sl->m);
    while (sl->i) { wait (sl->m, sl->c); }
    sl->i = -1;
    unlock (sl->m);
  }
  void AcquireShared (sharedlk *sl) {
    lock (sl->m);
    while (sl->i < 0) { wait (sl->m, sl->c); }
    sl->i++;
    unlock (sl->m);
  }
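As a concrete sketch, the slide's shared lock (acquire functions above, release functions on the next slide) maps directly onto POSIX threads. The `sharedlk_init` helper is an addition for illustration; `mutex_t`/`cond_t` and the `lock`/`wait`/`signal` wrappers are assumed to correspond to the pthread equivalents.

```c
#include <pthread.h>

/* Sketch of the lecture's sharedlk on POSIX threads.
 * i == 0: free; i > 0: # of readers; i == -1: held by a writer. */
typedef struct sharedlk {
    int i;
    pthread_mutex_t m;
    pthread_cond_t c;
} sharedlk;

void sharedlk_init (sharedlk *sl) {
    sl->i = 0;
    pthread_mutex_init (&sl->m, NULL);
    pthread_cond_init (&sl->c, NULL);
}

void AcquireExclusive (sharedlk *sl) {
    pthread_mutex_lock (&sl->m);
    while (sl->i)                       /* wait until no readers or writer */
        pthread_cond_wait (&sl->c, &sl->m);
    sl->i = -1;
    pthread_mutex_unlock (&sl->m);
}

void AcquireShared (sharedlk *sl) {
    pthread_mutex_lock (&sl->m);
    while (sl->i < 0)                   /* wait until no writer */
        pthread_cond_wait (&sl->c, &sl->m);
    sl->i++;
    pthread_mutex_unlock (&sl->m);
}

void ReleaseShared (sharedlk *sl) {
    pthread_mutex_lock (&sl->m);
    if (!--sl->i)                       /* last reader wakes one waiter */
        pthread_cond_signal (&sl->c);
    pthread_mutex_unlock (&sl->m);
}

void ReleaseExclusive (sharedlk *sl) {
    pthread_mutex_lock (&sl->m);
    sl->i = 0;
    pthread_cond_broadcast (&sl->c);    /* wake all waiting readers/writers */
    pthread_mutex_unlock (&sl->m);
}
```

Note the asymmetry: a releasing reader signals one waiter (only a writer can be blocked behind readers), while a releasing writer broadcasts, since any number of readers may proceed.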
Shared locks (continued)
  void ReleaseShared (sharedlk *sl) {
    lock (sl->m);
    if (!--sl->i) signal (sl->c);
    unlock (sl->m);
  }
  void ReleaseExclusive (sharedlk *sl) {
    lock (sl->m);
    sl->i = 0;
    broadcast (sl->c);
    unlock (sl->m);
  }
• Note: Must deal with starvation
  - E.g., a writer can starve behind a continuous stream of readers

Review: Test-and-set spinlock
  struct var {
    int lock;
    int val;
  };
  void atomic_inc (var *v) {
    while (test_and_set (&v->lock))
      ;
    v->val++;
    v->lock = 0;
  }
  void atomic_dec (var *v) {
    while (test_and_set (&v->lock))
      ;
    v->val--;
    v->lock = 0;
  }

Relaxed consistency model
• Suppose no sequential consistency
• Recall Alpha mb (memory barrier)—what if we omit it?
  - Hardware could violate program order

  PROGRAM ORDER                      VIEW ON OTHER CPU
  read/write: v->lock = 1;           v->lock = 1;
  read:       v->val;
  write:      v->val = read_val + 1;
  write:      v->lock = 0;           v->lock = 0;
                                     /* danger */
                                     v->val = read_val + 1;

• If atomic_dec is called where /* danger */ appears, a bad val results
• mb in test_and_set preserves program order
  - All ops before mb in program order appear before mb on all CPUs
  - All ops after mb in program order appear after mb on all CPUs
• Many examples in this lecture assume S.C.
  - Need to add barrier instructions on non-S.C. hardware
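The barrier discussion above can be made concrete with C11 atomics, which are portable across S.C. and relaxed hardware: acquire ordering on the test-and-set and release ordering on the unlock play the role of the Alpha mb, keeping the v->val access inside the critical section. This is a sketch following the slide's names; `var_init`, `spin_lock`, and `spin_unlock` are additions for illustration.

```c
#include <stdatomic.h>

/* Sketch: the slide's test-and-set spinlock with C11 atomics. */
typedef struct var {
    atomic_flag lock;
    int val;
} var;

void var_init (var *v) {
    atomic_flag_clear (&v->lock);
    v->val = 0;
}

void spin_lock (var *v) {
    /* test-and-set returns the previous value; spin while it was held.
     * memory_order_acquire: later accesses cannot be reordered above it. */
    while (atomic_flag_test_and_set_explicit (&v->lock, memory_order_acquire))
        ;
}

void spin_unlock (var *v) {
    /* memory_order_release: the val update becomes visible to other
     * CPUs before the lock appears free -- avoiding the danger window. */
    atomic_flag_clear_explicit (&v->lock, memory_order_release);
}

void atomic_inc (var *v) { spin_lock (v); v->val++; spin_unlock (v); }
void atomic_dec (var *v) { spin_lock (v); v->val--; spin_unlock (v); }
```

On a sequentially consistent machine the explicit orderings cost nothing extra; on relaxed hardware the compiler emits the needed barrier instructions.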
Cache coherence
• Performance requires caches
• Sequential consistency requires cache coherence
• Barrier & atomic ops require cache coherence
• Bus-based approaches
  - “Snoopy” protocols: each CPU listens to memory bus
  - Use write-through and invalidate when you see a write
  - Or have ownership scheme (e.g., Pentium MESI bits)
  - Bus-based schemes limit scalability
• Cache-Only Memory Architecture (COMA)
  - Each CPU has local RAM, treated as cache
  - Cache lines migrate around based on access
  - Data lives only in cache

cc-NUMA
• Previous slide had dance hall architectures
  - Any CPU can “dance with” any memory equally
• An alternative: Non-Uniform Memory Access
  - Each CPU has fast access to some “close” memory
  - Slower to access memory that is farther away
  - Use a directory to keep track of who is caching what
• Originally for machines with many CPUs
  - Now AMD Opterons are kind of like this
  - Next-generation Intel CPUs will be like this, too
• cc-NUMA = cache-coherent NUMA
  - Can also have non-cache-coherent NUMA, though uncommon
  - BBN Butterfly 1 has no cache at all
  - Cray T3D has local/global memory

NUMA and spinlocks
• Test-and-set spinlock has several advantages
  - Simple to implement and understand
  - One memory location for arbitrarily many CPUs
• But also has disadvantages
  - Lots of traffic over memory bus (especially when > 1 spinner)
  - Not necessarily fair (same CPU acquires lock many times)
  - Even less fair on a NUMA machine
  - Allegedly Google had fairness problems even on Opterons
• Idea 1: Avoid spinlocks altogether
• Idea 2: Reduce bus traffic with better spinlocks
  - Design lock that spins only on local memory
  - Also gives better fairness

Recall producer/consumer (lecture 3)
  /* PRODUCER */
  for (;;) {
    /* produce item, put in nextProduced */
    mutex_lock (&mutex);
    while (count == BUF_SIZE)
      cond_wait (&nonfull, &mutex);
    buffer[in] = nextProduced;
    in = (in + 1) % BUF_SIZE;
    count++;
    cond_signal (&nonempty);
    mutex_unlock (&mutex);
  }
  /* CONSUMER */
  for (;;) {
    mutex_lock (&mutex);
    while (count == 0)
      cond_wait (&nonempty, &mutex);
    nextConsumed = buffer[out];
    out = (out + 1) % BUF_SIZE;
    count--;
    cond_signal (&nonfull);
    mutex_unlock (&mutex);
    /* use item in nextConsumed */
  }
Eliminating locks
• One use of locks is to coordinate multiple updates of a single piece of state
• How to remove locks here?
  - Factor state so each variable only has a single writer
• Producer/consumer example revisited
  - Assume for example you have sequential consistency
  - Assume one producer, one consumer
  - Why do we need the count variable, written by both? To detect buffer full/empty
  - Have producer write in, consumer write out
  - Use in/out to detect buffer state
  - But note next example busy-waits, which is less good

Lock-free producer/consumer
  void producer (void *ignored) {
    for (;;) {
      /* produce an item and put in nextProduced */
      while (((in + 1) % BUF_SIZE) == out)
        ;  /* do nothing */
      buffer[in] = nextProduced;
      in = (in + 1) % BUF_SIZE;
    }
  }
  void consumer (void *ignored) {
    for (;;) {
      while (in == out)
        ;  /* do nothing */
      nextConsumed = buffer[out];
      out = (out + 1) % BUF_SIZE;
      /* consume the item in nextConsumed */
    }
  }

Non-blocking synchronization
• Design algorithm to avoid critical sections
  - Any thread can make progress even if other threads are preempted
  - Which wouldn’t be the case if a preempted thread held a lock
• Requires atomic instructions available on some …
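The lock-free producer/consumer shown above can be turned into a compilable sketch. C11 atomics on in/out make it correct even without sequential consistency, per the earlier barrier discussion; the `spsc_push`/`spsc_pop` names and their fail-instead-of-busy-wait return codes are additions for illustration, not from the slides.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define BUF_SIZE 8

/* Sketch: single-producer/single-consumer lock-free buffer.
 * Only the producer writes in; only the consumer writes out --
 * the factoring that let us eliminate the shared count variable. */
static int buffer[BUF_SIZE];
static atomic_uint in, out;

/* Producer side. Returns false when full instead of busy-waiting. */
bool spsc_push (int item) {
    unsigned i = atomic_load_explicit (&in, memory_order_relaxed);
    if ((i + 1) % BUF_SIZE ==
        atomic_load_explicit (&out, memory_order_acquire))
        return false;                           /* buffer full */
    buffer[i] = item;
    /* release: the item is visible before the new in is */
    atomic_store_explicit (&in, (i + 1) % BUF_SIZE, memory_order_release);
    return true;
}

/* Consumer side. Returns false when empty instead of busy-waiting. */
bool spsc_pop (int *item) {
    unsigned o = atomic_load_explicit (&out, memory_order_relaxed);
    if (o == atomic_load_explicit (&in, memory_order_acquire))
        return false;                           /* buffer empty */
    *item = buffer[o];
    atomic_store_explicit (&out, (o + 1) % BUF_SIZE, memory_order_release);
    return true;
}
```

The release store on in pairs with the acquire load in the consumer, so the consumer never reads a slot before the producer's write to it is visible; this is exactly the ordering a memory barrier would enforce on relaxed hardware.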
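The key atomic instruction behind most non-blocking algorithms is compare-and-swap (CAS). As a sketch, here is a lock-free counter increment built on C11's `atomic_compare_exchange_weak`; the `cas_inc` name is an illustration, not from the slides.

```c
#include <stdatomic.h>

/* Sketch: lock-free increment via compare-and-swap.
 * Retry until no other thread changed the value between our read
 * and our swap -- no critical section, so a preempted thread can
 * never block the others. */
int cas_inc (atomic_int *v) {
    int old = atomic_load (v);
    /* on failure (or spurious failure of the weak form),
     * old is reloaded with the current value and we retry */
    while (!atomic_compare_exchange_weak (v, &old, old + 1))
        ;
    return old + 1;    /* the value this call installed */
}
```

The same read-modify-CAS-retry loop generalizes to the lock-free stack covered later in the lecture: read the head, build the new node, and CAS the head pointer.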