DOC PREVIEW
Stanford CS 140 - Notes

This preview shows page 1-2-3 out of 8 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

AdministriviaReaders-Writers ProblemImplementing shared locksshared locks (continued)Review: Test-and-set spinlockRelaxed consistency modelCache coherencecc-NUMANUMA and spinlocksRecall producer/consumer (lecture 3)Eliminating locksLock-free producer/consumerNon-blocking synchronizationExample: stackBenign racesRead-copy update href {http://www.rdrop.com/users/paulmck/RCU/rclockjrnl_tpds_mathtype.pdf}{[McKenney]}Garbage collectionMCS lockMCS AcquireMCS AcquireMCS AcquireMCS AcquireMCS Release w. CASMCS Release w. CASMCS Release w. CASMCS Release w/o CASMCS Release w/o C&S codeKernel support for synchronizationRace conditionThe deadlock problemMore deadlocksDeadlocks w/o computersDeadlock conditionsPrevent by eliminating one conditionResource-allocation graphExample resource allocation graphGraph with deadlockIs this deadlock?Cycles and deadlockPreventionClaim edgesExample: unsafe stateDetecting deadlockFixing & debugging deadlocksTransactionsDetecting data racesAdministrivia• Project 1 due right now- Free extension if you are here- Of for SCPD students who watch lecture before midnight• To get extension- Put “” at top of your design document•Project 2 section tomorrow 3:15pm here• I will be out of town Monday- Will hold office hours Wednesday, 2:30pm instead1/41Readers-Writers Problem• Multiple threads may access data- Readers – will only observe, not modify data- Writers – will change the data• Goal: allow multiple readers or one single writer- Thus, lock can be shared amongst concurrent readers• Can implement with other primitives- Keep integer i – # or readers or -1 if held by writer- Protect i with mutex- Sleep on condition variable when can’t get lock2/41Implementing shared locksstruct sharedlk {int i;mutex_t m;cond_t c;};void AcquireExclusive (sharedlk *sl) {lock (sl->m);while (sl->i) { wait (sl->m, sl->c); }sl->i = -1;unlock (sl->m);}void AcquireShared (sharedlk *sl) {lock (sl->m);while (sl->i < 0) { wait (sl->m, sl->c); }sl->i++;unlock (sl->m);}3/41shared locks (continued)void ReleaseShared (sharedlk *sl) {lock (sl->m);if (!--sl->i) signal (sl->c);unlock (sl->m);}void ReleaseExclusive (sharedlk *sl) {lock (sl->m);sl->i = 0;broadcast (sl->c);unlock (sl->m);}• Note: Must deal with starvation4/41Review: Test-and-set spinlockstruct var {int lock;int val;};void atomic_inc (var *v) {while (test_and_set (&v->lock));v->val++;v->lock = 0;}void atomic_dec (var *v) {while (test_and_set (&v->lock));v->val--;v->lock = 0;}5/41Relaxed consistency model• Suppose no sequential consistency• Recall alpha mb (mem. barrier)—what if we omit it?- Hardware could violate program orderPROGRAM ORDER VIEW ON OTHER CPUread/write: v->lock = 1; v->lock = 1;read: v->val;write: v->val = read val + 1;write: v->lock = 0; v->lock = 0;/* danger */v->val = read_val + 1;• If atomic dec called where danger, bad val results• mb in testand set preserves program order- All ops before mb in program order appear before on all CPUs- All ops after mb in program order appear after on all CPUs• Many example in this lecture assume S.C.- Need to add barrier instructions on non-S.C. hardware6/41Cache coherence• Performance requires caches• Sequential consistency requires cache coherence• Barrier & atomic ops require cache coherence• Bus-based approaches- “Snoopy” protocols, each CPU listens to memory bus- Use write through and invalidate when you see a write- Or have ownership scheme (e.g., Pentium MESI bits)- Bus-based schemes limit scalability• Cache-Only Memory Architecture (COMA)- Each CPU has local RAM, treated as cache- Cache lines migrate around based on access- Data lives only in cache7/41cc-NUMA• Previous slide had dance hall architectures- Any CPU can “dance with” any memory equally• An alternative: Non-Uniform Memory Access- Each CPU has fast access to some “close” memory- Slower to access memory that is farther away- Use a directory to k eep track of who is caching what• Originally for machines with many CPUs- Now AMD Opterons are kind of like this- Next generation Intel CPUs will be like this, too• cc-NUMA = cache-coherent NUMA- Can also have non-cache-coherent NUMA, though uncommon- BBN Butterfly 1 has no cache at all- Cray T3D has local/global memory8/41NUMA and spinlocks• Test-and-set spinlock has several advantages- Simple to implement and understand- One memory location for arbitrarily many CPUs• But also has disadvantages- Lots of traffic over memory bus (especially when > 1 spinner)- Not necessarily fair (same CPU acquires lock many times)- Even less fair on a NUMA machine- Allegedly Google had fairness problems even on Opterons• Idea 1: Avoid spinlocks altogether• Idea 2: Reduce bus traffic with better spinlocks- Design lock that spins only on local memory- Also gives better fairness9/41Recall producer/consumer (lecture 3)/* PRODUCER */for (;;) {/* produce item, putin nextProduced */mutex_lock (&mutex);while (count == BUF_SIZE)cond_wait (&nonfull,&mutex);buffer [in] = nextProduced;in = (in + 1) % BUF_SIZE;count++;cond_signal (&nonempty);mutex_unlock (&mutex);}/* CONSUMER */for (;;) {mutex_lock (&mutex);while (count == 0)cond_wait (&nonempty,&mutex);nextConsumed = buffer[out];out = (out + 1) % BUF_SIZE;count--;cond_signal (&nonfull);mutex_unlock (&mutex);/* use item innextConsumed */}10/41Eliminating locks• One use of locks is to coordinate multiple updates ofsingle piece of state• How to remove locks here?- Factor state so each variable only has a single writer• Producer/consumer example revisited- Assume for example you have sequential consistency- Assume one producer, one consumer-Why do we need count variable, written by both?To detect buffer full/empty- Have producer write in, consumer write out- Use in/out to detect buffer state- But note next example busy-waits, which is less good11/41Lock-free producer/consumervoid producer (void *ignored) {for (;;) {/* produce an item and put in nextProduced */while (((in + 1) % BUF_SIZE) == out); // do nothingbuffer [in] = nextProduced;in = (in + 1) % BUF_SIZE;}}void consumer (void *ignored) {for (;;) {while (in == out); // do nothingnextConsumed = buffer[out];out = (out + 1) % BUF_SIZE;/* consume the item in nextConsumed */}}12/41Non-blocking synchronization• Design algorithm to avoid critical sections- Any threads can make progress if other threads are preempted- Which wouldn’t be the case if preempted thread held a lock• Requires atomic instructions available on so m


View Full Document

Stanford CS 140 - Notes

Documents in this Course
Homework

Homework

25 pages

Load more
Download Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?